NASA Technical Reports Server (NTRS)
Gilbertsen, Noreen D.; Belytschko, Ted
1990-01-01
The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
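To make the bandwidth argument concrete, here is a minimal CPU sketch (not QUDA code; the names and the choice of a Gram-matrix example are illustrative assumptions): computing all pairwise inner products of m vectors in one sweep streams each vector from memory once, whereas m^2 separate dot products stream each vector O(m) times.

```cpp
// Illustrative sketch: the Gram matrix of m long vectors computed in a single
// pass over memory, so each vector is streamed from DRAM once instead of once
// per pairwise dot product (linear instead of quadratic memory traffic).
#include <vector>
#include <cstddef>

std::vector<double> gram_single_pass(const std::vector<std::vector<double>>& v) {
    const std::size_t m = v.size();
    const std::size_t n = m ? v[0].size() : 0;
    std::vector<double> g(m * m, 0.0);
    for (std::size_t i = 0; i < n; ++i) {          // one sweep over the long dimension
        for (std::size_t a = 0; a < m; ++a) {
            const double va = v[a][i];
            for (std::size_t b = a; b < m; ++b)    // exploit symmetry
                g[a * m + b] += va * v[b][i];
        }
    }
    for (std::size_t a = 0; a < m; ++a)            // fill in the lower triangle
        for (std::size_t b = 0; b < a; ++b)
            g[a * m + b] = g[b * m + a];
    return g;
}
```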
Optimized Infrastructure for the Earth System Prediction Capability
2013-09-30
for referencing memory between its native coupling datatype (MCT Attribute Vectors) and ESMF Arrays. This will reduce the copies required and will...introduced ability within CESM to share memory between ESMF and MCT datatypes makes using both tools together much easier. Using both is appealing
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoemmen, Mark
2010-11-01
Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
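As a point of reference for the formulation above, here is a hedged sketch of a single level of Strassen's recursion written as plain C++ loops rather than tensor-product formulas; the seven half-size products M1..M7 are the independent tasks a shared memory machine can execute in parallel.

```cpp
// Hedged sketch: one level of Strassen's algorithm for an n x n matrix (n even),
// not the paper's tensor-product synthesis.  Classical multiplication is used
// for the half-size blocks.
#include <vector>
#include <cstddef>
using Mat = std::vector<double>;  // row-major square matrix

static Mat mul(const Mat& a, const Mat& b, int n) {        // classical product
    Mat c(std::size_t(n) * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
static Mat add(const Mat& a, const Mat& b) { Mat c(a); for (std::size_t i = 0; i < c.size(); ++i) c[i] += b[i]; return c; }
static Mat sub(const Mat& a, const Mat& b) { Mat c(a); for (std::size_t i = 0; i < c.size(); ++i) c[i] -= b[i]; return c; }

Mat strassen_one_level(const Mat& A, const Mat& B, int n) {
    const int h = n / 2;
    auto block = [&](const Mat& m, int bi, int bj) {        // copy out an h x h block
        Mat r(std::size_t(h) * h);
        for (int i = 0; i < h; ++i)
            for (int j = 0; j < h; ++j)
                r[i * h + j] = m[(bi * h + i) * n + (bj * h + j)];
        return r;
    };
    Mat A11 = block(A,0,0), A12 = block(A,0,1), A21 = block(A,1,0), A22 = block(A,1,1);
    Mat B11 = block(B,0,0), B12 = block(B,0,1), B21 = block(B,1,0), B22 = block(B,1,1);
    // The seven half-size products; these are mutually independent tasks.
    Mat M1 = mul(add(A11,A22), add(B11,B22), h), M2 = mul(add(A21,A22), B11, h),
        M3 = mul(A11, sub(B12,B22), h),          M4 = mul(A22, sub(B21,B11), h),
        M5 = mul(add(A11,A12), B22, h),          M6 = mul(sub(A21,A11), add(B11,B12), h),
        M7 = mul(sub(A12,A22), add(B21,B22), h);
    Mat C(std::size_t(n) * n);
    auto store = [&](const Mat& m, int bi, int bj) {        // write an h x h block of C
        for (int i = 0; i < h; ++i)
            for (int j = 0; j < h; ++j)
                C[(bi * h + i) * n + (bj * h + j)] = m[i * h + j];
    };
    store(sub(add(add(M1, M4), M7), M5), 0, 0);             // C11
    store(add(M3, M5), 0, 1);                               // C12
    store(add(M2, M4), 1, 0);                               // C21
    store(add(sub(M1, M2), add(M3, M6)), 1, 1);             // C22
    return C;
}
```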
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C, so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high-performing compiled languages. Parallel languages are still evolving, with interesting developments in Co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE grants.
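A minimal sketch of the two lower levels of the hierarchy described above (OpenMP threads plus compiler SIMD) applied to a one-dimensional particle push; it is illustrative only, is not taken from the PICKSC skeleton codes, and the field layout and names are assumptions. The distributed-memory level would partition particles and fields across MPI ranks around a kernel like this.

```cpp
// Hedged sketch: particle push parallelized over particles with OpenMP threads
// and vectorized within each thread; assumes particles remain inside the grid.
#include <vector>
#include <cstddef>

void push_particles(std::vector<double>& x, std::vector<double>& v,
                    const std::vector<double>& E, double qm, double dt, double dx) {
    const std::size_t np = x.size();
    #pragma omp parallel for simd
    for (std::size_t p = 0; p < np; ++p) {
        std::size_t cell = static_cast<std::size_t>(x[p] / dx);   // gather field at particle
        v[p] += qm * E[cell] * dt;                                // accelerate
        x[p] += v[p] * dt;                                        // advance position
    }
}
```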
Rapid solution of large-scale systems of equations
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis, and optimization. All algorithms are coded in FORTRAN for shared memory computers, and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT statements, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
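BUFFER IN/BUFFER OUT are Cray Fortran statements; as a rough modern analogue of the same double-buffering idea, the sketch below (all names hypothetical) overlaps the asynchronous read of the next block with computation on the current one.

```cpp
// Hedged sketch of double buffering: while block k is being processed on the CPU,
// block k+1 is read asynchronously, in the spirit of BUFFER IN overlapping I/O
// with computation.  File path, block size, and the "work" are placeholders.
#include <fstream>
#include <future>
#include <functional>
#include <vector>
#include <cstddef>

static double checksum = 0.0;
void process_block(std::vector<double>& blk) {            // placeholder CPU work
    for (double v : blk) checksum += v;
}

void out_of_core_sweep(const char* path, std::size_t block_doubles, std::size_t nblocks) {
    std::ifstream in(path, std::ios::binary);
    auto read_block = [&](std::vector<double>& blk) {
        in.read(reinterpret_cast<char*>(blk.data()),
                static_cast<std::streamsize>(blk.size() * sizeof(double)));
    };
    std::vector<double> buf[2] = {std::vector<double>(block_doubles),
                                  std::vector<double>(block_doubles)};
    read_block(buf[0]);                                   // prime the pipeline
    for (std::size_t k = 0; k < nblocks; ++k) {
        std::future<void> next;
        if (k + 1 < nblocks)                              // start asynchronous read of next block
            next = std::async(std::launch::async, read_block, std::ref(buf[(k + 1) % 2]));
        process_block(buf[k % 2]);                        // compute on the current block
        if (next.valid()) next.wait();                    // make sure the next block has arrived
    }
}
```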
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore, the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e., partitioning), are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to occasionally shared data, just as it does due to shared facilities.
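A one-degree-of-freedom illustration of the structure described above, with made-up stiffness and load values: each Newton-Raphson iteration linearizes the residual and solves a (here trivial) linear problem for the displacement update.

```cpp
// Minimal sketch of the Newton-Raphson loop: residual R(u) = f_int(u) - f_ext,
// tangent stiffness K_t(u), linear solve K_t * du = -R at every iteration.
// The cubic-spring values below are hypothetical.
#include <cmath>
#include <cstdio>

int main() {
    const double k1 = 100.0, k3 = 5.0, f_ext = 250.0;     // illustrative spring + load
    double u = 0.0;
    for (int it = 0; it < 20; ++it) {
        double residual = k1 * u + k3 * u * u * u - f_ext; // R(u) = f_int(u) - f_ext
        if (std::fabs(residual) < 1e-10) break;
        double K_t = k1 + 3.0 * k3 * u * u;                // tangent stiffness
        u -= residual / K_t;                               // the linear problem of this step
        std::printf("iter %d  u = %.8f\n", it, u);
    }
    return 0;
}
```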
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications
NASA Astrophysics Data System (ADS)
Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.
2015-06-01
The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. The potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding the CPU code version, the streaming SIMD extensions (SSE) and the advanced vector extensions (AVX) have been included with shared memory approaches that take advantage of multi-core platforms. On the other hand, the second implementation, called the multi-GPU code version, is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions, including shared memory approaches, vector instructions and multi-processors (both CPU and GPU), and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed, and it has been demonstrated that the addition of shared memory schemes to CPU computing substantially improves the performance of vector instructions, enlarging the simulation sizes that use the cache memory of CPUs efficiently. In this case GPU computing is roughly twice as fast as the fine-tuned CPU version for both one and two nodes. However, for massive computations explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and the performance tends to be the same as that of the sequential version with auto-vectorisation and a shared memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, with peak performance values of 80 GFlops and 146 GFlops for one GPU and two GPUs, respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in these types of applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J.T.
1993-10-01
This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for the functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
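One common form of the static dependence analysis mentioned above is level scheduling; the hedged sketch below (not the authors' Sequent code) computes a level for each row of a lower triangular factor and then solves all rows of a level in parallel.

```cpp
// Hedged sketch: level-scheduled solve of L*x = b with L in compressed sparse row
// form; assumes the diagonal entry is stored last in each row.  Rows within one
// level have no mutual dependences, so they may be solved concurrently.
#include <vector>
#include <algorithm>

struct CSR { std::vector<int> ptr, col; std::vector<double> val; int n; };

std::vector<int> level_of_row(const CSR& L) {
    std::vector<int> level(L.n, 0);
    for (int i = 0; i < L.n; ++i)
        for (int k = L.ptr[i]; k < L.ptr[i + 1] - 1; ++k)      // off-diagonal entries
            level[i] = std::max(level[i], level[L.col[k]] + 1);
    return level;
}

void solve_by_levels(const CSR& L, const std::vector<double>& b, std::vector<double>& x) {
    std::vector<int> level = level_of_row(L);
    int nlev = *std::max_element(level.begin(), level.end()) + 1;
    for (int lev = 0; lev < nlev; ++lev) {
        // A production code would group row indices by level first instead of scanning.
        #pragma omp parallel for
        for (int i = 0; i < L.n; ++i) {
            if (level[i] != lev) continue;
            double s = b[i];
            for (int k = L.ptr[i]; k < L.ptr[i + 1] - 1; ++k)
                s -= L.val[k] * x[L.col[k]];
            x[i] = s / L.val[L.ptr[i + 1] - 1];                // diagonal stored last
        }
    }
}
```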
High Performance Programming Using Explicit Shared Memory Model on Cray T3D
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to the users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained by using the explicit shared memory model. A similar degradation in performance is seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times less than that of data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from degrading performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wasserman, H.J.
1996-02-01
The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP): a maximum of two floating point operations (FLOPS) per CP vs. one for the 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we compare single processor performance of the 8400 system with that of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes is that they span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have, for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architectures. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.
On the impact of communication complexity in the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
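For reference, the basic Hockney-style timing model that the first model generalizes can be written as a startup term plus an asymptotic-rate term; the helper below simply evaluates t(n) = t0 + n/r_inf (parameter names are illustrative, and the paper's generalization is not reproduced here).

```cpp
// Hedged illustration of a Hockney-style cost model: the time to move or stream
// n words is a fixed startup cost plus the words divided by the asymptotic rate,
// equivalently t(n) = (n + n_half) / r_inf with n_half = t0 * r_inf.
double hockney_time(double n_words, double startup_s, double rate_words_per_s) {
    return startup_s + n_words / rate_words_per_s;
}
```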
On the impact of communication complexity on the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D. B.; Van Rosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Semantic graphs and associative memories
NASA Astrophysics Data System (ADS)
Pomi, Andrés; Mizraji, Eduardo
2004-12-01
Graphs have been increasingly utilized in the characterization of complex networks from diverse origins, including different kinds of semantic networks. Human memories are associative and are known to support complex semantic nets; these nets are represented by graphs. However, it is not known how the brain can sustain these semantic graphs. The vision of cognitive brain activities, shown by modern functional imaging techniques, assigns renewed value to classical distributed associative memory models. Here we show that these neural network models, also known as correlation matrix memories, naturally support a graph representation of the stored semantic structure. We demonstrate that the adjacency matrix of this graph of associations is just the memory coded with the standard basis of the concept vector space, and that the spectrum of the graph is a code invariant of the memory. As long as the assumptions of the model remain valid this result provides a practical method to predict and modify the evolution of the cognitive dynamics. Also, it could provide us with a way to comprehend how individual brains that map the external reality, almost surely with different particular vector representations, are nevertheless able to communicate and share a common knowledge of the world. We finish presenting adaptive association graphs, an extension of the model that makes use of the tensor product, which provides a solution to the known problem of branching in semantic nets.
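A tiny numerical sketch of the stated correspondence, under the assumption of standard-basis concept vectors: summing the outer products e_to e_from^T over all stored associations produces exactly the adjacency matrix of the association graph.

```cpp
// Hedged sketch: a correlation-matrix memory built from standard-basis concept
// vectors.  Each association (from -> to) adds the outer product e_to * e_from^T,
// so M[to][from] ends up being the adjacency matrix of the association graph.
#include <vector>
#include <utility>

std::vector<std::vector<double>> build_memory(int n_concepts,
        const std::vector<std::pair<int,int>>& associations /* (from, to) pairs */) {
    std::vector<std::vector<double>> M(n_concepts, std::vector<double>(n_concepts, 0.0));
    for (auto [from, to] : associations)
        M[to][from] += 1.0;            // outer product of two standard-basis vectors
    return M;
}
// Recall with a basis vector then returns the indicator vector of the concepts the
// cue is associated with, i.e. one step of a walk on the graph.
```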
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
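For readers unfamiliar with GHA, the following is a hedged software sketch of one update of Sanger's rule, the learning rule the hardware implements; it is not the FPGA design itself, and the matrix layout is an assumption.

```cpp
// Hedged sketch of one Generalized Hebbian Algorithm (Sanger's rule) update:
//   y = W x,   w_ij += eta * y_i * ( x_j - sum_{k<=i} y_k * w_kj ).
#include <vector>
#include <cstddef>

void gha_update(std::vector<std::vector<double>>& W,        // m x n weight matrix
                const std::vector<double>& x, double eta) {
    const std::size_t m = W.size(), n = x.size();
    std::vector<double> y(m, 0.0);
    for (std::size_t i = 0; i < m; ++i)                      // outputs y = W x
        for (std::size_t j = 0; j < n; ++j)
            y[i] += W[i][j] * x[j];
    std::vector<double> recon(n, 0.0);                       // running sum_{k<=i} y_k w_kj
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            recon[j] += y[i] * W[i][j];                      // uses pre-update weights
        for (std::size_t j = 0; j < n; ++j)
            W[i][j] += eta * y[i] * (x[j] - recon[j]);       // Sanger's rule update
    }
}
```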
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Auto and hetero-associative memory using a 2-D optical logic gate
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin (Inventor)
1992-01-01
An optical system for auto-associative and hetero-associative recall utilizing Hamming distance as the similarity measure between a binary input image vector V^k and a binary image vector V^m in a first memory array, using an optical Exclusive-OR gate for multiplication of each of a plurality of different binary image vectors in memory by the input image vector. After integrating the light of each product V^k x V^m, a shortest Hamming distance detection electronics module determines which product has the lowest light intensity and emits a signal that activates a light emitting diode to illuminate a corresponding image vector in a second memory array for display. That corresponding image vector is identical to the memory image vector V^m in the first memory array for auto-associative recall, or related to it, such as by name, for hetero-associative recall.
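A digital analogue of the recall scheme may help fix ideas; in the sketch below (an assumption-laden software stand-in for the optical system, with bit-packed image vectors) XOR plays the role of the Exclusive-OR gate and a popcount plays the role of the light-integration and minimum-detection stage.

```cpp
// Hedged sketch: XOR + popcount Hamming-distance recall; the stored vector with
// the smallest Hamming distance to the input is selected.  Requires C++20.
#include <vector>
#include <cstdint>
#include <bit>        // std::popcount

std::size_t best_match(const std::vector<std::vector<std::uint64_t>>& memory,
                       const std::vector<std::uint64_t>& input) {
    std::size_t best = 0, best_dist = SIZE_MAX;
    for (std::size_t m = 0; m < memory.size(); ++m) {
        std::size_t dist = 0;
        for (std::size_t w = 0; w < input.size(); ++w)
            dist += std::popcount(memory[m][w] ^ input[w]);   // count mismatching bits
        if (dist < best_dist) { best_dist = dist; best = m; }
    }
    return best;                                              // index of the recalled vector
}
```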
Optoelectronic Inner-Product Neural Associative Memory
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang
1993-01-01
Optoelectronic apparatus acts as artificial neural network performing associative recall of binary images. Recall process is iterative one involving optical computation of inner products between binary input vector and one or more reference binary vectors in memory. Inner-product method requires far less memory space than matrix-vector method.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied GATK HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' CPU frequency mode by using 'performance' mode to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7, respectively.
Optical Associative Memory Model With Threshold Modification Using Complementary Vector
NASA Astrophysics Data System (ADS)
Bian, Shaoping; Xu, Kebin; Hong, Jing
1989-02-01
A new criterion to evaluate the similarity between two vectors in associative memory is presented. Based on it, an experimental study of an optical associative memory model with threshold modification using a complementary vector is carried out. This model is capable of eliminating the possibility of erroneous recall, thereby improving the accuracy of readout.
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, D. H.
1985-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
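The flavor of such a Monte Carlo study can be conveyed with a few lines of code; the sketch below uses placeholder values for the bank count and reservation time and simply measures how much the issue rate degrades when references collide on busy banks.

```cpp
// Hedged Monte Carlo sketch of bank contention: a stream of references hits
// randomly chosen banks, and a reference to a bank that is still within its
// reservation time stalls until the bank frees up.  Parameters are illustrative.
#include <vector>
#include <random>
#include <cstdio>

int main() {
    const int n_banks = 16, bank_busy = 8;          // reservation time in CPU ticks
    const long n_refs = 1'000'000;
    std::mt19937 rng(12345);
    std::uniform_int_distribution<int> pick(0, n_banks - 1);
    std::vector<long> free_at(n_banks, 0);
    long t = 0;
    for (long r = 0; r < n_refs; ++r) {
        int b = pick(rng);
        if (free_at[b] > t) t = free_at[b];         // stall until the bank is free
        free_at[b] = t + bank_busy;                 // reserve the bank
        ++t;                                        // one reference issued per tick
    }
    std::printf("average ticks per reference: %.3f\n", double(t) / n_refs);
    return 0;
}
```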
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, David H.
1987-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
An alternative design for a sparse distributed memory
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1989-01-01
A new design for a Sparse Distributed Memory, called the selected-coordinate design, is described. As in the original design, there are a large number of memory locations, each of which may be activated by many different addresses (binary vectors) in a very large address space. Each memory location is defined by specifying ten selected coordinates (bit positions in the address vectors) and a set of corresponding assigned values, consisting of one bit for each selected coordinate. A memory location is activated by an address if, for all ten of the location's selected coordinates, the corresponding bits in the address vector match the respective assigned value bits, regardless of the other bits in the address vector. Some comparative memory capacity and signal-to-noise ratio estimates for both the new and original designs are given. A few possible hardware embodiments of the new design are described.
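A compact way to express the activation rule, assuming for illustration that addresses fit in a 64-bit word (the actual design uses much longer binary address vectors): a location stores a mask of its ten selected coordinates and the assigned bit values, and an address activates it exactly when it matches on those positions.

```cpp
// Hedged sketch of the selected-coordinate activation test.  'selected' has the
// ten chosen bit positions set; 'assigned' holds the required values at exactly
// those positions; all other address bits are ignored.
#include <cstdint>

struct Location {
    std::uint64_t selected;   // mask of the ten selected coordinates (64-bit addresses assumed)
    std::uint64_t assigned;   // assigned values, meaningful only where 'selected' is set
};

inline bool activates(const Location& loc, std::uint64_t address) {
    return ((address ^ loc.assigned) & loc.selected) == 0;   // all selected bits match
}
```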
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine- and coarse-grained levels, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Benchmarking GPU and CPU codes for Heisenberg spin glass over-relaxation
NASA Astrophysics Data System (ADS)
Bernaschi, M.; Parisi, G.; Parisi, L.
2011-06-01
We present a set of possible implementations for Graphics Processing Units (GPU) of the Over-relaxation technique applied to the 3D Heisenberg spin glass model. The results show that a carefully tuned code can achieve more than 100 GFlops/s of sustained performance and update a single spin in about 0.6 nanoseconds. A multi-hit technique that exploits the GPU shared memory further reduces this time. Such results are compared with those obtained by means of a highly-tuned vector-parallel code on latest generation multi-core CPUs.
Vector Quantization Algorithm Based on Associative Memories
NASA Astrophysics Data System (ADS)
Guzmán, Enrique; Pogrebnyak, Oleksiy; Yáñez, Cornelio; Manrique, Pablo
This paper presents a vector quantization algorithm for image compression based on extended associative memories. The proposed algorithm is divided into two stages. First, an associative network is generated by applying the learning phase of the extended associative memories (EAM) between a codebook generated by the LBG algorithm and a training set. This associative network is named the EAM-codebook and represents a new codebook which is used in the next stage. The EAM-codebook establishes a relation between the training set and the LBG codebook. Second, the vector quantization process is performed by means of the recalling stage of the EAM, using the EAM-codebook as the associative memory. This process generates the set of class indices to which each input vector belongs. With respect to the LBG algorithm, the main advantages offered by the proposed algorithm are high processing speed and a low demand for resources (system memory); results on image compression and quality are presented.
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time-consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the directive-based OpenMP parallel language, the function-based MPI parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format, and finally a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and in order to obtain converged results we need only a small part of the matrix spectrum.
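As a rough illustration of the symmetric-storage idea (the exact format variation used by the authors is not spelled out here, so this sketch is an assumption), a coordinate-format sparse matrix-vector product for a symmetric matrix can store only the diagonal and one triangle and mirror each off-diagonal entry.

```cpp
// Hedged sketch: sparse matrix-vector product y = A*x for a real symmetric A in
// coordinate (COO) form holding only one triangle plus the diagonal; each stored
// off-diagonal entry (i, j, v) contributes to both y[i] and y[j].
#include <vector>
#include <algorithm>
#include <cstddef>

struct Coo { std::vector<int> row, col; std::vector<double> val; };

void symmetric_spmv(const Coo& a, const std::vector<double>& x, std::vector<double>& y) {
    std::fill(y.begin(), y.end(), 0.0);
    for (std::size_t k = 0; k < a.val.size(); ++k) {
        int i = a.row[k], j = a.col[k];
        double v = a.val[k];
        y[i] += v * x[j];
        if (i != j) y[j] += v * x[i];       // mirror the off-diagonal entry
    }
}
```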
Vector generator scan converter
Moore, James M.; Leighton, James F.
1990-01-01
High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O (input/output) channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold.
Vector generator scan converter
Moore, J.M.; Leighton, J.F.
1988-02-05
High printing speeds for graphics data are achieved with a laser printer by transmitting compressed graphics data from a main processor over an I/O channel to a vector generator scan converter which reconstructs a full graphics image for input to the laser printer through a raster data input port. The vector generator scan converter includes a microprocessor with associated microcode memory containing a microcode instruction set, a working memory for storing compressed data, vector generator hardware for drawing a full graphic image from vector parameters calculated by the microprocessor, image buffer memory for storing the reconstructed graphics image and an output scanner for reading the graphics image data and inputting the data to the printer. The vector generator scan converter eliminates the bottleneck created by the I/O channel for transmitting graphics data from the main processor to the laser printer, and increases printer speed up to thirty fold. 7 figs.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report (University of North Carolina at Chapel Hill): Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures. The project developed algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
Brandon, Nicole R; Beike, Denise R; Cole, Holly E
2017-07-01
Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
Learning and memory in disease vector insects
Vinauger, Clément; Lahondère, Chloé; Cohuet, Anna; Lazzari, Claudio R.; Riffell, Jeffrey A.
2016-01-01
Learning and memory play an important role in host preference and parasite transmission by disease vector insects. Historically there has been a dearth of standardized protocols that permit testing their learning abilities, thus limiting discussion of the potential epidemiological consequences of learning and memory to a largely speculative extent. However, with increasing evidence that individual experience and associative learning can affect processes such as oviposition site selection and host preference, it is timely to review the recently acquired knowledge, identify research gaps and discuss the implications of learning in disease vector insects in the perspective of control strategies. PMID:27450224
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.
ERIC Educational Resources Information Center
Schwartz, David G.
1998-01-01
Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
Kalman filter tracking on parallel architectures
NASA Astrophysics Data System (ADS)
Cerati, G.; Elmer, P.; Krutelyov, S.; Lantz, S.; Lefebvre, M.; McDermott, K.; Riley, D.; Tadel, M.; Wittich, P.; Wurthwein, F.; Yagil, A.
2017-10-01
We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.
Optical implementation of inner product neural associative memory
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang (Inventor)
1995-01-01
An optical implementation of an inner-product neural associative memory is realized with a first spatial light modulator for entering an initial two-dimensional N-tuple vector and for entering a thresholded output vector image after each iteration until convergence is reached, and a second spatial light modulator for entering M weighted vectors of inner-product scalars multiplied with each of the M stored vectors, where the inner-product scalars are produced by multiplication of the initial input vector in the first iterative cycle (and thresholded vectors in subsequent iterative cycles) with each of the M stored vectors, and the weighted vectors are produced by multiplication of the scalars with corresponding ones of the stored vectors. A Hughes liquid crystal light valve is used for the dual function of summing the weighted vectors and thresholding the sum vector. The thresholded vector is then entered through the first spatial light modulator for reiteration of the process cycle until convergence is reached.
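The sketch below is a plain software stand-in for the optical recall loop described above (binary vectors are represented as +1/-1 integers, an assumption): inner products with the stored vectors weight those vectors, the weighted sum is thresholded, and the cycle repeats until the estimate stops changing.

```cpp
// Hedged sketch of inner-product associative recall: weight each stored vector by
// its inner product with the current estimate, sum, threshold, and iterate.
#include <vector>
#include <cstddef>

std::vector<int> recall(const std::vector<std::vector<int>>& stored,   // M vectors of +/-1
                        std::vector<int> v, int max_iters = 50) {
    const std::size_t n = v.size();
    for (int it = 0; it < max_iters; ++it) {
        std::vector<double> sum(n, 0.0);
        for (const auto& s : stored) {
            long inner = 0;
            for (std::size_t i = 0; i < n; ++i) inner += s[i] * v[i];          // inner product
            for (std::size_t i = 0; i < n; ++i) sum[i] += double(inner) * s[i]; // weighted vector
        }
        std::vector<int> next(n);
        for (std::size_t i = 0; i < n; ++i) next[i] = (sum[i] >= 0) ? 1 : -1;   // threshold
        if (next == v) break;                                                   // converged
        v = next;
    }
    return v;
}
```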
NASA Astrophysics Data System (ADS)
Casasent, David; Telfer, Brian
1988-02-01
The storage capacity, noise performance, and synthesis of associative memories for image analysis are considered. Associative memory synthesis is shown to be very similar to that of linear discriminant functions used in pattern recognition. These lead to new associative memories and new associative memory synthesis and recollection vector encodings. Heteroassociative memories are emphasized in this paper, rather than autoassociative memories, since heteroassociative memories provide scene analysis decisions, rather than merely enhanced output images. The analysis of heteroassociative memories has been given little attention. Heteroassociative memory performance and storage capacity are shown to be quite different from those of autoassociative memories, with much more dependence on the recollection vectors used and less dependence on M/N. This allows several different and preferable synthesis techniques to be considered for associative memories. These new associative memory synthesis techniques and new techniques to update associative memories are included. We also introduce a new SNR performance measure that is preferable to conventional noise standard deviation ratios.
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM model is to be applied using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) are presented. The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, are discussed. The parallel code of DEM, created by using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are presented briefly.
Vectorized and multitasked solution of the few-group neutron diffusion equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zee, S.K.; Turinsky, P.J.; Shayer, Z.
1989-03-01
A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. For the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7, 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of about 61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.
NASA Technical Reports Server (NTRS)
Stehle, Roy H.; Ogier, Richard G.
1993-01-01
Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
Evaluation of the SPAR thermal analyzer on the CYBER-203 computer
NASA Technical Reports Server (NTRS)
Robinson, J. C.; Riley, K. M.; Haftka, R. T.
1982-01-01
The use of the CYBER 203 vector computer for thermal analysis is investigated. Strengths of the CYBER 203 include the ability to perform, in vector mode using a 64 bit word, 50 million floating point operations per second (MFLOPS) for addition and subtraction, 25 MFLOPS for multiplication and 12.5 MFLOPS for division. The speed of scalar operation is comparable to that of a CDC 7600 and is some 2 to 3 times faster than Langley's CYBER 175s. The CYBER 203 has 1,048,576 64-bit words of real memory with an 80 nanosecond (nsec) access time. Memory is bit addressable and provides single error correction, double error detection (SECDED) capability. The virtual memory capability handles data in either 512 or 65,536 word pages. The machine has 256 registers with a 40 nsec access time. The weaknesses of the CYBER 203 include the amount of vector operation overhead and some data storage limitations. In vector operations there is a considerable amount of time before a single result is produced so that vector calculation speed is slower than scalar operation for short vectors.
Semantic similarity between old and new items produces false alarms in recognition memory.
Montefinese, Maria; Zannino, Gian Daniele; Ambrosini, Ettore
2015-09-01
In everyday life, human beings can report memories of past events that did not occur or that occurred differently from the way they remember them, because memory is an imperfect process of reconstruction and is prone to distortion and errors. In this recognition study using word stimuli, we investigated whether a specific operationalization of semantic similarity among concepts can modulate false memories while controlling for the possible effect of associative strength and word co-occurrence in an old-new recognition task. The semantic similarity value of each new concept was calculated as the mean cosine similarity between pairs of vectors representing that new concept and each old concept belonging to the same semantic category. Results showed that, compared with (new) low-similarity concepts, (new) high-similarity concepts had a significantly higher probability of being falsely recognized as old, even after partialling out the effect of confounding variables, including associative relatedness and lexical co-occurrence. This finding supports the feature-based view of semantic memory, suggesting that meaning overlap and sharing of semantic features (which are greater when more similar semantic concepts are being processed) influence recognition performance, resulting in more false alarms for new high-similarity concepts. We propose that associative strength and word co-occurrence among concepts are not sufficient to explain illusory memories, and that it is also important to take into account the effects of feature-based semantic relations and, in particular, the semantic similarity among concepts.
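The similarity measure described above reduces to a mean of cosine similarities; a minimal sketch follows, with entirely hypothetical feature vectors standing in for the study's concept representations.

```python
import numpy as np

def mean_cosine_similarity(new_vec, old_vecs):
    """Mean cosine similarity between one new concept vector and each old concept vector."""
    new_vec = np.asarray(new_vec, dtype=float)
    sims = []
    for old in old_vecs:
        old = np.asarray(old, dtype=float)
        sims.append(np.dot(new_vec, old) / (np.linalg.norm(new_vec) * np.linalg.norm(old)))
    return float(np.mean(sims))

# Hypothetical feature vectors for a new item and three studied items of the same category.
new_item = [0.9, 0.1, 0.4, 0.0]
old_items = [[0.8, 0.2, 0.5, 0.1], [0.7, 0.0, 0.6, 0.2], [0.9, 0.3, 0.3, 0.0]]
print(mean_cosine_similarity(new_item, old_items))
```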
A new model for CD8+ T cell memory inflation based upon a recombinant adenoviral vector
Bolinger, Beatrice; Sims, Stuart; O’Hara, Geraldine; de Lara, Catherine; Tchilian, Elma; Firner, Sonja; Engeler, Daniel; Ludewig, Burkhard; Klenerman, Paul
2013-01-01
CD8+ T cell memory inflation, first described in murine cytomegalovirus (MCMV) infection, is characterized by the accumulation of high-frequency, functional antigen-specific CD8+ T cell pools with an effector-memory phenotype and enrichment in peripheral organs. Although persistence of antigen is considered essential, the rules underpinning memory inflation are still unclear. The MCMV model is, however, complicated by the virus’s low-level persistence, and stochastic reactivation. We developed a new model of memory inflation based upon a βgal-recombinant adenovirus vector (Ad-LacZ). After i.v. administration in C57BL/6 mice we observe marked memory inflation in the βgal96 epitope, while a second epitope, βgal497, undergoes classical memory formation. The inflationary T cell responses show kinetics, distribution, phenotype and functions similar to those seen in MCMV and are reproduced using alternative routes of administration. Memory inflation in this model is dependent on MHC Class II. As in MCMV, only the inflating epitope showed immunoproteasome-independence. These data define a new model for memory inflation, which is fully replication-independent, internally controlled and reproduces the key immunologic features of the CD8+ T cell response. This model provides insight into the mechanisms responsible for memory inflation, and since it is based on a vaccine vector, also is relevant to novel T cell-inducing vaccines in humans. PMID:23509359
NASA Astrophysics Data System (ADS)
Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola
2017-04-01
In this paper we study a Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Earlier research demonstrated that GPUs can effectively speed up these computations. Our purpose is to optimize the Monte Carlo simulation specifically for the GPU memory architecture. Random numbers are organized and stored in shared memory, which accelerates the parallel algorithm, and bank conflicts are avoided by our Collaborative Thread Array (CTA) scheme. The experimental results show that the shared-memory-based strategy speeds up the computations by more than 3X at best.
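The abstract does not spell out the sampling scheme, so the sketch below uses the classical Ulam-von Neumann random-walk estimator for x = Hx + b (valid when the iteration matrix H has norm below one), written in plain Python rather than CUDA simply to show the per-walk structure that a GPU version would assign to threads; the test system and walk parameters are arbitrary.

```python
import numpy as np

def mc_component(H, b, i, walks=10000, length=30, rng=None):
    """Ulam-von Neumann estimate of component i of the solution of x = Hx + b."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(b)
    p = 1.0 / n                              # uniform transition probabilities
    total = 0.0
    for _ in range(walks):
        k = i
        w = 1.0
        acc = b[i]
        for _ in range(length):
            k_next = int(rng.integers(n))
            w *= H[k, k_next] / p            # importance weight along the walk
            k = k_next
            acc += w * b[k]
        total += acc
    return total / walks

# Toy system A x = f rewritten as x = Hx + b via a Jacobi splitting.
A = np.array([[4.0, 1.0], [2.0, 5.0]])
f = np.array([1.0, 2.0])
D_inv = np.diag(1.0 / np.diag(A))
H = np.eye(2) - D_inv @ A                    # spectral radius < 1 here
b = D_inv @ f
estimate = [mc_component(H, b, i) for i in range(2)]
print(estimate, np.linalg.solve(A, f))       # Monte Carlo estimate vs. direct solve
```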
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU
Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui
2015-01-01
Ultra high thread-level parallelism in modern GPUs usually generates numerous memory requests simultaneously, so plenty of memory requests are always waiting at each bank of the shared LLC (L2 in this paper) and of global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but little work has focused on the service sequence at the shared LLC. We measured that many GPU applications keep long queues at the LLC banks, which provides an opportunity to optimize the service order at the LLC. By adjusting the order in which GPU memory requests are serviced, we can improve the schedulability of the SMs. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. How the priority of a memory request is represented is critical for CaLRS: we use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can boost SM schedulability effectively by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves the performance of the GPU. PMID:25729772
Time Constraints and Resource Sharing in Adults' Working Memory Spans
ERIC Educational Resources Information Center
Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie
2004-01-01
This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…
Wang, Qi; Lee, Dasom; Hou, Yubo
2017-07-01
Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
FFTs in external or hierarchical memory
NASA Technical Reports Server (NTRS)
Bailey, David H.
1989-01-01
A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
Singh, Shailbala; Toro, Haroldo; Tang, De-Chu; Briles, Worthie E.; Yates, Linda M.; Kopulos, Renee T.; Collisson, Ellen W.
2010-01-01
Avian influenza virus (AIV) specific CD8+ T lymphocyte responses stimulated by intramuscular administration of an adenovirus (Ad) vector expressing either HA or NP were evaluated in chickens following ex vivo stimulation by non-professional antigen presenting cells. The CD8+ T lymphocyte responses were AIV specific, MHC-I restricted, and cross-reacted with a heterologous H7N2 AIV strain. Specific effector responses, present at 10 days post-inoculation (p.i.), were undetectable at 2 weeks p.i., and memory responses were detected from 3 to 8 weeks p.i. Effector memory responses, detected 1 week following a booster inoculation, were significantly greater than the primary responses and, within 7 days, declined to undetectable levels. Inoculation of an Ad-vector expressing human NP resulted in significantly greater MHC-restricted activation of CD8+ T cell responses specific for AIV. Decreases in all responses with time were most dramatic after the maximum activation of T cells, as observed following the effector and effector memory responses. PMID:20557918
Associative memory - An optimum binary neuron representation
NASA Technical Reports Server (NTRS)
Awwal, A. A.; Karim, M. A.; Liu, H. K.
1989-01-01
The convergence mechanism of vectors in Hopfield's neural network is studied in terms of both weights (i.e., inner products) and Hamming distance. It is shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, weights (which in turn depend on the neuron representation) are found to play a more dominant role in the convergence mechanism. Consequently, a new binary neuron representation for associative memory is proposed. With the new neuron representation, the associative memory responds unambiguously to partial input in retrieving the stored information.
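A small, generic Hopfield-style example (not the paper's representation) illustrates the point that recall is governed by inner products with the stored vectors rather than by Hamming distance alone; the stored patterns and probe below are arbitrary.

```python
import numpy as np

def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights for bipolar patterns, zero diagonal."""
    P = np.asarray(patterns, dtype=float)
    W = P.T @ P
    np.fill_diagonal(W, 0.0)
    return W / P.shape[1]

def recall(W, probe, steps=10):
    """Synchronous sign updates until (hopefully) an attractor is reached."""
    s = np.asarray(probe, dtype=float).copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s

stored = np.array([[1, -1, 1, -1, 1, -1],
                   [1,  1, 1, -1, -1, -1]])
W = hebbian_weights(stored)
probe = np.array([-1, -1, 1, -1, 1, -1])     # first pattern with one element corrupted
print([int(probe @ s) for s in stored])      # inner products (4 and 0) drive the recall
print(recall(W, probe))                      # converges to the first stored pattern
```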
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation-intensive tasks, and research and implementation trends are considered. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
ERIC Educational Resources Information Center
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie
2009-01-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…
Holographic implementation of a binary associative memory for improved recognition
NASA Astrophysics Data System (ADS)
Bandyopadhyay, Somnath; Ghosh, Ajay; Datta, Asit K.
1998-03-01
Neural network associative memory has found wide application in pattern recognition techniques. We propose an associative memory model for binary character recognition. The interconnection strengths of the memory are binary valued. The concept of sparse coding is used to enhance the storage efficiency of the model. The imposed preconditioning of pattern vectors, which is inherent in a sparsely coded conventional memory, is eliminated by using a multistep correlation technique, and the ability of correct association is enhanced in a real-time application. A potential optoelectronic implementation of the proposed associative memory is also described. Learning and recall are possible using digital optical matrix-vector multiplication, where full use is made of the parallelism and connectivity of optics. A hologram is used in the experiment as a long-term memory (LTM) for storing all input information. The short-term memory, or the interconnection weight matrix required during the recall process, is configured by retrieving the necessary information from the holographic LTM.
Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus
2015-01-01
Cache blocking is a technique widely used in scientific computing to minimize the exchange of information with main memory by reusing the data kept in cache memory. In tomographic reconstruction on standard computers using vector instructions, cache blocking turns out to be central to optimize performance. To this end, sinograms of the tilt-series and slices of the volumes to be reconstructed have to be divided into small blocks that fit into the different levels of cache memory. The code is then reorganized so as to operate with a block as much as possible before proceeding with another one. This data article is related to the research article titled Tomo3D 2.0 – Exploitation of Advanced Vector eXtensions (AVX) for 3D reconstruction (Agulleiro and Fernandez, 2015) [1]. Here we present data of a thorough study of the performance of tomographic reconstruction by varying cache block sizes, which allows derivation of expressions for their automatic quasi-optimal tuning. PMID:26217710
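The blocking idea is independent of tomography; the generic sketch below (not the Tomo3D code) processes a large array in cache-sized tiles so that each tile is reused while it is resident, here for a simple transpose, with the block size as a tunable parameter of the kind studied in the article.

```python
import numpy as np

def blocked_transpose(a, block=64):
    """Transpose a 2-D array tile by tile so each source tile is reused while cached."""
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            tile = a[i0:i0 + block, j0:j0 + block]
            out[j0:j0 + block, i0:i0 + block] = tile.T
    return out

a = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
assert np.array_equal(blocked_transpose(a, block=64), a.T)
```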
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Butler, Bryan P.
1990-01-01
The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Wang, Qi
2006-01-01
The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.
A Neurocomputational Model of Goal-Directed Navigation in Insect-Inspired Artificial Agents
Goldschmidt, Dennis; Manoonpong, Poramate; Dasgupta, Sakyasingha
2017-01-01
Despite their small size, insect brains are able to produce robust and efficient navigation in complex environments. Specifically in social insects, such as ants and bees, these navigational capabilities are guided by orientation directing vectors generated by a process called path integration. During this process, they integrate compass and odometric cues to estimate their current location as a vector, called the home vector for guiding them back home on a straight path. They further acquire and retrieve path integration-based vector memories globally to the nest or based on visual landmarks. Although existing computational models reproduced similar behaviors, a neurocomputational model of vector navigation including the acquisition of vector representations has not been described before. Here we present a model of neural mechanisms in a modular closed-loop control—enabling vector navigation in artificial agents. The model consists of a path integration mechanism, reward-modulated global learning, random search, and action selection. The path integration mechanism integrates compass and odometric cues to compute a vectorial representation of the agent's current location as neural activity patterns in circular arrays. A reward-modulated learning rule enables the acquisition of vector memories by associating the local food reward with the path integration state. A motor output is computed based on the combination of vector memories and random exploration. In simulation, we show that the neural mechanisms enable robust homing and localization, even in the presence of external sensory noise. The proposed learning rules lead to goal-directed navigation and route formation performed under realistic conditions. Consequently, we provide a novel approach for vector learning and navigation in a simulated, situated agent linking behavioral observations to their possible underlying neural substrates. PMID:28446872
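A minimal, non-neural sketch of the path-integration step described above: compass heading and odometric step length are accumulated into a home vector whose direction and length give the return route. The headings and step lengths below are arbitrary.

```python
import numpy as np

def integrate_path(headings, step_lengths):
    """Accumulate heading (radians) and step length into a home vector."""
    position = np.zeros(2)
    for theta, d in zip(headings, step_lengths):
        position += d * np.array([np.cos(theta), np.sin(theta)])
    home_vector = -position                      # points from the agent back to the nest
    distance = np.linalg.norm(home_vector)
    direction = np.arctan2(home_vector[1], home_vector[0])
    return home_vector, distance, direction

# Outbound path of three segments; the home vector summarizes the return route.
hv, dist, ang = integrate_path([0.0, np.pi / 2, np.pi / 4], [2.0, 1.0, 1.5])
print(hv, dist, np.degrees(ang))
```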
Exploiting Efficient Transpacking for One-Sided Communication and MPI-IO
NASA Astrophysics Data System (ADS)
Mir, Faisal Ghias; Träff, Jesper Larsson
Based on a construction of so-called input-output datatypes that define a mapping between non-consecutive input and output buffers, we outline an efficient method for copying structured data. We term this operation transpacking and show how transpacking can be applied in the MPI implementation of one-sided communication and MPI-IO. For one-sided communication via shared memory, we demonstrate the expected performance improvements of up to a factor of two. For individual MPI-IO, the time to read from or write to the file dominates the overall time, but even here efficient transpacking can in some scenarios reduce file I/O time considerably. The reported results were achieved on a single NEC SX-8 vector node.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gebis, Joseph; Oliker, Leonid; Shalf, John
The disparity between microprocessor clock frequencies and memory latency is a primary reason why many demanding applications run well below peak achievable performance. Software controlled scratchpad memories, such as the Cell local store, attempt to ameliorate this discrepancy by enabling precise control over memory movement; however, scratchpad technology confronts the programmer and compiler with an unfamiliar and difficult programming model. In this work, we present the Virtual Vector Architecture (ViVA), which combines the memory semantics of vector computers with a software-controlled scratchpad memory in order to provide a more effective and practical approach to latency hiding. ViVA requires minimal changes to the core design and could thus be easily integrated with conventional processor cores. To validate our approach, we implemented ViVA on the Mambo cycle-accurate full system simulator, which was carefully calibrated to match the performance of our underlying PowerPC Apple G5 architecture. Results show that ViVA is able to deliver significant performance benefits over scalar techniques for a variety of memory access patterns as well as two important memory-bound compact kernels, corner turn and sparse matrix-vector multiplication, achieving a 2x-13x improvement compared to the scalar version. Overall, our preliminary ViVA exploration points to a promising approach for improving application performance on leading microprocessors with minimal design and complexity costs, in a power efficient manner.
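Sparse matrix-vector multiplication, one of the memory-bound kernels cited above, shows why indexed gathers dominate: a plain CSR reference version is sketched below (a Python stand-in for illustration, not the simulated ViVA kernel).

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a CSR matrix: indexed loads of x dominate the memory traffic."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]    # gather: the latency ViVA tries to hide
    return y

# 3x3 example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
values  = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(spmv_csr(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))   # [3. 3. 9.]
```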
Supporting shared data structures on distributed memory architectures
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is described.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA-based implementations of selected CFD application benchmark kernels and compare them to corresponding message-passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA-based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames; the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
SAHAYOG: A Testbed for Load Sharing under Failure,
1987-07-01
messages, shared memory and semaphores. To communicate using messages, processes create message queues using system-provided primitives. The message... The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be... used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
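The same three primitives have rough analogues in Python's standard library; the sketch below is an analogy, not the testbed's System V IPC code. It shows a shared memory segment updated under a semaphore, with multiprocessing.Queue available to play the role of a message queue if one were needed.

```python
import numpy as np
from multiprocessing import Process, Semaphore, shared_memory

def worker(shm_name, sem, n):
    sem.acquire()                                    # enter the critical section
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to the existing segment
    data = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    data += 1.0                                      # update the shared data in place
    shm.close()
    sem.release()

if __name__ == "__main__":
    n = 8
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    data = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    data[:] = 0.0
    sem = Semaphore(1)                               # mutual exclusion for the update
    workers = [Process(target=worker, args=(shm.name, sem, n)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    print(data)                                      # each element incremented 4 times
    shm.close()
    shm.unlink()
```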
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buntinas, D.; Mercier, G.; Gropp, W.
2007-09-01
This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
Comparison of two paradigms for distributed shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.
1990-08-01
The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.
Working memory resources are shared across sensory modalities.
Salmela, V R; Moisala, M; Alho, K
2014-10-01
A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones, both containing two varying features, were presented simultaneously. In Experiment 2, two gratings and two tones, each containing only one varying feature, were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to those of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
Dos Santos, Alex Santana; Valle, Marcos Eduardo
2018-04-01
Autoassociative morphological memories (AMMs) are robust and computationally efficient memory models with unlimited storage capacity. In this paper, we present the max-plus and min-plus projection autoassociative morphological memories (PAMMs) as well as their compositions. Briefly, the max-plus PAMM yields the largest max-plus combination of the stored vectors which is less than or equal to the input. Dually, the vector recalled by the min-plus PAMM corresponds to the smallest min-plus combination which is larger than or equal to the input. Apart from unlimited absolute storage capacity and one-step retrieval, PAMMs and their compositions exhibit an excellent noise tolerance. Furthermore, the new memories yielded quite promising results in classification problems with a large number of features and classes. Copyright © 2018 Elsevier Ltd. All rights reserved.
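The max-plus recall described above has a simple closed form; the sketch below is an illustration based on the abstract's description (not the authors' code): for an input x, each stored vector is shifted up as far as possible while staying below x, and the componentwise maximum of the shifted vectors is returned. The stored vectors and probe are arbitrary.

```python
import numpy as np

def max_plus_pamm(stored, x):
    """Largest max-plus combination of the stored vectors that stays <= x componentwise."""
    stored = np.asarray(stored, dtype=float)   # rows are the stored vectors
    x = np.asarray(x, dtype=float)
    # Optimal coefficient for each stored vector: shift it as high as possible under x.
    coeffs = np.min(x - stored, axis=1)
    # Max-plus combination: componentwise max over (coefficient + stored vector).
    return np.max(stored + coeffs[:, None], axis=0)

stored = [[1.0, 4.0, 2.0], [3.0, 1.0, 0.0]]
noisy  = [1.5, 4.0, 2.5]                       # corrupted version of the first vector
print(max_plus_pamm(stored, noisy))
```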
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Distributed simulation using a real-time shared memory network
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.
1993-01-01
The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop the communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
Multiprocessor shared-memory information exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santoline, L.L.; Bowers, M.D.; Crew, A.W.
1989-02-01
In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
Optical threshold secret sharing scheme based on basic vector operations and coherence superposition
NASA Astrophysics Data System (ADS)
Deng, Xiaopeng; Wen, Wei; Mi, Xianwu; Long, Xuewen
2015-04-01
We propose, to our knowledge for the first time, a simple optical algorithm for secret image sharing with the (2,n) threshold scheme based on basic vector operations and coherence superposition. The secret image to be shared is first divided into n shadow images by use of basic vector operations. In the reconstruction stage, the secret image can be retrieved by recording the intensity of the coherence superposition of any two shadow images. Compared with published encryption techniques, which focus narrowly on information encryption, the proposed method can realize information encryption as well as secret sharing, which further ensures the safety and integrity of the secret information and prevents power from being kept centralized and abused. The feasibility and effectiveness of the proposed method are demonstrated by numerical results.
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package, MPFEA (Massively Parallel-vector Finite Element Analysis), is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme along with vector-unrolling techniques is used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Memory access in shared virtual memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berrendorf, R.
1992-01-01
Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
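Among the functions listed, 2-D spatial convolution is the clearest example of the data-parallel work the vector execution unit batches: every output pixel is an inner product of the kernel with an image window. A scalar reference version is sketched below (Python, for clarity only; not the MDAC assembly code), with an arbitrary image and kernel.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D spatial convolution (scalar reference version)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    flipped = kernel[::-1, ::-1]                 # true convolution flips the kernel
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])  # Laplacian
print(conv2d(image, kernel))
```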
Gordon, Claire Louse; Lee, Lian Ni; Swadling, Leo; Hutchings, Claire; Zinser, Madeleine; Highton, Andrew John; Capone, Stefania; Folgori, Antonella; Barnes, Eleanor; Klenerman, Paul
2018-04-17
The induction and maintenance of T cell memory is critical to the success of vaccines. A recently described subset of memory CD8+ T cells defined by intermediate expression of the chemokine receptor CX3CR1 was shown to have self-renewal, proliferative, and tissue-surveillance properties relevant to vaccine-induced memory. We tracked these cells when memory is sustained at high levels: memory inflation induced by cytomegalovirus (CMV) and adenovirus-vectored vaccines. In mice, both CMV and vaccine-induced inflationary T cells showed sustained high levels of CX3CR1int cells exhibiting an effector-memory phenotype, characteristic of inflationary pools, in early memory. In humans, CX3CR1int CD8+ T cells were strongly induced following adenovirus-vectored vaccination for hepatitis C virus (HCV) (ChAd3-NSmut) and during natural CMV infection and were associated with a memory phenotype similar to that in mice. These data indicate that CX3CR1int cells form an important component of the memory pool in response to persistent viruses and vaccines in both mice and humans. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Working Memory Span Development: A Time-Based Resource-Sharing Model Account
ERIC Educational Resources Information Center
Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie
2009-01-01
The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Direct access inter-process shared memory
Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B
2013-10-22
A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
Memory Network For Distributed Data Processors
NASA Technical Reports Server (NTRS)
Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George
1992-01-01
The Universal Memory Network (UMN) is a modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. It makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of a volume of data equivalent to an encyclopedia each second. Facilities benefiting from the Universal Memory Network include telemetry stations, simulation facilities, power plants, and large laboratories, or any facility sharing very large volumes of data. The main hub of the UMN is a reflection center that includes smaller hubs called Shared Memory Interfaces.
Grouping and binding in visual short-term memory.
Quinlan, Philip T; Cohen, Dale J
2012-09-01
Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features: In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined; memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Protein sequence comparison based on K-string dictionary.
Yu, Chenglong; He, Rong L; Yau, Stephen S-T
2013-10-25
The current K-string-based protein sequence comparisons require large amounts of computer memory because the dimension of the protein vector representation grows exponentially with K. In this paper, we propose a novel concept, the "K-string dictionary", to solve this high-dimensional problem. It allows us to use a much lower dimensional K-string-based frequency or probability vector to represent a protein, and thus significantly reduce the computer memory requirements for their implementation. Furthermore, based on this new concept, we use Singular Value Decomposition to analyze real protein datasets, and the improved protein vector representation allows us to obtain accurate gene trees. © 2013.
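The dictionary idea can be sketched directly: rather than indexing all 20^K possible K-strings, only the K-strings that occur in the dataset receive vector positions. The fragment below is a hedged illustration of that construction, not the authors' implementation; the toy sequences are invented.

```python
from collections import Counter
from itertools import chain

def build_dictionary(sequences, k):
    """Dictionary of only the K-strings that occur, mapped to vector positions."""
    kmers = sorted(set(chain.from_iterable(
        (seq[i:i + k] for i in range(len(seq) - k + 1)) for seq in sequences)))
    return {kmer: idx for idx, kmer in enumerate(kmers)}

def frequency_vector(seq, k, dictionary):
    """Low-dimensional frequency vector of one sequence over the shared dictionary."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = [0.0] * len(dictionary)
    total = sum(counts.values()) or 1
    for kmer, c in counts.items():
        vec[dictionary[kmer]] = c / total
    return vec

proteins = ["MKVLA", "MKVLG", "AKVLA"]          # toy sequences, not real proteins
d = build_dictionary(proteins, k=3)
print(len(d), frequency_vector("MKVLA", 3, d))
```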
Skin vaccination with live virus vectored microneedle arrays induce long lived CD8(+) T cell memory.
Becker, Pablo D; Hervouet, Catherine; Mason, Gavin M; Kwon, Sung-Yun; Klavinskis, Linda S
2015-09-08
A simple dissolvable microneedle array (MA) platform has emerged as a promising technology for vaccine delivery, due to needle-free injection with a formulation that preserves the immunogenicity of live viral vectored vaccines dried in the MA matrix. While recent studies have focused largely on design parameters optimized to induce primary CD8(+) T cell responses, the hallmark of a vaccine is synonymous with engendering long-lasting memory. Here, we address the capacity of dried MA vaccination to programme phenotypic markers indicative of effector/memory CD8(+) T cell subsets and also responsiveness to recall antigen benchmarked against conventional intradermal (ID) injection. We show that despite a slightly lower frequency of dividing T cell receptor transgenic CD8(+) T cells in secondary lymphoid tissue at an early time point, the absolute number of CD8(+) T cells expressing an effector memory (CD62L(-)CD127(+)) and central memory (CD62L(+)CD127(+)) phenotype during peak expansion were comparable after MA and ID vaccination with a recombinant human adenovirus type 5 vector (AdHu5) encoding HIV-1 gag. Similarly, both vaccination routes generated CD8(+) memory T cell subsets detected in draining LNs for at least two years post-vaccination capable of responding to secondary antigen. These data suggest that CD8(+) T cell effector/memory generation and long-term memory is largely unaffected by physical differences in vaccine delivery to the skin via dried MA or ID suspension. Copyright © 2015 Elsevier Ltd. All rights reserved.
Acquisition and expression of memories of distance and direction in navigating wood ants.
Fernandes, A Sofia D; Philippides, Andrew; Collett, Tom S; Niven, Jeremy E
2015-11-01
Wood ants, like other central place foragers, rely on route memories to guide them to and from a reliable food source. They use visual memories of the surrounding scene and probably compass information to control their direction. Do they also remember the length of their route and do they link memories of direction and distance? To answer these questions, we trained wood ant (Formica rufa) foragers in a channel to perform either a single short foraging route or two foraging routes in opposite directions. By shifting the starting position of the route within the channel, but keeping the direction and distance fixed, we tried to ensure that the ants would rely upon vector memories rather than visual memories to decide when to stop. The homeward memories that the ants formed were revealed by placing fed or unfed ants directly into a channel and assessing the direction and distance that they walked without prior performance of the food-ward leg of the journey. This procedure prevented the distance and direction walked being affected by a home vector derived from path integration. Ants that were unfed walked in the feeder direction. Fed ants walked in the opposite direction for a distance related to the separation between start and feeder. Vector memories of a return route can thus be primed by the ants' feeding state and expressed even when the ants have not performed the food-ward route. Tests on ants that have acquired two routes indicate that memories of the direction and distance of the return routes are linked, suggesting that they may be encoded by a common neural population within the ant brain. © 2015. Published by The Company of Biologists Ltd.
Efficient checkpointing schemes for depletion perturbation solutions on memory-limited architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stripling, H. F.; Adams, M. L.; Hawkins, W. D.
2013-07-01
We describe a methodology for decreasing the memory footprint and machine I/O load associated with the need to access a forward solution during an adjoint solve. Specifically, we are interested in the depletion perturbation equations, where terms in the adjoint Bateman and transport equations depend on the forward flux solution. Checkpointing is the procedure of storing snapshots of the forward solution to disk and using these snapshots to recompute the parts of the forward solution that are necessary for the adjoint solve. For large problems, however, the storage cost of just a few copies of an angular flux vector can exceed the available RAM on the host machine. We propose a methodology that does not checkpoint the angular flux vector; instead, we write and store converged source moments, which are typically of a much lower dimension than the angular flux solution. This reduces the memory footprint and I/O load of the problem, but requires that we perform single sweeps to reconstruct flux vectors on demand. We argue that this trade-off is exactly the kind of algorithm that will scale on advanced, memory-limited architectures. We analyze the cost, in terms of FLOPS and memory footprint, of five checkpointing schemes. We also provide computational results that support the analysis and show that the memory-for-work trade-off does improve time to solution. (authors)
Shared memories reveal shared structure in neural activity across individuals
Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.
2016-01-01
Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], however at the cost of compromising the ease of use. Distributed memory models such as message passing or one-sided communication offer performance and scalability but they compromise ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.
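The get/put style of the model can be illustrated with a deliberately simplified, single-process toy class; this is not the Global Arrays API, only a sketch of the idea that data moves explicitly between a block-distributed global index space and local buffers.

```python
import numpy as np

class ToyGlobalArray:
    """Illustrative get/put interface over block-distributed storage (not the GA API)."""
    def __init__(self, n, nblocks):
        bounds = np.linspace(0, n, nblocks + 1, dtype=int)
        self.blocks = [np.zeros(hi - lo) for lo, hi in zip(bounds[:-1], bounds[1:])]
        self.bounds = bounds

    def _locate(self, i):
        b = np.searchsorted(self.bounds, i, side="right") - 1
        return b, i - self.bounds[b]

    def put(self, lo, data):                      # local buffer -> global index space
        for offset, value in enumerate(data):
            b, j = self._locate(lo + offset)
            self.blocks[b][j] = value

    def get(self, lo, hi):                        # global index space -> local buffer
        return np.array([self.blocks[b][j]
                         for b, j in (self._locate(i) for i in range(lo, hi))])

ga = ToyGlobalArray(n=10, nblocks=4)
ga.put(3, [1.0, 2.0, 3.0])                        # explicit transfer into "remote" blocks
print(ga.get(2, 7))
```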
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep large numbers of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Machine parts recognition using a trinary associative memory
NASA Technical Reports Server (NTRS)
Awwal, Abdul Ahad S.; Karim, Mohammad A.; Liu, Hua-Kuang
1989-01-01
The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
Machine Parts Recognition Using A Trinary Associative Memory
NASA Astrophysics Data System (ADS)
Awwal, Abdul A. S.; Karim, Mohammad A.; Liu, Hua-Kuang
1989-05-01
The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
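As a small illustration of the two closeness measures contrasted in the abstract, the sketch below compares the Hamming distance and the inner product between a stored pattern and a partially known probe. The trinary encoding used here (+1, -1, and 0 for "unknown") is an illustrative assumption, not taken verbatim from the paper.

/* Sketch: Hamming distance vs. inner product for bipolar patterns with
 * unknown elements encoded as 0. Illustrative only. */
#include <stdio.h>

#define N 8

int hamming(const int *a, const int *b, int n) {
    int d = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] != b[i]) ++d;
    return d;
}

int inner_product(const int *a, const int *b, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i] * b[i];          /* unknown (0) elements contribute nothing */
    return s;
}

int main(void) {
    int stored[N] = {+1, -1, +1, +1, -1, -1, +1, -1};
    int probe[N]  = {+1, -1, +1,  0,  0,  0, +1, -1};  /* partially known */

    printf("Hamming distance : %d\n", hamming(stored, probe, N));
    printf("Inner product    : %d\n", inner_product(stored, probe, N));
    return 0;
}

Note that the missing elements inflate the Hamming distance but leave the inner product untouched, which is one way to see why the two measures can rank candidate patterns differently.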
NASA Technical Reports Server (NTRS)
Tuccillo, J. J.
1984-01-01
Numerical Weather Prediction (NWP), for both operational and research purposes, requires not only fast computational speed but also large memory. A technique for solving the Primitive Equations for atmospheric motion on the CYBER 205, as implemented in the Mesoscale Atmospheric Simulation System, is discussed; it is fully vectorized and requires substantially less memory than other techniques such as the Leapfrog or Adams-Bashforth schemes. The technique presented uses the Euler-backward time marching scheme. Also discussed are several techniques for reducing the computational time of the model by replacing slow intrinsic routines with faster algorithms that use only hardware vector instructions.
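For reference, an Euler-backward (Matsuno) step takes a forward predictor followed by a backward corrector. The formulation below is the standard textbook form and is an assumption about the scheme referred to, not a quotation from the report; u denotes the prognostic state and F(u) its tendency.

% Euler-backward (Matsuno) step: forward predictor, backward corrector.
% Textbook form, assumed here.
\begin{align}
  u^{*}   &= u^{n} + \Delta t \, F(u^{n}),\\
  u^{n+1} &= u^{n} + \Delta t \, F(u^{*}).
\end{align}

Only the current state and a single predictor field need be held at any time, which is one plausible reading of the memory saving claimed relative to schemes such as leapfrog that carry additional time levels.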
Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna
2017-09-01
Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors-emotional salience of information, group structure, and information distribution-on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montry, G.R.; Benner, R.E.
1985-12-01
The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Dorsal Hippocampal CREB Is Both Necessary and Sufficient for Spatial Memory
ERIC Educational Resources Information Center
Sekeres, Melanie J.; Neve, Rachael L.; Frankland, Paul W.; Josselyn, Sheena A.
2010-01-01
Although the transcription factor CREB has been widely implicated in memory, whether it is sufficient to produce spatial memory under conditions that do not normally support memory formation in mammals is unknown. We found that locally and acutely increasing CREB levels in the dorsal hippocampus using viral vectors is sufficient to induce robust…
High-performance ultra-low power VLSI analog processor for data compression
NASA Technical Reports Server (NTRS)
Tawel, Raoul (Inventor)
1996-01-01
An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T
2017-11-01
Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) as well as the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
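The sketch below shows the kind of directive-based loop parallelization such a tool emits for a shared-memory system. The loop body is a made-up stand-in and is not taken from CAPTools output.

/* Illustrative OpenMP directive-based parallelization of an independent loop. */
#include <omp.h>

void relax(double *u, const double *rhs, int n, double omega) {
    /* Each iteration touches only its own element, so the loop can be
     * shared among threads. */
    #pragma omp parallel for schedule(static) default(none) \
            shared(u, rhs, n, omega)
    for (int i = 1; i < n - 1; ++i)
        u[i] = (1.0 - omega) * u[i] + 0.5 * omega * (rhs[i] + u[i]);
}

The value of a dependence-analysis tool lies in deciding, for real application loops, when such a directive is safe and which variables must be declared shared or private.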
A sparse matrix algorithm on the Boolean vector machine
NASA Technical Reports Server (NTRS)
Wagner, Robert A.; Patrick, Merrell L.
1988-01-01
VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic and communicate via a cube-connected-cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for sparse matrices on the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
Lee, Lian N; Bolinger, Beatrice; Banki, Zoltan; de Lara, Catherine; Highton, Andrew J; Colston, Julia M; Hutchings, Claire; Klenerman, Paul
2017-12-01
The efficacies of many new T cell vaccines rely on generating large populations of long-lived pathogen-specific effector memory CD8 T cells. However, it is now increasingly recognized that prior infection history impacts on the host immune response. Additionally, the order in which these infections are acquired could have a major effect. Exploiting the ability to generate large sustained effector memory (i.e. inflationary) T cell populations from murine cytomegalovirus (MCMV) and human Adenovirus-subtype (AdHu5) 5-beta-galactosidase (Ad-lacZ) vector, the impact of new infections on pre-existing memory and the capacity of the host's memory compartment to accommodate multiple inflationary populations from unrelated pathogens was investigated in a murine model. Simultaneous and sequential infections, first with MCMV followed by Ad-lacZ, generated inflationary populations towards both viruses with similar kinetics and magnitude to mono-infected groups. However, in Ad-lacZ immune mice, subsequent acute MCMV infection led to a rapid decline of the pre-existing Ad-LacZ-specific inflating population, associated with bystander activation of Fas-dependent apoptotic pathways. However, responses were maintained long-term and boosting with Ad-lacZ led to rapid re-expansion of the inflating population. These data indicate firstly that multiple specificities of inflating memory cells can be acquired at different times and stably co-exist. Some acute infections may also deplete pre-existing memory populations, thus revealing the importance of the order of infection acquisition. Importantly, immunization with an AdHu5 vector did not alter the size of the pre-existing memory. These phenomena are relevant to the development of adenoviral vectors as novel vaccination strategies for diverse infections and cancers.
Bolinger, Beatrice; de Lara, Catherine; Hutchings, Claire
2017-01-01
The efficacies of many new T cell vaccines rely on generating large populations of long-lived pathogen-specific effector memory CD8 T cells. However, it is now increasingly recognized that prior infection history impacts on the host immune response. Additionally, the order in which these infections are acquired could have a major effect. Exploiting the ability to generate large sustained effector memory (i.e. inflationary) T cell populations from murine cytomegalovirus (MCMV) and human Adenovirus-subtype (AdHu5) 5-beta-galactosidase (Ad-lacZ) vector, the impact of new infections on pre-existing memory and the capacity of the host's memory compartment to accommodate multiple inflationary populations from unrelated pathogens was investigated in a murine model. Simultaneous and sequential infections, first with MCMV followed by Ad-lacZ, generated inflationary populations towards both viruses with similar kinetics and magnitude to mono-infected groups. However, in Ad-lacZ immune mice, subsequent acute MCMV infection led to a rapid decline of the pre-existing Ad-LacZ-specific inflating population, associated with bystander activation of Fas-dependent apoptotic pathways. However, responses were maintained long-term and boosting with Ad-lacZ led to rapid re-expansion of the inflating population. These data indicate firstly that multiple specificities of inflating memory cells can be acquired at different times and stably co-exist. Some acute infections may also deplete pre-existing memory populations, thus revealing the importance of the order of infection acquisition. Importantly, immunization with an AdHu5 vector did not alter the size of the pre-existing memory. These phenomena are relevant to the development of adenoviral vectors as novel vaccination strategies for diverse infections and cancers. PMID:29281733
ERIC Educational Resources Information Center
Hayes-Roth, Barbara
Two kinds of memory organization are distinguished: segregrated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…
Wiese, Holger; Schweinberger, Stefan R
2015-01-01
The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.
Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane
2006-01-01
The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoeitic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper
2016-01-01
The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
2010-09-07
A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
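A software sketch of the programmable bank-selection idea follows: selected physical-address bits are compared against programmed values to derive a bank (memory storage structure) select. The bit positions, the number of banks, and the rule layout are illustrative assumptions, not the patented hardware.

/* Sketch of programmable bank selection from physical address bits. */
#include <stdint.h>

typedef struct {
    uint64_t mask;   /* which physical address bits participate          */
    uint64_t match;  /* pre-programmed values those bits must take on    */
    unsigned bank;   /* bank selected when the masked bits match         */
} bank_rule;

/* Returns the selected bank, or default_bank if no rule matches. */
unsigned select_bank(uint64_t phys_addr, const bank_rule *rules,
                     int nrules, unsigned default_bank) {
    for (int i = 0; i < nrules; ++i)
        if ((phys_addr & rules[i].mask) == rules[i].match)
            return rules[i].bank;
    return default_bank;
}

Making the mask and match values programmable, rather than hard-wired, is what allows the address-to-bank mapping to be distributed across storage structures in a configurable way.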
Dardalhon, V; Jaleco, S; Kinet, S; Herpers, B; Steinberg, M; Ferrand, C; Froger, D; Leveau, C; Tiberghien, P; Charneau, P; Noraz, N; Taylor, N
2001-07-31
Differences in the immunological reactivity of umbilical cord (UC) and adult peripheral blood (APB) T cells are poorly understood. Here, we show that IL-7, a cytokine involved in lymphoid homeostasis, has distinct regulatory effects on APB and UC lymphocytes. Neither naive nor memory APB CD4(+) cells proliferated in response to IL-7, whereas naive UC CD4(+) lymphocytes underwent multiple divisions. Nevertheless, both naive and memory IL-7-treated APB T cells progressed into the G(1b) phase of the cell cycle, albeit at higher levels in the latter subset. The IL-7-treated memory CD4(+) lymphocyte population was significantly more susceptible to infection with an HIV-1-derived vector than dividing CD4(+) UC lymphocytes. However, activation through the T cell receptor rendered UC lymphocytes fully susceptible to HIV-1-based vector infection. These data unveil differences between UC and APB CD4(+) T cells with regard to IL-7-mediated cell cycle progression and HIV-1-based vector infectivity. This evidence indicates that IL-7 differentially regulates lymphoid homeostasis in adults and neonates.
Dardalhon, Valérie; Jaleco, Sara; Kinet, Sandrina; Herpers, Bjorn; Steinberg, Marcos; Ferrand, Christophe; Froger, Delphine; Leveau, Christelle; Tiberghien, Pierre; Charneau, Pierre; Noraz, Nelly; Taylor, Naomi
2001-01-01
Differences in the immunological reactivity of umbilical cord (UC) and adult peripheral blood (APB) T cells are poorly understood. Here, we show that IL-7, a cytokine involved in lymphoid homeostasis, has distinct regulatory effects on APB and UC lymphocytes. Neither naive nor memory APB CD4+ cells proliferated in response to IL-7, whereas naive UC CD4+ lymphocytes underwent multiple divisions. Nevertheless, both naive and memory IL-7-treated APB T cells progressed into the G1b phase of the cell cycle, albeit at higher levels in the latter subset. The IL-7-treated memory CD4+ lymphocyte population was significantly more susceptible to infection with an HIV-1-derived vector than dividing CD4+ UC lymphocytes. However, activation through the T cell receptor rendered UC lymphocytes fully susceptible to HIV-1-based vector infection. These data unveil differences between UC and APB CD4+ T cells with regard to IL-7-mediated cell cycle progression and HIV-1-based vector infectivity. This evidence indicates that IL-7 differentially regulates lymphoid homeostasis in adults and neonates. PMID:11470908
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
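A minimal sketch of the hybrid MPI/OpenMP approach mentioned above is shown below: one MPI rank per shared-memory node with OpenMP threads inside each rank. The local work is a placeholder; the domain decomposition, halo exchange, and the ADI sweeps themselves are omitted.

/* Minimal hybrid MPI/OpenMP sketch: threads within a node, ranks across nodes. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank, nranks;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* Threads share this rank's portion of the work. */
    #pragma omp parallel for reduction(+ : local_sum)
    for (int i = 0; i < 1000000; ++i)
        local_sum += 1.0 / (1.0 + (double)i);

    /* Ranks combine their results across the distributed-memory layer. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}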
GPU-accelerated adjoint algorithmic differentiation
NASA Astrophysics Data System (ADS)
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
GPU-Accelerated Adjoint Algorithmic Differentiation.
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
GPU-Accelerated Adjoint Algorithmic Differentiation
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2015-01-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the “tape”. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography. PMID:26941443
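A toy illustration of the "tape" idea follows: the forward sweep records each operation's local partial derivatives, and the reverse sweep propagates adjoints through them. This is a scalar sketch of the general concept only, not the vectorized GPU scheme described in the abstracts above.

/* Toy tape-based reverse-mode (adjoint) differentiation. */
#include <stdio.h>

#define MAX_TAPE 1024

typedef struct {
    int    lhs, rhs;        /* indices of the two inputs (-1 if unused)      */
    double dlhs, drhs;      /* local partial derivatives w.r.t. those inputs */
} tape_entry;

static tape_entry tape[MAX_TAPE];
static double     value[MAX_TAPE];
static double     adjoint[MAX_TAPE];
static int        ntape = 0;

static int record(double v, int a, double da, int b, double db) {
    value[ntape] = v;
    tape[ntape]  = (tape_entry){a, b, da, db};
    return ntape++;
}

static int var(double v)     { return record(v, -1, 0.0, -1, 0.0); }
static int add(int a, int b) { return record(value[a] + value[b], a, 1.0, b, 1.0); }
static int mul(int a, int b) { return record(value[a] * value[b], a, value[b], b, value[a]); }

/* Reverse sweep: walk the tape backwards, accumulating adjoints. */
static void backprop(int out) {
    adjoint[out] = 1.0;                       /* seed d(out)/d(out) = 1 */
    for (int i = out; i >= 0; --i) {
        if (tape[i].lhs >= 0) adjoint[tape[i].lhs] += adjoint[i] * tape[i].dlhs;
        if (tape[i].rhs >= 0) adjoint[tape[i].rhs] += adjoint[i] * tape[i].drhs;
    }
}

int main(void) {
    int x = var(3.0), y = var(2.0);
    int f = add(mul(x, x), mul(x, y));        /* f = x*x + x*y              */
    backprop(f);
    printf("f = %g  df/dx = %g  df/dy = %g\n", value[f], adjoint[x], adjoint[y]);
    /* Expected: f = 15, df/dx = 2x + y = 8, df/dy = x = 3 */
    return 0;
}

The tape grows with every recorded elementary operation, which is exactly the memory pressure the abstracts describe; expressing the computation in terms of whole vector and matrix operations shrinks the number of tape entries and lets optimized kernels do the work.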
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Howe, Piers D. L.
2017-01-01
To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383
Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L
2017-01-01
To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources.
Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie
2009-07-01
Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
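The sketch below illustrates the masked vector execution idea described above: a conditional becomes a mask, and an operation is applied only where the mask is set, with a location-list form used when active lanes become sparse. The data layout and the point at which one would switch representations are illustrative assumptions, not PM internals.

/* Masked vector operation, in Boolean-mask and location-list forms. */
#include <stddef.h>

/* Boolean-mask form: every lane is visited, but only masked lanes are written. */
void scale_masked(double *v, const unsigned char *mask, size_t n, double s) {
    for (size_t i = 0; i < n; ++i)
        if (mask[i]) v[i] *= s;
}

/* Location-list form: only the active lanes are visited at all. */
void scale_locations(double *v, const size_t *loc, size_t nactive, double s) {
    for (size_t i = 0; i < nactive; ++i)
        v[loc[i]] *= s;
}

When most lanes are active the dense mask keeps memory access regular; when few are active the location list avoids touching inactive data, which is the trade-off behind switching representations.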
Why are you telling me that? A conceptual model of the social function of autobiographical memory.
Alea, Nicole; Bluck, Susan
2003-03-01
In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or, the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.
The boundary vector cell model of place cell firing and spatial memory
Barry, Caswell; Lever, Colin; Hayman, Robin; Hartley, Tom; Burton, Stephen; O'Keefe, John; Jeffery, Kate; Burgess, Neil
2009-01-01
We review evidence for the boundary vector cell model of the environmental determinants of the firing of hippocampal place cells. Preliminary experimental results are presented concerning the effects of addition or removal of environmental boundaries on place cell firing and evidence that boundary vector cells may exist in the subiculum. We review and update computational simulations predicting the location of human search within a virtual environment of variable geometry, assuming that boundary vector cells provide one of the input representations of location used in mammalian spatial memory. Finally, we extend the model to include experience-dependent modification of connection strengths through a BCM-like learning rule, and compare the effects to experimental data on the firing of place cells under geometrical manipulations to their environment. The relationship between neurophysiological results in rats and spatial behaviour in humans is discussed. PMID:16703944
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Destination memory impairment in older people.
Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn
2010-12-01
Older adults are assumed to have poor destination memory-knowing to whom they tell particular information-and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication.
Sardar, Tridip; Rana, Sourav; Bhattacharya, Sabyasachi; Al-Khaled, Kamel; Chattopadhyay, Joydev
2015-05-01
In the present investigation, three mathematical models of a common single-strain mosquito-transmitted disease are considered. The first one is based on ordinary differential equations, and the other two models are based on fractional order differential equations. The proposed models are validated using published monthly dengue incidence data from two provinces of Venezuela during the period 1999-2002. We estimate several parameters of these models, such as the order of the fractional derivatives (in the case of the two fractional order systems), the mosquito biting rate, two probabilities of infection, and mosquito recruitment and mortality rates, from the data. The basic reproduction number, R0, for the ODE system is estimated using the data. For the two fractional order systems, an upper bound for R0 is derived and its value is obtained using the published data. The force of infection and the effective reproduction number, R(t), for the three models are estimated using the data. Sensitivity analysis of the mosquito memory parameter with respect to some important responses is worked out. We use the Akaike Information Criterion (AIC) to identify the best model among the three proposed models. It is observed that the model with memory in both the host and the vector populations provides better agreement with the epidemic data. Finally, we provide a control strategy for the vector-borne disease, dengue, using the memory of the host and the vector. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Voznyuk, I.; Litman, A.; Tortel, H.
2015-08-01
A Quasi-Newton method for reconstructing the constitutive parameters of three-dimensional (3D) penetrable scatterers from scattered field measurements is presented. This method is adapted for handling large-scale electromagnetic problems while keeping the memory requirement and the time flexibility as low as possible. The forward scattering problem is solved by applying the finite-element tearing and interconnecting full-dual-primal (FETI-FDP2) method, which shares the same spirit as the domain decomposition methods for finite element methods. The idea is to split the computational domain into smaller non-overlapping sub-domains in order to simultaneously solve local sub-problems. Various strategies are proposed in order to efficiently couple the inversion algorithm with the FETI-FDP2 method: a separation into permanent and non-permanent subdomains is performed, iterative solvers are favored for resolving the interface problem, and a marching-on-in-anything initial guess selection further accelerates the process. The computational burden is also reduced by applying the adjoint state vector methodology. Finally, the inversion algorithm is confronted with measurements extracted from the 3D Fresnel database.
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Heber, Gerd; Biswas, Rupak
2000-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
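The sketch below shows the sparse matrix-vector multiply that dominates each CG iteration, in compressed sparse row (CSR) storage, with the obvious row-parallel OpenMP form. The ordering and partitioning strategies studied in the paper are not reproduced; the gather on the right-hand-side vector is where ordering determines cache reuse.

/* CSR sparse matrix-vector multiply, y = A*x (row-parallel sketch). */
void spmv_csr(int n, const int *rowptr, const int *col, const double *val,
              const double *x, double *y) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];   /* gather: locality depends on ordering */
        y[i] = sum;
    }
}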
Interference due to shared features between action plans is influenced by working memory span.
Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M
2014-12-01
In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.
Destination Memory Impairment in Older People
Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn
2012-01-01
Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537
Efficacy of Code Optimization on Cache-based Processors
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
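A small illustration of the access-pattern points made above follows: the three routines touch the same data but differ in stride and in how much work is done per pass. The array size and the stride value are arbitrary, chosen only to make the contrast visible.

/* Unit stride vs. pathological stride vs. "fat" loop body. */
#define N      1024
#define STRIDE 256            /* power-of-two stride: poor for caches (and banks) */

/* Unit stride: each cache line fetched is fully used. */
double sum_unit(const double *a, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

/* Pathological stride: roughly one useful element per cache line fetched. */
double sum_strided(const double *a, int n) {
    double s = 0.0;
    for (int j = 0; j < STRIDE; ++j)
        for (int i = j; i < n; i += STRIDE) s += a[i];
    return s;
}

/* "Fat" loop body: two results per pass over the data instead of two passes. */
void sum_and_sumsq(const double *a, int n, double *s, double *ss) {
    double t = 0.0, tt = 0.0;
    for (int i = 0; i < n; ++i) { t += a[i]; tt += a[i] * a[i]; }
    *s = t; *ss = tt;
}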
NASA Technical Reports Server (NTRS)
Kanerva, P.
1986-01-01
To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because such memories work with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
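A toy sketch of a Kanerva-style sparse distributed memory follows: a pattern is written by superimposing it (as +1/-1 counts) onto every hard location whose address lies within a Hamming radius of the write address, and read back by summing those counters and thresholding. The sizes, the radius, and the random hard addresses are illustrative assumptions.

/* Toy sparse distributed memory: store by superposition, read by reconstruction. */
#include <stdlib.h>

#define DIM    256   /* bits per address/pattern  */
#define HARD   1000  /* number of hard locations  */
#define RADIUS 112   /* activation radius (bits)  */

static unsigned char hard_addr[HARD][DIM];  /* random 0/1 addresses  */
static int           counters[HARD][DIM];   /* superimposed contents */

static int hamming(const unsigned char *a, const unsigned char *b) {
    int d = 0;
    for (int i = 0; i < DIM; ++i) d += (a[i] != b[i]);
    return d;
}

void sdm_init(void) {
    for (int h = 0; h < HARD; ++h)
        for (int i = 0; i < DIM; ++i) hard_addr[h][i] = rand() & 1;
}

void sdm_write(const unsigned char *addr, const unsigned char *data) {
    for (int h = 0; h < HARD; ++h)
        if (hamming(addr, hard_addr[h]) <= RADIUS)
            for (int i = 0; i < DIM; ++i)
                counters[h][i] += data[i] ? 1 : -1;   /* superimpose pattern */
}

void sdm_read(const unsigned char *addr, unsigned char *out) {
    int sum[DIM] = {0};
    for (int h = 0; h < HARD; ++h)
        if (hamming(addr, hard_addr[h]) <= RADIUS)
            for (int i = 0; i < DIM; ++i) sum[i] += counters[h][i];
    for (int i = 0; i < DIM; ++i) out[i] = (sum[i] >= 0);  /* threshold */
}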
An adaptive vector quantization scheme
NASA Technical Reports Server (NTRS)
Cheung, K.-M.
1990-01-01
Vector quantization is known to be an effective compression scheme to achieve a low bit rate so as to minimize communication channel bandwidth and also to reduce digital memory storage while maintaining the necessary fidelity of the data. However, the large number of computations required in vector quantizers has been a handicap in using vector quantization for low-rate source coding. An adaptive vector quantization algorithm is introduced that is inherently suitable for simple hardware implementation because it has a simple architecture. It allows fast encoding and decoding because it requires only addition and subtraction operations.
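A minimal vector-quantization encoding sketch is shown below: each input vector is mapped to the index of the closest codebook vector. The sum of absolute differences is used as the distortion measure so that only additions, subtractions, and comparisons are needed; the adaptive codebook update of the algorithm referenced above is not shown.

/* Minimal VQ encoder: nearest codebook vector by sum of absolute differences. */
#include <math.h>

int vq_encode(const double *x, const double *codebook, int ncode, int dim) {
    int best = 0;
    double best_d = INFINITY;
    for (int c = 0; c < ncode; ++c) {
        double d = 0.0;
        for (int k = 0; k < dim; ++k)
            d += fabs(x[k] - codebook[c * dim + k]);   /* add/subtract only */
        if (d < best_d) { best_d = d; best = c; }
    }
    return best;                      /* codebook index transmitted or stored */
}

Only the codebook index is stored or transmitted per input vector, which is where the compression comes from; the cost of the exhaustive search above is what motivates simpler, hardware-friendly variants.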
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
Location-Unbound Color-Shape Binding Representations in Visual Working Memory.
Saiki, Jun
2016-02-01
The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remain unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding. © The Author(s) 2015.
Echterhoff, Gerald; Kopietz, René; Higgins, E Tory
2017-06-01
Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement
2015-01-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
The developmental influence of primary memory capacity on working memory and academic achievement.
Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L
2015-08-01
In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. (c) 2015 APA, all rights reserved.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
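As a rough software-level picture of the load-reserve/store-conditional pattern that the reservation registers support, the sketch below retries an atomic add until no other processor has written the location in between; standard C has no LL/SC primitive, so GCC's __atomic compare-exchange builtin stands in for the reserve/conditional pair (this is an illustration, not the patented mechanism itself).

    /* Hedged sketch: retry loop in the style of load-reserve / store-conditional. */
    #include <stdint.h>

    void atomic_add(uint64_t *addr, uint64_t delta) {
        uint64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);   /* "load-reserve" */
        uint64_t desired;
        do {
            desired = old + delta;
            /* The exchange fails (like a failed store-conditional) if another
             * processor changed *addr since we read it; 'old' is refreshed
             * with the current value on failure, and we simply retry. */
        } while (!__atomic_compare_exchange_n(addr, &old, desired, 1,
                                              __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
    }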
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process level parallelism, thread level parallelism, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of the flow of useful computations.
Measuring Transactive Memory Systems Using Network Analysis
ERIC Educational Resources Information Center
King, Kylie Goodell
2017-01-01
Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…
Operator Influence of Unexploded Ordnance Sensor Technologies
2007-03-01
chart display ActiveX control; Mscomct2.dll – date/time display ActiveX control; Pnpscr.dll – Systran SCRAMNet replicated shared memory device...; response value database; rgm_p2.dll – Phase 2 shared memory API and implementation. Commercial components: StripM.ocx – strip chart display ActiveX
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
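A minimal sketch of the full-replication technique named above, using a histogram as the reduction object (the structure, names, and merge step are illustrative, not the paper's runtime interface): each OpenMP thread accumulates into a private copy, and the copies are merged once at the end, so the hot loop needs no locks.

    /* Hedged sketch: full replication of a reduction object with OpenMP. */
    #include <omp.h>
    #include <string.h>

    #define BINS 64

    void histogram(const int *data, long n, long hist[BINS]) {
        memset(hist, 0, sizeof(long) * BINS);
        #pragma omp parallel
        {
            long local[BINS] = {0};            /* replicated reduction object */
            #pragma omp for nowait
            for (long i = 0; i < n; i++)
                local[data[i] % BINS]++;       /* values assumed non-negative */
            #pragma omp critical               /* merge once per thread */
            for (int b = 0; b < BINS; b++)
                hist[b] += local[b];
        }
    }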
Concurrent working memory load can facilitate selective attention: evidence for specialized load.
Park, Soojin; Kim, Min-Shik; Chun, Marvin M
2007-10-01
Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. The Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as the SGI systems. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using OpenMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
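To illustrate the hyperplane idea, here is a hedged C sketch of a wavefront-parallel Gauss-Seidel sweep on a 2D grid: interior points with the same i+j lie on one hyperplane and do not depend on each other within the sweep, so each hyperplane can be split across OpenMP threads (the grid size and 5-point stencil are illustrative, not the RGAS equations).

    /* Hedged sketch: hyperplane (wavefront) Gauss-Seidel with OpenMP. */
    #include <omp.h>

    #define NI 512
    #define NJ 512

    void gs_sweep(double u[NI][NJ], const double f[NI][NJ]) {
        for (int d = 2; d <= NI + NJ - 4; d++) {      /* hyperplane index i+j */
            #pragma omp parallel for
            for (int i = 1; i < NI - 1; i++) {
                int j = d - i;                        /* stay on hyperplane d */
                if (j >= 1 && j < NJ - 1)
                    u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                      u[i][j-1] + u[i][j+1] - f[i][j]);
            }
        }
    }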
Transactive memory systems scale for couples: development and validation
Hewitt, Lauren Y.; Roberts, Lynne D.
2015-01-01
People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873
DMA shared byte counters in a parallel computer
Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos
2010-04-06
A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFO metadata maintains the memory locations of the injection FIFOs, including their current head and tail, and the reception FIFO metadata maintains the memory locations of the reception FIFOs, including their current head and tail. The injection byte counters and reception byte counters may be shared between messages.
Division of attention as a function of the number of steps, visual shifts, and memory load
NASA Technical Reports Server (NTRS)
Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.
1986-01-01
The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of long-term memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.
Optimal cue integration in ants.
Wystrach, Antoine; Mangan, Michael; Webb, Barbara
2015-10-07
In situations with redundant or competing sensory information, humans have been shown to perform cue integration, weighting different cues according to their certainty in a quantifiably optimal manner. Ants have been shown to merge the directional information available from their path integration (PI) and visual memory, but as yet it is not clear that they do so in a way that reflects the relative certainty of the cues. In this study, we manipulate the variance of the PI home vector by allowing ants (Cataglyphis velox) to run different distances and testing their directional choice when the PI vector direction is put in competition with visual memory. Ants show progressively stronger weighting of their PI direction as PI length increases. The weighting is quantitatively predicted by modelling the expected directional variance of home vectors of different lengths and assuming optimal cue integration. However, a subsequent experiment suggests ants may not actually compute an internal estimate of the PI certainty, but are using the PI home vector length as a proxy. © 2015 The Author(s).
NASA Astrophysics Data System (ADS)
Kepner, J. V.; Janka, R. S.; Lebak, J.; Richards, M. A.
1999-12-01
The Vector/Signal/Image Processing Library (VSIPL) is a DARPA initiated effort made up of industry, government and academic representatives who have defined an industry standard API for vector, signal, and image processing primitives for real-time signal processing on high performance systems. VSIPL supports a wide range of data types (int, float, complex, ...) and layouts (vectors, matrices and tensors) and is ideal for astronomical data processing. The VSIPL API is intended to serve as an open, vendor-neutral, industry standard interface. The object-based VSIPL API abstracts the memory architecture of the underlying machine by using the concept of memory blocks and views. Early experiments with VSIPL code conversions have been carried out by the High Performance Computing Program team at UCSD. Commercially, several major vendors of signal processors are actively developing implementations. VSIPL has also been explicitly required as part of a recent Rome Labs teraflop procurement. This poster presents the VSIPL API, its functionality and the status of various implementations.
Welcoming nora: a family event.
Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T
2011-01-01
In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.
Video data compression using artificial neural network differential vector quantization
NASA Technical Reports Server (NTRS)
Krishnamurthy, Ashok K.; Bibyk, Steven B.; Ahalt, Stanley C.
1991-01-01
An artificial neural network vector quantizer is developed for use in data compression applications such as Digital Video. Differential Vector Quantization is used to preserve edge features, and a new adaptive algorithm, known as Frequency-Sensitive Competitive Learning, is used to develop the vector quantizer codebook. To achieve real-time performance, a custom Very Large Scale Integration Application Specific Integrated Circuit (VLSI ASIC) is being developed to realize the associative memory functions needed in the vector quantization algorithm. By using vector quantization, the need for Huffman coding can be eliminated, resulting in superior performance in the presence of channel bit errors compared with methods that use variable-length codes.
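The following hedged C sketch shows one common form of frequency-sensitive competitive learning for codebook training: each codeword's distortion is penalized by its win count so rarely used codewords stay competitive, and the winner is nudged toward the training vector. The dimensions, penalty form, and learning rate are illustrative and not necessarily those used in the paper.

    /* Hedged sketch: frequency-sensitive competitive learning update. */
    #define VDIM 8
    #define NCODES 64

    void fscl_update(double code[NCODES][VDIM], long wins[NCODES],
                     const double x[VDIM], double rate) {
        int best = 0;
        double best_cost = -1.0;
        for (int c = 0; c < NCODES; c++) {
            double dist = 0.0;
            for (int d = 0; d < VDIM; d++) {
                double diff = x[d] - code[c][d];
                dist += diff * diff;
            }
            double cost = dist * (double)(wins[c] + 1);   /* frequency penalty */
            if (best_cost < 0.0 || cost < best_cost) {
                best_cost = cost;
                best = c;
            }
        }
        wins[best]++;
        for (int d = 0; d < VDIM; d++)                    /* move winner toward x */
            code[best][d] += rate * (x[d] - code[best][d]);
    }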
Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion
ERIC Educational Resources Information Center
Pepin-Wakefield, Yvonne
2009-01-01
This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…
ERIC Educational Resources Information Center
Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim
2009-01-01
This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
Stillbirth and stigma: the spoiling and repair of multiple social identities.
Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan
This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that is documented in the literature.
Parallelization of KENO-Va Monte Carlo code
NASA Astrophysics Data System (ADS)
Ramón, Javier; Peña, Jorge
1995-07-01
KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code have been generated: one for shared memory machines and another for distributed memory systems using the message-passing interface PVM. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. An FDDI network of 6 HP9000/735 workstations was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
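One common way to get the scheduling-independent reproducibility described above is to derive each history's random stream deterministically from its global index, as in the hedged C sketch below; splitmix64 is used here as the seeding hash and the physics is a stub, so this is not KENO-Va's actual generator or seeding scheme.

    /* Hedged sketch: per-history "advanced" seeds for reproducible parallel MC. */
    #include <stdint.h>
    #include <omp.h>

    static uint64_t splitmix64(uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    /* Stand-in for the physics: consume random numbers, return a score. */
    static double track_history(uint64_t *state) {
        double score = 0.0;
        for (int step = 0; step < 8; step++) {
            *state = splitmix64(*state);
            score += (double)(*state >> 11) * (1.0 / 9007199254740992.0);
        }
        return score / 8.0;
    }

    double run_generation(long histories, uint64_t base_seed) {
        double tally = 0.0;
        #pragma omp parallel for reduction(+ : tally)
        for (long h = 0; h < histories; h++) {
            uint64_t state = splitmix64(base_seed ^ (uint64_t)h);  /* advanced seed */
            tally += track_history(&state);
        }
        return tally / (double)histories;
    }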
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see the design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin
2017-01-01
Alzheimer's disease (AD) has been described as a disconnection syndrome which disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. We aimed to unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding in the prodromal stages of FAD is associated with altered patterns of brain connectivity, thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD, increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright © Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Wilkes, Daniel R; Duncan, Alec J
2015-04-01
This paper presents a numerical model for the acoustic coupled fluid-structure interaction (FSI) of a submerged finite elastic body using the fast multipole boundary element method (FMBEM). The Helmholtz and elastodynamic boundary integral equations (BIEs) are, respectively, employed to model the exterior fluid and interior solid domains, and the pressure and displacement unknowns are coupled between conforming meshes at the shared boundary interface to achieve the acoustic FSI. The low frequency FMBEM is applied to both BIEs to reduce the algorithmic complexity of the iterative solution from O(N^2) to O(N^1.5) operations per matrix-vector product for N boundary unknowns. Numerical examples are presented to demonstrate the algorithmic and memory complexity of the method, which are shown to be in good agreement with the theoretical estimates, while the solution accuracy is comparable to that achieved by a conventional finite element-boundary element FSI model.
Barrier-breaking performance for industrial problems on the CRAY C916
DOE Office of Scientific and Technical Information (OSTI.GOV)
Graffunder, S.K.
1993-12-31
Nine applications, including third-party codes, were submitted to the Gordon Bell Prize committee showing the CRAY C916 supercomputer providing record-breaking time to solution for industrial problems in several disciplines. Performance was obtained by balancing raw hardware speed; effective use of large, real, shared memory; compiler vectorization and autotasking; hand optimization; asynchronous I/O techniques; and new algorithms. The highest GFLOPS performance for the submissions was 11.1 GFLOPS out of a peak advertised performance of 16 GFLOPS for the CRAY C916 system. One program achieved a 15.45x speedup from the compiler with just two hand-inserted directives to scope variables properly for the mathematical library. New I/O techniques hide tens of gigabytes of I/O behind parallel computations. Finally, new iterative solver algorithms have demonstrated times to solution on 1 CPU as high as 70 times faster than the best direct solvers.
NASA Astrophysics Data System (ADS)
Burban, Igor; Galinat, Lennart; Stolin, Alexander
2017-11-01
In this paper we study the combinatorics of quasi-trigonometric solutions of the classical Yang-Baxter equation, arising from simple vector bundles on a nodal Weierstraß cubic. Dedicated to the memory of Petr Petrovich Kulish.
Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.
2018-01-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541
Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L
2017-07-01
With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
Verifiable Secret Redistribution for Threshold Sharing Schemes
2002-02-01
complete verification in our protocol, old shareholders broadcast a commitment to the secret to the new shareholders. We prove that the new...of an m − 1 degree polynomial from m of n points yields a constant term in the polynomial that corresponds to the secret. In Blakley's scheme [Bla79...the intersection of m of n vector spaces yields a one-dimensional vector that corresponds to the secret. Desmedt surveys other sharing schemes
Rotman Lens Sidewall Design and Optimization with Hybrid Hardware/Software Based Programming
2015-01-09
conventional MoM and stored in memory. The components of Zfar are computed as needed through a fast matrix-vector multiplication (MVM), which...V vector. Iterative methods, e.g. BiCGSTAB, are employed for solving the linear equation. The matrix-vector multiplications (MVMs), which dominate...most of the computation in the solving phase, consist of calculating near and far MVMs. The far MVM comprises aggregation, translation, and
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack
In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed and bytes moved from memory in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory vs. computation related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques (cache blocking, structure refactoring, data alignment, and vectorization), illustrated by the kernel cases, will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device. After recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute intensive routines to improve cache reuse. ## PICSAR PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting MIC architectures, first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. An MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management.
The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domains have been merged and parallelized. Overall, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a node of Cori phase 1 and KNL at order 1 and better on KNL by a factor 1.6 at order 3 with the considered test case (homogeneous thermal plasma).
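For readers unfamiliar with the roofline bound used in the session, the small C program below computes the arithmetic intensity of a simple triad-style kernel and caps its attainable rate at the smaller of peak compute and intensity times bandwidth; the peak and bandwidth figures are placeholders, not measured KNL numbers.

    /* Hedged sketch: roofline bound = min(peak, arithmetic intensity * bandwidth). */
    #include <stdio.h>

    int main(void) {
        double peak_gflops = 2000.0;   /* placeholder peak, GFLOP/s */
        double bw_gbytes   = 400.0;    /* placeholder bandwidth, GB/s */

        /* Example kernel a[i] = b[i] + s*c[i]: 2 flops per 24 bytes moved. */
        double ai = 2.0 / (3.0 * 8.0);          /* flops per byte */

        double bound = ai * bw_gbytes;
        if (bound > peak_gflops)
            bound = peak_gflops;
        printf("arithmetic intensity %.3f flop/byte -> bound %.1f GFLOP/s\n",
               ai, bound);
        return 0;
    }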
Method for prefetching non-contiguous data structures
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2009-05-05
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.
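A hedged C sketch of the idea follows: each node records which node will be touched next, and a prefetch for that node is issued while the current one is processed. The patent places the pointer in the memory line and lets hardware follow it; here GCC's __builtin_prefetch stands in purely to show the access pattern.

    /* Hedged sketch: pointer-guided prefetch over a non-contiguous structure. */
    #include <stddef.h>

    struct line {
        double payload[7];
        struct line *next_accessed;   /* the "pointer in the memory line" */
    };

    double walk(const struct line *p) {
        double sum = 0.0;
        while (p != NULL) {
            if (p->next_accessed)
                __builtin_prefetch(p->next_accessed, 0, 3);  /* fetch ahead */
            for (int i = 0; i < 7; i++)
                sum += p->payload[i];
            p = p->next_accessed;
        }
        return sum;
    }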
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Balley, David H. (Technical Monitor)
1997-01-01
Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed.
It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will also be free to use CDS1 directly, with care. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation techniques, it even now outperforms highly-optimized MPI implementations on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
Multiprocessing MCNP on an IBM RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-01-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors P and the fraction f of task time that multiprocesses, can be formulated using Amdahl's law: S(f, P) = 1/(1 - f + f/P). However, for most applications, this theoretical limit cannot be achieved because of additional terms (e.g., multitasking overhead, memory overlap, etc.) that are not included in Amdahl's law. Monte Carlo transport is a natural candidate for multiprocessing because the particle tracks are generally independent, and the precision of the result increases as the square root of the number of particles tracked.
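As a quick worked example of the Amdahl's law expression quoted above, the small C program below evaluates S(f, P) for an illustrative parallel fraction and processor count (the numbers are made up, not from the MCNP study).

    /* Hedged sketch: evaluate S(f, P) = 1 / (1 - f + f/P). */
    #include <stdio.h>

    static double amdahl(double f, int p) {
        return 1.0 / ((1.0 - f) + f / (double)p);
    }

    int main(void) {
        /* e.g. 95% of the tracking work multiprocesses on 8 processors */
        printf("S(0.95, 8) = %.2f\n", amdahl(0.95, 8));   /* about 5.9 */
        return 0;
    }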
A shared resource between declarative memory and motor memory.
Keisler, Aysha; Shadmehr, Reza
2010-11-03
The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A shared resource between declarative memory and motor memory
Keisler, Aysha; Shadmehr, Reza
2010-01-01
The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140
Discrete-Slots Models of Visual Working-Memory Response Times
Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.
2014-01-01
Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
Evaluating the promise of recombinant transmissible vaccines
Basinski, Andrew J.; Varrelman, Tanner J.; Smithson, Mark W.; May, Ryan H.; Remien, Christopher H.; Nuismer, Scott L.
2018-01-01
Transmissible vaccines have the potential to revolutionize infectious disease control by reducing the vaccination effort required to protect a population against a disease. Recent efforts to develop transmissible vaccines focus on recombinant transmissible vaccine designs (RTVs) because they pose reduced risk if intra-host evolution causes the vaccine to revert to its vector form. However, the shared antigenicity of the vaccine and vector may confer vaccine-immunity to hosts infected with the vector, thwarting the ability of the vaccine to spread through the population. We build a mathematical model to test whether a RTV can facilitate disease management in instances where reversion is likely to introduce the vector into the population or when the vector organism is already established in the host population, and the vector and vaccine share perfect cross-immunity. Our results show that a RTV can autonomously eradicate a pathogen, or protect a population from pathogen invasion, when cross-immunity between vaccine and vector is absent. If cross-immunity between vaccine and vector exists, however, our results show that a RTV can substantially reduce the vaccination effort necessary to control or eradicate a pathogen only when continuously augmented with direct manual vaccination. These results demonstrate that estimating the extent of cross-immunity between vector and vaccine is a critical step in RTV design, and that herpesvirus vectors showing facile reinfection and weak cross-immunity are promising. PMID:29279283
ERIC Educational Resources Information Center
Schweppe, Judith; Rummer, Ralf
2007-01-01
The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…
The Precategorical Nature of Visual Short-Term Memory
ERIC Educational Resources Information Center
Quinlan, Philip T.; Cohen, Dale J.
2016-01-01
We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…
NASA Technical Reports Server (NTRS)
Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.
1994-01-01
The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
2013-01-01
Background Live attenuated viruses are among our most potent and effective vaccines. For human immunodeficiency virus, however, a live attenuated strain could present substantial safety concerns. We have used the live attenuated rubella vaccine strain RA27/3 as a vector to express SIV and HIV vaccine antigens because its safety and immunogenicity have been demonstrated in millions of children. One dose protects for life against rubella infection. In previous studies, rubella vectors replicated to high titers in cell culture while stably expressing SIV and HIV antigens. Their viability in vivo, however, as well as immunogenicity and antibody persistence, were unknown. Results This paper reports the first successful trial of rubella vectors in rhesus macaques, in combination with DNA vaccines in a prime and boost strategy. The vectors grew robustly in vivo, and the protein inserts were highly immunogenic. Antibody titers elicited by the SIV Gag vector were greater than or equal to those elicited by natural SIV infection. The antibodies were long lasting, and they were boosted by a second dose of replication-competent rubella vectors given six months later, indicating the induction of memory B cells. Conclusions Rubella vectors can serve as a vaccine platform for safe delivery and expression of SIV and HIV antigens. By presenting these antigens in the context of an acute infection, at a high level and for a prolonged duration, these vectors can stimulate a strong and persistent immune response, including maturation of memory B cells. Rhesus macaques will provide an ideal animal model for demonstrating immunogenicity of novel vectors and protection against SIV or SHIV challenge. PMID:24041113
Optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1979-01-01
High-capacity optical memories with relatively high data-transfer rates and multiport simultaneous-access capability may serve as the basis for new computer architectures. Several computer structures that might profitably use such memories are: a) a simultaneous record-access system, b) a simultaneously shared memory computer system, and c) a parallel digital processing structure.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Reader set encoding for directory of shared cache memory in multiprocessor system
Ahn, Daniel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Zhuang, Xiaotong
2014-06-10
In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
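A minimal sketch of a bitset reader-set: each directory entry keeps one bit per speculative thread, and a write by one thread conflicts if any other thread's bit is set. The 64-thread limit, the structure layout, and the conflict rule as written are assumptions for illustration, not the patented encoding.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdbool.h>

    /* Hypothetical directory entry for one cache line: a bitset encoding of
       the speculative threads that have read the line.                      */
    typedef struct {
        uint64_t reader_set;   /* bit i set => speculative thread i read the line */
    } dir_entry_t;

    static void record_read(dir_entry_t *e, unsigned thread_id)
    {
        e->reader_set |= (uint64_t)1 << thread_id;
    }

    /* A speculative write conflicts if any *other* thread has read the line. */
    static bool write_conflicts(const dir_entry_t *e, unsigned writer_id)
    {
        return (e->reader_set & ~((uint64_t)1 << writer_id)) != 0;
    }

    int main(void)
    {
        dir_entry_t line = { 0 };
        record_read(&line, 2);
        printf("thread 2 write conflict: %d\n", write_conflicts(&line, 2)); /* 0 */
        record_read(&line, 5);
        printf("thread 2 write conflict: %d\n", write_conflicts(&line, 2)); /* 1: thread 5 also read */
        return 0;
    }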
Insights on consciousness from taste memory research.
Gallo, Milagros
2016-01-01
Taste research in rodents supports the relevance of memory in determining the content of consciousness, showing that memory modifies both taste perception and later action. Related to this issue is the fact that the taste and visual modalities share anatomical circuits traditionally associated with conscious memory. This challenges the view of taste memory as a type of non-declarative, unconscious memory.
Economical Implementation of a Filter Engine in an FPGA
NASA Technical Reports Server (NTRS)
Kowalski, James E.
2009-01-01
A logic design has been conceived for a field-programmable gate array (FPGA) that would implement a complex system of multiple digital state-space filters. The main innovative aspect of this design lies in providing for reuse of parts of the FPGA hardware to perform different parts of the filter computations at different times, in such a manner as to enable the timely performance of all required computations in the face of limitations on available FPGA hardware resources. The implementation of the digital state-space filter involves matrix vector multiplications, which, in the absence of the present innovation, would ordinarily necessitate some multiplexing of vector elements and/or routing of data flows along multiple paths. The design concept calls for implementing vector registers as shift registers to simplify operand access to multipliers and accumulators, obviating both multiplexing and routing of data along multiple paths. Each vector register would be reused for different parts of a calculation. Outputs would always be drawn from the same register, and inputs would always be loaded into the same register. A simple state machine would control each filter. The output of a given filter would be passed to the next filter, accompanied by a "valid" signal, which would start the state machine of the next filter. Multiple filter modules would share a multiplication/accumulation arithmetic unit. The filter computations would be timed by use of a clock having a frequency high enough, relative to the input and output data rate, to provide enough cycles for matrix and vector arithmetic operations. This design concept could prove beneficial in numerous applications in which digital filters are used and/or vectors are multiplied by coefficient matrices. Examples of such applications include general signal processing, filtering of signals in control systems, processing of geophysical measurements, and medical imaging. For these and other applications, it could be advantageous to combine compact FPGA digital filter implementations with other application-specific logic implementations on single integrated-circuit chips. An FPGA could readily be tailored to implement a variety of filters because the filter coefficients would be loaded into memory at startup.
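The matrix-vector arithmetic being time-multiplexed onto one shared multiply/accumulate unit can be pictured in software. The sketch below is a plain C rendering of one state-space filter step, x' = A x + B u, y = C x + D u; the 3-state dimension and the coefficient values are placeholders, and the sequential mac_row loop stands in for the reused hardware MAC rather than describing the actual FPGA design.

    #include <stdio.h>

    #define N 3   /* state dimension of the hypothetical filter */

    /* One dot product folded into a single accumulator, mirroring the reuse
       of a shared multiply/accumulate unit element by element.              */
    static double mac_row(const double *row, const double *v, int n,
                          double b, double u)
    {
        double acc = b * u;                 /* shared accumulator */
        for (int i = 0; i < n; i++)
            acc += row[i] * v[i];
        return acc;
    }

    /* One filter step: y = C*x + D*u, then x <- A*x + B*u. */
    static double filter_step(const double A[N][N], const double B[N],
                              const double C[N], double D,
                              double x[N], double u)
    {
        double next[N];
        for (int r = 0; r < N; r++)
            next[r] = mac_row(A[r], x, N, B[r], u);   /* state update */
        double y = mac_row(C, x, N, D, u);            /* output       */
        for (int r = 0; r < N; r++)
            x[r] = next[r];                           /* shift new state in */
        return y;
    }

    int main(void)
    {
        double A[N][N] = {{0.9, 0.1, 0.0}, {0.0, 0.8, 0.1}, {0.0, 0.0, 0.7}};
        double B[N] = {1.0, 0.0, 0.0}, C[N] = {0.0, 0.0, 1.0}, D = 0.0;
        double x[N] = {0.0};
        for (int k = 0; k < 5; k++)
            printf("y[%d] = %f\n", k, filter_step(A, B, C, D, x, 1.0));
        return 0;
    }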
Investigating Ground Swarm Robotics Using Agent Based Simulation
2006-12-01
Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm.
Centrally managed unified shared virtual address space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkes, John
Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontend interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.
Attention and Visuospatial Working Memory Share the Same Processing Resources
Feng, Jing; Pratt, Jay; Spence, Ian
2012-01-01
Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826
NASA Astrophysics Data System (ADS)
Pavlichin, Dmitri S.; Mabuchi, Hideo
2014-06-01
Nanoscale integrated photonic devices and circuits offer a path to ultra-low power computation at the few-photon level. Here we propose an optical circuit that performs a ubiquitous operation: the controlled, random-access readout of a collection of stored memory phases or, equivalently, the computation of the inner product of a vector of phases with a binary "selector" vector, where the arithmetic is done modulo 2π and the result is encoded in the phase of a coherent field. This circuit, a collection of cascaded interferometers driven by a coherent input field, demonstrates the use of coherence as a computational resource and the use of recently developed mathematical tools for modeling optical circuits with many coupled parts. The construction extends in a straightforward way to the computation of matrix-vector and matrix-matrix products and, with the inclusion of an optical feedback loop, to the computation of a "weighted" readout of stored memory phases. We note some applications of these circuits for error correction and for computing tasks requiring fast vector inner products, e.g. statistical classification and some machine learning algorithms.
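The operation being computed optically is an ordinary inner product of a phase vector with a binary selector, reduced modulo 2π. The sketch below evaluates the same quantity numerically as a point of reference; the stored phases and selector pattern are arbitrary, and nothing about the interferometer cascade itself is modeled.

    #include <stdio.h>
    #include <math.h>

    static const double TWO_PI = 6.283185307179586;

    /* Inner product of stored phases with a binary selector vector,
       arithmetic modulo 2*pi, as the optical circuit would accumulate it
       in the phase of a coherent field. Values are illustrative only.     */
    static double phase_readout(const double *phases, const int *select, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            if (select[i])
                sum += phases[i];
        return fmod(sum, TWO_PI);   /* result encoded as a single phase */
    }

    int main(void)
    {
        double mem[4] = {0.5, 1.2, 3.0, 2.6};   /* stored memory phases */
        int    sel[4] = {1, 1, 1, 1};           /* selector vector      */
        printf("readout phase = %f rad\n", phase_readout(mem, sel, 4));
        return 0;
    }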
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Reducing Memory Cost of Exact Diagonalization using Singular Value Decomposition
NASA Astrophysics Data System (ADS)
Weinstein, Marvin; Chandra, Ravi; Auerbach, Assa
2012-02-01
We present a modified Lanczos algorithm to diagonalize lattice Hamiltonians with dramatically reduced memory requirements. In contrast to variational approaches and most implementations of DMRG, Lanczos rotations towards the ground state do not involve incremental minimizations (e.g., sweeping procedures), which may get stuck in false local minima. The lattice of size N is partitioned into two subclusters. At each iteration the rotating Lanczos vector is compressed into two sets of n_svd small subcluster vectors using singular value decomposition. For low entanglement entropy S_ee (satisfied by short-range Hamiltonians), the truncation error is bounded by exp(-n_svd^(1/S_ee)). Convergence is tested for the Heisenberg model on Kagome clusters of 24, 30 and 36 sites, with no lattice symmetries exploited, using less than 15 GB of dynamical memory. Generalization of the Lanczos-SVD algorithm to multiple partitioning is discussed, and comparisons to other techniques are given. Reference: arXiv:1105.0007
Fusion PIC code performance analysis on the Cori KNL system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Deslippe, Jack; Friesen, Brian
We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high arithmetic intensity and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path with a theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
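The data-layout point can be illustrated with a structure-of-arrays particle push that a compiler can vectorize. The field values, time step, and particle count below are placeholders and carry none of the XGC1 or GTC-P physics; the omp simd hint simply marks the loop the way a vectorization-focused optimization would.

    #include <stdio.h>

    #define NP 1024

    /* Structure-of-arrays layout: each coordinate is a contiguous stream,
       which lets the compiler vectorize the push loop.                    */
    typedef struct {
        double x[NP], v[NP];
    } particles_t;

    static void push(particles_t *p, const double *efield, double dt)
    {
        #pragma omp simd
        for (int i = 0; i < NP; i++) {
            p->v[i] += efield[i] * dt;   /* acceleration from local field */
            p->x[i] += p->v[i] * dt;     /* drift                         */
        }
    }

    int main(void)
    {
        static particles_t p;
        static double e[NP];
        for (int i = 0; i < NP; i++) { p.x[i] = i * 1e-3; p.v[i] = 0.0; e[i] = 1.0; }
        push(&p, e, 1e-2);
        printf("x[0]=%g v[0]=%g\n", p.x[0], p.v[0]);
        return 0;
    }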
System and method for memory allocation in a multiclass memory system
Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark
2016-06-28
A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
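The allocation library's dispatch can be pictured as a mapping from a declared data-structure type to a memory class. The enums, the mapping, and the use of plain malloc as a stand-in for class-specific allocators below are all assumptions made for illustration, not the patented mechanism.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical memory classes in a unified address space. */
    typedef enum { MEM_FAST_STACKED, MEM_BULK_DRAM, MEM_NONVOLATILE } mem_class_t;

    /* Hypothetical data-structure types the allocation library recognizes. */
    typedef enum { DS_HASH_TABLE, DS_STREAM_BUFFER, DS_CHECKPOINT_LOG } ds_type_t;

    /* Map each data-structure type to the memory class suiting its access
       pattern; here the "classes" are just ordinary heap blocks.           */
    static mem_class_t class_for(ds_type_t t)
    {
        switch (t) {
        case DS_HASH_TABLE:    return MEM_FAST_STACKED;  /* latency sensitive   */
        case DS_STREAM_BUFFER: return MEM_BULK_DRAM;     /* capacity, streaming */
        default:               return MEM_NONVOLATILE;   /* persistence         */
        }
    }

    static void *typed_alloc(ds_type_t t, size_t bytes)
    {
        mem_class_t c = class_for(t);
        printf("allocating %zu bytes in memory class %d\n", bytes, (int)c);
        return malloc(bytes);   /* stand-in for a class-specific allocator */
    }

    int main(void)
    {
        void *table = typed_alloc(DS_HASH_TABLE, 1 << 20);
        free(table);
        return 0;
    }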
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1988-01-01
In Kanerva's Sparse Distributed Memory, writing to and reading from the memory are done in relation to spheres in an n-dimensional binary vector space. Thus it is important to know how many points are in the intersection of two spheres in this space. Two proofs are given of Wang's formula for spheres of unequal radii, and an integral approximation for the intersection in this case.
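The quantity in question can be checked numerically by splitting the coordinates according to where the two centers agree and differ. The following sketch is a direct combinatorial count of the points lying within two Hamming radii of two centers a given distance apart; it is a brute-force cross-check under that interpretation, not a statement of Wang's formula itself, and the dimensions and radii in main are arbitrary.

    #include <stdio.h>

    /* Count points z in {0,1}^n with Hamming distance at most r1 from x and
       at most r2 from y, where x and y are distance d apart. Flipping k1 of
       the d coordinates where x and y differ and k2 of the n-d where they
       agree gives d(z,x) = k1 + k2 and d(z,y) = (d - k1) + k2.              */
    static double binom(int n, int k)
    {
        double b = 1.0;
        for (int i = 1; i <= k; i++)
            b = b * (n - k + i) / i;
        return b;
    }

    static double sphere_intersection(int n, int d, int r1, int r2)
    {
        double total = 0.0;
        for (int k1 = 0; k1 <= d; k1++)
            for (int k2 = 0; k2 <= n - d; k2++)
                if (k1 + k2 <= r1 && (d - k1) + k2 <= r2)
                    total += binom(d, k1) * binom(n - d, k2);
        return total;
    }

    int main(void)
    {
        /* Small example: 16-bit space, centers 6 apart, unequal radii. */
        printf("|B(x,5) intersect B(y,7)| = %.0f\n", sphere_intersection(16, 6, 5, 7));
        return 0;
    }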
Focal expression of mutated tau in entorhinal cortex neurons of rats impairs spatial working memory.
Ramirez, Julio J; Poulton, Winona E; Knelson, Erik; Barton, Cole; King, Michael A; Klein, Ronald L
2011-01-01
Entorhinal cortex neuropathology begins very early in Alzheimer's disease (AD), a disorder characterized by severe memory disruption. Indeed, loss of entorhinal volume is predictive of AD and two of the hallmark neuroanatomical markers of AD, amyloid plaques and neurofibrillary tangles (NFTs), are particularly prevalent in the entorhinal area of AD-afflicted brains. Gene transfer techniques were used to create a model neurofibrillary tauopathy by injecting a recombinant adeno-associated viral vector with a mutated human tau gene (P301L) into the entorhinal cortex of adult rats. The objective of the present investigation was to determine whether adult onset, spatially restricted tauopathy could be sufficient to reproduce progressive deficits in mnemonic function. Spatial memory on a Y-maze was tested for approximately 3 months post-surgery. Upon completion of behavioral testing the brains were assessed for expression of human tau and evidence of tauopathy. Rats injected with the tau vector became persistently impaired on the task after about 6 weeks of postoperative testing, whereas the control rats injected with a green fluorescent protein vector performed at criterion levels during that period. Histological analysis confirmed the presence of hyperphosphorylated tau and NFTs in the entorhinal cortex and neighboring retrohippocampal areas as well as limited synaptic degeneration of the perforant path. Thus, highly restricted vector-induced tauopathy in retrohippocampal areas is sufficient for producing progressive impairment in mnemonic ability in rats, successfully mimicking a key aspect of tauopathies such as AD. Copyright © 2010 Elsevier B.V. All rights reserved.
A Formal Model of Capacity Limits in Working Memory
ERIC Educational Resources Information Center
Oberauer, Klaus; Kliegl, Reinhold
2006-01-01
A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…
Ordering of guarded and unguarded stores for no-sync I/O
Gara, Alan; Ohmacht, Martin
2013-06-25
A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu
2012-03-01
LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
ERIC Educational Resources Information Center
Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.
2011-01-01
Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grain process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The performance of 3D rendering on a Graphics Processing Unit, which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as the Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUs for efficient 3D rendering.
Audience-tuning effects on memory: the role of shared reality.
Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan
2005-09-01
After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver
2015-01-01
The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all age-related shared variance with domain specific aging mechanisms evident. PMID:26321998
Solutions and debugging for data consistency in multiprocessors with noncoherent caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.
1995-02-01
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in the local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware-controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for the IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
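The false sharing problem is easy to reproduce: two threads repeatedly update different words that happen to live in the same data block. The sketch below contrasts that layout with a padded one; the 64-byte block size and the OpenMP threading are assumptions for the illustration, not details of the POWER/4 system discussed in the abstract.

    #include <stdio.h>
    #include <omp.h>

    /* Two threads update different words in the same cache block: with
       hardware coherence the block ping-pongs between caches, and with
       software-controlled (noncoherent) caches the writes must be combined
       explicitly or one update can be lost. Padding each counter to its own
       block sidesteps both problems.                                        */
    #define CACHE_BLOCK 64

    struct padded { long value; char pad[CACHE_BLOCK - sizeof(long)]; };

    int main(void)
    {
        long shared_block[2] = {0, 0};          /* false sharing: same block */
        struct padded separate[2] = {{0}, {0}}; /* one block per counter     */

        #pragma omp parallel num_threads(2)
        {
            int id = omp_get_thread_num();
            for (int i = 0; i < 1000000; i++) {
                shared_block[id]++;       /* distinct words, shared block */
                separate[id].value++;     /* distinct blocks              */
            }
        }
        printf("%ld %ld / %ld %ld\n", shared_block[0], shared_block[1],
               separate[0].value, separate[1].value);
        return 0;
    }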
Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes
NASA Technical Reports Server (NTRS)
Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs using compiler directives has improved substantially. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also to achieve good performance that exceeds that of some commercial tools.
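The kind of loop-level directive such a tool emits can be illustrated on a simple loop nest. This sketch is hand-written, not CAPTools output; the array size and the reduction pattern are invented to show the private-variable and reduction clauses an automatic tool must infer before inserting the directive.

    #include <stdio.h>

    #define N 512

    int main(void)
    {
        static double a[N][N], rowsum[N];
        double total = 0.0;

        /* Outer loop parallelized; the scratch variable s is private by
           construction and the accumulation is declared as a reduction so
           the update does not race.                                        */
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < N; i++) {
            double s = 0.0;
            for (int j = 0; j < N; j++) {
                a[i][j] = (double)(i + j);
                s += a[i][j];
            }
            rowsum[i] = s;
            total += s;
        }
        printf("total = %f\n", total);
        return 0;
    }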
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
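The offline page-detection step can be sketched as hashing fixed-size pages of a segment and byte-comparing only those with equal hashes. This is an illustration of the general idea, not the SMD implementation; the page count, the FNV-1a hash, and the single in-memory "segment" are assumptions.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE 4096
    #define NPAGES    4

    /* Offline duplicate detection over a region (e.g. a code segment):
       hash each page first, and only byte-compare pages whose hashes
       collide, so the expensive comparisons stay off the hot path.      */
    static uint64_t page_hash(const unsigned char *page)
    {
        uint64_t h = 1469598103934665603ull;          /* FNV-1a offset basis */
        for (size_t i = 0; i < PAGE_SIZE; i++) {
            h ^= page[i];
            h *= 1099511628211ull;
        }
        return h;
    }

    int main(void)
    {
        static unsigned char seg[NPAGES][PAGE_SIZE];
        memset(seg[2], 0x90, PAGE_SIZE);              /* pages 0, 1, 3 identical */

        uint64_t h[NPAGES];
        for (int i = 0; i < NPAGES; i++)
            h[i] = page_hash(seg[i]);

        for (int i = 0; i < NPAGES; i++)
            for (int j = i + 1; j < NPAGES; j++)
                if (h[i] == h[j] && memcmp(seg[i], seg[j], PAGE_SIZE) == 0)
                    printf("page %d can share a frame with page %d\n", j, i);
        return 0;
    }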
Scheduling for Locality in Shared-Memory Multiprocessors
1993-05-01
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. ... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to ... decomposition and scheduling algorithms. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling.
Advanced Development of Certified OS Kernels
2015-06-01
It provides an infrastructure to map a physical page into multiple processes' page maps in different address spaces. Their ownership mechanism ensures ... of their shared memory infrastructure. Trap module: the trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In ... layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers
6 DOF Nonlinear AUV Simulation Toolbox
1997-01-01
is to supply a flexible 3D-simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those...Explorer are modularly designed [Smith] in order to cut time and cost for vehicle reconfiguration. A flexible 3D-simulation platform is desired to... 3D models. Currently implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, shared memory monitor
A cache-aided multiprocessor rollback recovery scheme
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent
1989-01-01
This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís
2017-08-09
System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. Copyright © 2017 the authors 0270-6474/17/377748-11$15.00/0.
Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine
2018-02-01
We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.
Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra
2016-07-19
The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.
Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud
2013-09-01
Theory of mind (ToM) refers to the ability to infer one's own and others' mental states. Growing evidence has highlighted the presence of impairment on the most complex ToM tasks in Alzheimer disease (AD). However, how the ToM deficit is related to other cognitive dysfunctions, and more specifically to episodic memory impairment - the prominent feature of this disease - is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network, suggesting the two abilities share a common process. This paper proposes to review emergent evidence of ToM impairment in AD patients and to discuss the evidence of a relationship between ToM and episodic memory. We will discuss the possibility that AD patients' deficit in ToM is related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.
Mental time travel and the shaping of the human mind
Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.
2009-01-01
Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation of different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013
Bhatti, A Aziz
2009-12-01
This study proposes an efficient and improved model of a direct-storage bidirectional memory, the improved bidirectional associative memory (IBAM), and emphasises the use of nanotechnology for efficient implementation of such large-scale neural network structures at a considerably lower cost, reduced complexity, and smaller area required for implementation. This memory model directly stores the X and Y associated sets of M bipolar binary vectors in the form of (M x N_x) and (M x N_y) memory matrices, requires O(N) or about 30% of the interconnections with weight strengths ranging between +/-1, and is computationally very efficient as compared to sequential, intraconnected and other bidirectional associative memory (BAM) models of outer-product type that require O(N^2) complex interconnections with weight strengths ranging between +/-M. It is shown that it is functionally equivalent to and possesses all attributes of a BAM of outer-product type, and yet it is a structurally simple and robust, very large scale integration (VLSI), optical and nanotechnology realisable, modular and expandable neural network bidirectional associative memory model in which the addition or deletion of a pair of vectors does not require changes in the strength of interconnections of the entire memory matrix. The analysis of the retrieval process, signal-to-noise ratio, storage capacity and stability of the proposed model as well as of the traditional BAM has been carried out. Constraints on and characteristics of unipolar and bipolar binaries for improved storage and retrieval are discussed. The simulation results show that it has log_e(N) times higher storage capacity, superior performance, and faster convergence and retrieval time when compared to traditional sequential and intraconnected bidirectional memories.
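For contrast with the direct-storage IBAM, the classical outer-product BAM recall that the abstract compares against can be sketched in a few lines. The patterns below are invented and chosen to be orthogonal, so a single forward pass recalls the stored associate exactly; the IBAM itself stores the X and Y matrices directly rather than forming the weight matrix.

    #include <stdio.h>

    #define NX 6
    #define NY 4
    #define M  2

    /* Classical outer-product BAM: W = sum_m y_m x_m^T, recall by
       y = sgn(W x) (and x = sgn(W^T y) in the reverse direction).   */
    static int sgn(int v) { return v >= 0 ? 1 : -1; }

    int main(void)
    {
        int X[M][NX] = {{1,-1,1,-1,1,-1}, {1,1,-1,-1,1,1}};
        int Y[M][NY] = {{1,1,-1,-1},      {-1,1,1,-1}};
        int W[NY][NX] = {{0}};

        for (int m = 0; m < M; m++)              /* Hebbian outer products */
            for (int i = 0; i < NY; i++)
                for (int j = 0; j < NX; j++)
                    W[i][j] += Y[m][i] * X[m][j];

        int y[NY];
        for (int i = 0; i < NY; i++) {           /* forward recall from X[0] */
            int acc = 0;
            for (int j = 0; j < NX; j++)
                acc += W[i][j] * X[0][j];
            y[i] = sgn(acc);
        }
        printf("recalled: %d %d %d %d (stored: 1 1 -1 -1)\n", y[0], y[1], y[2], y[3]);
        return 0;
    }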
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
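One way to picture distributing a set of prefetch instructions across threads is to have each thread prefetch only its interleaved share of a stream that all threads read. The sketch below uses the GCC/Clang __builtin_prefetch intrinsic and OpenMP; the prefetch distance, the eight-element granularity (assumed to be one cache line of doubles), and the interleaving rule are illustrative assumptions, not the patented insertion scheme.

    #include <stdio.h>
    #include <omp.h>

    #define N    (1 << 20)
    #define DIST 64          /* prefetch distance in elements, assumed */

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        #pragma omp parallel reduction(+:sum)
        {
            int tid = omp_get_thread_num();
            int nthreads = omp_get_num_threads();
            /* Every thread reads the whole stream, but each thread issues
               only its share of the prefetches (every nthreads-th group of
               eight elements).                                             */
            for (int i = 0; i < N; i++) {
                if ((i / 8) % nthreads == tid && i + DIST < N)
                    __builtin_prefetch(&a[i + DIST], 0, 1);
                sum += a[i];
            }
        }
        printf("sum = %f\n", sum);
        return 0;
    }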
NASA Technical Reports Server (NTRS)
Kosko, Bart
1991-01-01
Mappings between fuzzy cubes are discussed. This level of abstraction provides a surprising and fruitful alternative to the propositional and predicate-calculus reasoning techniques used in expert systems. It allows one to reason with sets instead of propositions. Discussed here are fuzzy and neural function estimators, neural vs. fuzzy representation of structured knowledge, fuzzy vector-matrix multiplication, and fuzzy associative memory (FAM) system architecture.
A shared neural ensemble links distinct contextual memories encoded close in time
NASA Astrophysics Data System (ADS)
Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.
2016-06-01
Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
Rasmussen, Anne S; Habermas, Tilmann
2011-08-01
According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.
Automated quantitative muscle biopsy analysis system
NASA Technical Reports Server (NTRS)
Castleman, Kenneth R. (Inventor)
1980-01-01
An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1) under the control of a program stored in dedicated memory M_n.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Experimental evaluation of multiprocessor cache-based error recovery
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. K.
1991-01-01
Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, differing in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol is evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollably high variability in the checkpoint interval.
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two level parallelization of a Computational Fluid Dynamic (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine is reported using medium and large scale test problems.
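A minimal sketch of the two-level model: MPI ranks provide coarse-grained (zone-level) parallelism and OpenMP threads split the loop over points inside a rank's zone. The zone size and the dummy per-point work are placeholders; the actual solver's overset-grid data exchange is not represented.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    #define ZONE_POINTS 100000   /* placeholder zone size */

    int main(int argc, char **argv)
    {
        int provided, rank;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0, global = 0.0;
        /* Loop-level OpenMP parallelism inside the rank's zone. */
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < ZONE_POINTS; i++)
            local += 1.0 / (1.0 + i + rank);        /* stand-in for zone work */

        /* Coarse-grained MPI parallelism across zones. */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("global residual stand-in = %f\n", global);

        MPI_Finalize();
        return 0;
    }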
Performing a local reduction operation on a parallel computer
Blocksome, Michael A; Faraj, Daniel A
2013-06-04
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer
Blocksome, Michael A.; Faraj, Daniel A.
2012-12-11
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
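The interleaved-copy-then-reduce idea can be sketched with two threads standing in for the two reduction processing cores. The chunk size, the interleaving rule, and the reduction operator below are illustrative assumptions; the network read and write cores' buffers described in the abstract are omitted to keep the sketch short.

    #include <stdio.h>
    #include <omp.h>

    #define CHUNK   4
    #define NCHUNKS 8
    #define LEN     (CHUNK * NCHUNKS)

    int main(void)
    {
        long in[2][LEN], interleaved[2 * LEN], partial[2] = {0, 0};

        for (int c = 0; c < 2; c++)
            for (int i = 0; i < LEN; i++)
                in[c][i] = c + 1;        /* core 0 holds 1s, core 1 holds 2s */

        #pragma omp parallel num_threads(2)
        {
            int core = omp_get_thread_num();
            /* Interleaved copy: chunk k of core c lands at slot 2k + c of the
               shared buffer.                                                  */
            for (int k = 0; k < NCHUNKS; k++)
                for (int i = 0; i < CHUNK; i++)
                    interleaved[(2 * k + core) * CHUNK + i] = in[core][k * CHUNK + i];
            #pragma omp barrier
            /* Each core locally reduces every other interleaved chunk. */
            for (int k = core; k < 2 * NCHUNKS; k += 2)
                for (int i = 0; i < CHUNK; i++)
                    partial[core] += interleaved[k * CHUNK + i];
        }
        printf("partials: %ld %ld, total %ld\n", partial[0], partial[1],
               partial[0] + partial[1]);
        return 0;
    }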
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1994-01-01
Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
Influence of local objects on hippocampal representations: landmark vectors and memory
Deshmukh, Sachin S.; Knierim, James J.
2013-01-01
The hippocampus is thought to represent nonspatial information in the context of spatial information. An animal can derive both spatial information as well as nonspatial information from the objects (landmarks) it encounters as it moves around in an environment. Here, we demonstrate correlates of both object-derived spatial as well as nonspatial information in the hippocampus of rats foraging in the presence of objects. We describe a new form of CA1 place cells, called landmark-vector cells, that encode spatial locations as a vector relationship to local landmarks. Such landmark vector relationships can be dynamically encoded. Of the 26 CA1 neurons that developed new fields in the course of a day’s recording sessions, in 8 cases the new fields were located at a similar distance and direction from a landmark as the initial field was located relative to a different landmark. We also demonstrate object-location memory in the hippocampus. When objects were removed from an environment or moved to new locations, a small number of neurons in CA1 and CA3 increased firing at the locations where the objects used to be. In some neurons, this increase occurred only in one location, indicating object+place conjunctive memory; in other neurons the increase in firing was seen at multiple locations where an object used to be. Taken together, these results demonstrate that the spatially restricted firing of hippocampal neurons encodes multiple types of information regarding the relationship between an animal’s location and the location of objects in its environment. PMID:23447419
NASA Astrophysics Data System (ADS)
Liu, Tianyu; Du, Xining; Ji, Wei; Xu, X. George; Brown, Forrest B.
2014-06-01
For nuclear reactor analysis such as the neutron eigenvalue calculations, the time-consuming Monte Carlo (MC) simulations can be accelerated by using graphics processing units (GPUs). However, traditional MC methods are often history-based, and their performance on GPUs is affected significantly by the thread divergence problem. In this paper we describe the development of a newly designed event-based vectorized MC algorithm for solving the neutron eigenvalue problem. The code was implemented using NVIDIA's Compute Unified Device Architecture (CUDA), and tested on an NVIDIA Tesla M2090 GPU card. We found that although the vectorized MC algorithm greatly reduces the occurrence of thread divergence, thus enhancing warp execution efficiency, the overall simulation speed is roughly ten times slower than the history-based MC code on GPUs. Profiling results suggest that the slow speed is probably due to the memory access latency caused by the large amount of global memory transactions. Possible solutions to improve the code efficiency are discussed.
NASA Astrophysics Data System (ADS)
Hutsalyuk, A.; Liashyk, A.; Pakuliak, S. Z.; Ragoucy, E.; Slavnov, N. A.
2016-11-01
We study the scalar products of Bethe vectors in integrable models solvable by the nested algebraic Bethe ansatz and possessing {gl}(2| 1) symmetry. Using explicit formulas of the monodromy matrix entries’ multiple actions onto Bethe vectors we obtain a representation for the scalar product in the most general case. This explicit representation appears to be a sum over partitions of the Bethe parameters. It can be used for the analysis of scalar products involving on-shell Bethe vectors. As a by-product, we obtain a determinant representation for the scalar products of generic Bethe vectors in integrable models with {gl}(1| 1) symmetry. Dedicated to the memory of Petr Petrovich Kulish.
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
Graystock, Peter; Goulson, Dave; Hughes, William O H
2015-08-22
The dispersal of parasites is critical for epidemiology, and the interspecific vectoring of parasites when species share resources may play an underappreciated role in parasite dispersal. One of the best examples of such a situation is the shared use of flowers by pollinators, but the importance of flowers and interspecific vectoring in the dispersal of pollinator parasites is poorly understood and frequently overlooked. Here, we use an experimental approach to show that during even short foraging periods of 3 h, three bumblebee parasites and two honeybee parasites were dispersed effectively onto flowers by their hosts, and then vectored readily between flowers by non-host pollinator species. The results suggest that flowers are likely to be hotspots for the transmission of pollinator parasites and that considering potential vector, as well as host, species will be of general importance for understanding the distribution and transmission of parasites in the environment and between pollinators. © 2015 The Author(s).
We Remember, We Forget: Collaborative Remembering in Older Couples
ERIC Educational Resources Information Center
Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.
2011-01-01
Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…
76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln
Federal Register 2010, 2011, 2012, 2013, 2014
2011-03-09
... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced.
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports
1991-06-01
Report RC 12936 (#58037). IBM T. J. Watson Research Center. July 1987. Alan Jay Smith. Cache memories. Computing Surveys, 14(3): 473-530... basic-shared is an instrument for a shared memory design. The component panels are processor-qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
Blanket Gate Would Address Blocks Of Memory
NASA Technical Reports Server (NTRS)
Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.
1988-01-01
Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.
The potential of multi-port optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1975-01-01
A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Gravitational wave memory in an expanding universe
NASA Astrophysics Data System (ADS)
Tolish, Alexander; Wald, Robert
2016-03-01
We investigate the gravitational wave memory effect in an expanding FLRW spacetime. We find that if the gravitational field is decomposed into gauge-invariant scalar, vector, and tensor modes after the fashion of Bardeen, only the tensor mode gives rise to memory, and this memory can be calculated using the retarded Green's function associated with the tensor wave equation. If locally similar radiation source events occur on flat and FLRW backgrounds, we find that the resulting memories will differ only by a redshift factor, and we explore whether or not this factor depends on the expansion history of the FLRW universe. We compare our results to related work by Bieri, Garfinkle, and Yau.
Methodology for fast detection of false sharing in threaded scientific codes
Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang
2014-11-25
A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
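The pattern such a tool searches for can be illustrated with a small, self-contained example (not the patented toolchain itself): per-thread counters packed into a single cache line are falsely shared, and padding each counter out to its own line removes the interference. The 64-byte line size, thread count, and iteration count are assumptions.

    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 10000000
    #define NTHREADS 4

    /* Classic false-sharing pattern: per-thread counters packed into one
       cache line, so writes by different threads invalidate each other's
       copies of the line even though no datum is logically shared.       */
    struct counters_packed { long c[NTHREADS]; };

    /* Common fix: pad each counter out to its own (assumed 64-byte) line */
    struct counter_padded  { long c; char pad[64 - sizeof(long)]; };

    static struct counters_packed packed;
    static struct counter_padded  padded[NTHREADS];

    static void *work_packed(void *arg) {
        int id = *(int *)arg;
        for (long i = 0; i < ITERS; i++) packed.c[id]++;   /* false sharing */
        return NULL;
    }

    static void *work_padded(void *arg) {
        int id = *(int *)arg;
        for (long i = 0; i < ITERS; i++) padded[id].c++;   /* no sharing    */
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        int ids[NTHREADS];

        for (int i = 0; i < NTHREADS; i++) {
            ids[i] = i;
            pthread_create(&t[i], NULL, work_packed, &ids[i]);
        }
        for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, work_padded, &ids[i]);
        for (int i = 0; i < NTHREADS; i++) pthread_join(t[i], NULL);

        printf("counters: %ld %ld\n", packed.c[0], padded[0].c);
        return 0;
    }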
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.
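As a rough illustration of why a triad census parallelizes naturally on shared memory, the sketch below counts closed versus open triads of a small undirected graph with an OpenMP reduction over independent node triples. The full 16-class directed census and the data-structure optimizations discussed in the paper are not reproduced, and the example graph is made up.

    #include <stdio.h>
    #include <omp.h>

    #define N 5   /* tiny example graph; real inputs have millions of nodes */

    /* adjacency matrix of a small undirected graph (assumed input) */
    static const int adj[N][N] = {
        {0,1,1,0,0},
        {1,0,1,1,0},
        {1,1,0,0,1},
        {0,1,0,0,1},
        {0,0,1,1,0},
    };

    int main(void) {
        long closed = 0, open = 0;   /* triangles vs two-edge "open" triads */

        /* enumerate node triples i<j<k; each triple is independent, so the
           outer loop parallelizes naturally on shared memory               */
        #pragma omp parallel for reduction(+:closed,open) schedule(dynamic)
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                for (int k = j + 1; k < N; k++) {
                    int e = adj[i][j] + adj[j][k] + adj[i][k];
                    if (e == 3) closed++;
                    else if (e == 2) open++;
                }

        printf("closed triads: %ld, open triads: %ld\n", closed, open);
        return 0;
    }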
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches
NASA Astrophysics Data System (ADS)
Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur
2018-03-01
Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which arise because cache lines reside in the cache longer than required. In image processing analysis of, for example, extrapulmonary tuberculosis (TB), an accurate diagnosis of tissue specimens is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amounts of specimen images is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively reduce cache thrashing and avoid resource stealing among the processors.
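A minimal sketch of the insertion/promotion idea, assuming an 8-way set maintained as a recency stack: on a miss the incoming line is placed at a predefined middle position rather than at the MRU end, and on a hit it is promoted by only a small, fixed distance. The insertion point, promotion step, and access trace are illustrative assumptions; the ownership-based eviction rule of MI2PP is not modeled.

    #include <stdio.h>
    #include <string.h>

    #define WAYS        8   /* associativity of one cache set (assumed) */
    #define INSERT_POS  4   /* middle insertion point (assumed)         */
    #define PROMOTE_BY  1   /* near-distance promotion step (assumed)   */

    /* One cache set kept as a recency stack: position 0 is MRU,
       position WAYS-1 is the eviction candidate.                       */
    static long stack_[WAYS] = {-1,-1,-1,-1,-1,-1,-1,-1};

    static void access_line(long tag) {
        int pos = -1;
        for (int i = 0; i < WAYS; i++)
            if (stack_[i] == tag) { pos = i; break; }

        if (pos >= 0) {                 /* hit: promote only a few positions */
            int to = pos - PROMOTE_BY;
            if (to < 0) to = 0;
            memmove(&stack_[to + 1], &stack_[to], (pos - to) * sizeof(long));
            stack_[to] = tag;
        } else {                        /* miss: evict LRU, insert at middle  */
            memmove(&stack_[INSERT_POS + 1], &stack_[INSERT_POS],
                    (WAYS - 1 - INSERT_POS) * sizeof(long));
            stack_[INSERT_POS] = tag;
        }
    }

    int main(void) {
        long trace[] = {1, 2, 3, 1, 4, 5, 2, 6};   /* assumed access trace */
        for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
            access_line(trace[i]);
        for (int i = 0; i < WAYS; i++) printf("%ld ", stack_[i]);
        printf("\n");
        return 0;
    }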
ERIC Educational Resources Information Center
Kosaki, Yutaka; Poulter, Steven L.; Austen, Joe M.; McGregor, Anthony
2015-01-01
In three experiments, the nature of the interaction between multiple memory systems in rats solving a variation of a spatial task in the water maze was investigated. Throughout training rats were able to find a submerged platform at a fixed distance and direction from an intramaze landmark by learning a landmark-goal vector. Extramaze cues were…
Nowicki, Dimitri; Siegelmann, Hava
2010-01-01
This paper introduces a new model of associative memory, capable of both binary and continuous-valued inputs. Based on kernel theory, the memory model is, on one hand, a generalization of Radial Basis Function networks and, on the other, analogous in feature space to a Hopfield network. Attractors can be added, deleted, and updated on-line simply, without harming existing memories, and the number of attractors is independent of input dimension. Input vectors do not have to adhere to a fixed or bounded dimensionality; they can increase and decrease it without relearning previous memories. A memory consolidation process enables the network to generalize concepts and form clusters of input data, which outperforms many unsupervised clustering techniques; this process is demonstrated on handwritten digits from MNIST. Another process, reminiscent of memory reconsolidation, is introduced, in which existing memories are refreshed and tuned with new inputs; this process is demonstrated on a series of morphed faces. PMID:20552013
Hierarchical Traces for Reduced NSM Memory Requirements
NASA Astrophysics Data System (ADS)
Dahl, Torbjørn S.
This paper presents work on using hierarchical long term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not affect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
Memory in aged mice is rescued by enhanced expression of the GluN2B subunit of the NMDA receptor
Brim, B. L.; Haskell, R.; Awedikian, R.; Ellinwood, N.M.; Jin, L.; Kumar, A.; Foster, T.C.; Magnusson, K.
2012-01-01
The GluN2B subunit of the N-methyl-D-aspartate (NMDA) receptor shows age-related declines in expression across the frontal cortex and hippocampus. This decline is strongly correlated to age-related memory declines. This study was designed to determine if increasing GluN2B subunit expression in the frontal lobe or hippocampus would improve memory in aged mice. Mice were injected bilaterally with either the GluN2B vector, containing cDNA specific for the GluN2B subunit and enhanced Green Fluorescent Protein (eGFP); a control vector or vehicle. Spatial memory, cognitive flexibility, and associative memory were assessed using the Morris water maze. Aged mice, with increased GluN2B subunit expression, exhibited improved long-term spatial memory, comparable to young mice. However, memory was rescued on different days in the Morris water maze; early for hippocampal GluN2B subunit enrichment and later for the frontal lobe. A higher concentration of the GluN2B antagonist, Ro 25-6981, was required to impair long-term spatial memory in aged mice with enhanced GluN2B expression, as compared to aged controls, suggesting there was an increase in the number of GluN2B-containing NMDA receptors. In addition, hippocampal slices from aged mice with increased GluN2B subunit expression exhibited enhanced NMDA receptor-mediated excitatory post-synaptic potentials (EPSP). Treatment with Ro 25-6981 showed that a greater proportion of the NMDA receptor-mediated EPSP was due to the GluN2B subunit in these animals, as compared to aged controls. These results suggest that increasing the production of the GluN2B subunit in aged animals enhances memory and synaptic transmission. Therapies that enhance GluN2B subunit expression within the aged brain may be useful for ameliorating age-related memory declines. PMID:23103326
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?
ERIC Educational Resources Information Center
Chuderski, Adam; Necka, Edward
2012-01-01
Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
ERIC Educational Resources Information Center
Olivers, Christian N. L.; Meijer, Frank; Theeuwes, Jan
2006-01-01
In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by…
Time-Related Decay or Interference-Based Forgetting in Working Memory?
ERIC Educational Resources Information Center
Portrat, Sophie; Barrouillet, Pierre; Camos, Valerie
2008-01-01
The time-based resource-sharing model of working memory assumes that memory traces suffer from a time-related decay when attention is occupied by concurrent activities. Using complex continuous span tasks in which temporal parameters are carefully controlled, P. Barrouillet, S. Bernardin, S. Portrat, E. Vergauwe, & V. Camos (2007) recently…
Developmental Change in Working Memory Strategies: From Passive Maintenance to Active Refreshing
ERIC Educational Resources Information Center
Camos, Valerie; Barrouillet, Pierre
2011-01-01
Change in strategies is often mentioned as a source of memory development. However, though performance in working memory tasks steadily improves during childhood, theories differ in linking this development to strategy changes. Whereas some theories, such as the time-based resource-sharing model, invoke the age-related increase in use and…
Cache write generate for parallel image processing on shared memory architectures.
Wittenbrink, C M; Somani, A K; Chen, C H
1996-01-01
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level
Coman, Alin; Momennejad, Ida; Drach, Rae D.; Geana, Andra
2016-01-01
The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members’ memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals. PMID:27357678
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
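The balance argument can be made concrete with a back-of-the-envelope, roofline-style check: a kernel whose arithmetic intensity (flops per byte moved over the shared bus) is below the machine balance stops scaling no matter how many processors are added. All of the numbers in the sketch below are illustrative assumptions.

    #include <stdio.h>

    /* Balance check for a shared-bus multiprocessor: achievable rate is
       the minimum of the compute limit and the memory-bandwidth limit.
       All figures below are made-up, illustrative values.               */
    int main(void) {
        double peak_gflops_per_cpu = 2.0;   /* per-processor peak (assumed) */
        double bus_bandwidth_gbs   = 1.6;   /* shared bus bandwidth (assumed)*/

        /* example kernel: 3x3 convolution, ~18 flops per output pixel and
           ~10 bytes moved per pixel with little cache reuse (assumed)     */
        double flops_per_pixel = 18.0, bytes_per_pixel = 10.0;
        double intensity = flops_per_pixel / bytes_per_pixel; /* flops/byte */

        for (int cpus = 1; cpus <= 8; cpus *= 2) {
            double compute_limit = cpus * peak_gflops_per_cpu;
            double memory_limit  = intensity * bus_bandwidth_gbs;
            double achievable = compute_limit < memory_limit ? compute_limit
                                                             : memory_limit;
            printf("%d CPUs: achievable ~%.2f Gflop/s (%s-bound)\n",
                   cpus, achievable,
                   compute_limit < memory_limit ? "compute" : "memory");
        }
        return 0;
    }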
NASA Astrophysics Data System (ADS)
Li, W.; Shao, H.
2017-12-01
For geospatial cyberinfrastructure enabled web services, the ability of rapidly transmitting and sharing spatial data over the Internet plays a critical role to meet the demands of real-time change detection, response and decision-making. Especially for the vector datasets which serve as irreplaceable and concrete material in data-driven geospatial applications, their rich geometry and property information facilitates the development of interactive, efficient and intelligent data analysis and visualization applications. However, the big-data issues of vector datasets have hindered their wide adoption in web services. In this research, we propose a comprehensive optimization strategy to enhance the performance of vector data transmitting and processing. This strategy combines: 1) pre- and on-the-fly generalization, which automatically determines the proper simplification level through the introduction of an appropriate distance tolerance (ADT) to meet various visualization requirements, and at the same time speeds up simplification; 2) a progressive attribute transmission method to reduce data size and therefore the service response time; 3) compressed data transmission and dynamic adoption of a compression method to maximize the service efficiency under different computing and network environments. A cyberinfrastructure web portal was developed for implementing the proposed technologies. After applying our optimization strategies, substantial performance enhancement is achieved. We expect this work to widen the use of web services providing vector data to support real-time spatial feature sharing, visual analytics and decision-making.
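One common way to realize the distance-tolerance-driven generalization step is Ramer-Douglas-Peucker line simplification, sketched below; the tolerance parameter plays the role of the ADT, while the automatic ADT selection, progressive attribute transmission, and compression components described above are not reproduced. The example polyline and tolerance value are made up.

    #include <stdio.h>
    #include <math.h>

    typedef struct { double x, y; } Pt;

    /* perpendicular distance from p to the line through a and b */
    static double seg_dist(Pt p, Pt a, Pt b) {
        double dx = b.x - a.x, dy = b.y - a.y;
        double len = sqrt(dx * dx + dy * dy);
        if (len == 0.0) return hypot(p.x - a.x, p.y - a.y);
        return fabs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / len;
    }

    /* Ramer-Douglas-Peucker: mark the points kept at tolerance `tol` */
    static void rdp(const Pt *pts, int lo, int hi, double tol, int *keep) {
        if (hi <= lo + 1) return;
        double dmax = 0.0; int idx = -1;
        for (int i = lo + 1; i < hi; i++) {
            double d = seg_dist(pts[i], pts[lo], pts[hi]);
            if (d > dmax) { dmax = d; idx = i; }
        }
        if (dmax > tol) {           /* farthest point survives; recurse      */
            keep[idx] = 1;
            rdp(pts, lo, idx, tol, keep);
            rdp(pts, idx, hi, tol, keep);
        }                           /* else: all interior points are dropped */
    }

    int main(void) {
        Pt line[] = {{0,0},{1,0.1},{2,-0.1},{3,5},{4,6},{5,7},{6,8.1},{7,9}};
        int n = sizeof line / sizeof line[0];
        int keep[8] = {0};
        keep[0] = keep[n - 1] = 1;          /* endpoints are always retained */

        rdp(line, 0, n - 1, 1.0, keep);     /* tolerance stands in for ADT   */

        for (int i = 0; i < n; i++)
            if (keep[i]) printf("(%g, %g)\n", line[i].x, line[i].y);
        return 0;
    }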
Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao
2016-01-01
Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247
Spiegel, M A; Koester, D; Weigelt, M; Schack, T
2012-02-16
How much cognitive effort does it take to change a movement plan? In previous studies, it has been shown that humans plan and represent actions in advance, but it remains unclear whether or not action planning and verbal working memory share cognitive resources. Using a novel experimental paradigm, we combined in two experiments a grasp-to-place task with a verbal working memory task. Participants planned a placing movement toward one of two target positions and subsequently encoded and maintained visually presented letters. Both experiments revealed that re-planning the intended action reduced letter recall performance; execution time, however, was not influenced by action modifications. The results of Experiment 2 suggest that the action's interference with verbal working memory arose during the planning rather than the execution phase of the movement. Together, our results strongly suggest that movement planning and verbal working memory share common cognitive resources. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
International solar polar mission: The vector helium magnetometer
NASA Technical Reports Server (NTRS)
1982-01-01
The functional requirements for the vector helium magnetometer (VHM) on the Solar Polar spacecraft are presented. The VHM is one of the two magnetometers on board that will measure the vector magnetic field along the Earth to Jupiter transfer trajectory, as well as in the vicinity of Jupiter and along the solar polar orbit following the Jupiter encounter. The interconnection between these two magnetometers and their shared data processing unit is illustrated.
Demonstrating the Direction of Angular Velocity in Circular Motion
NASA Astrophysics Data System (ADS)
Demircioglu, Salih; Yurumezoglu, Kemal; Isik, Hakan
2015-09-01
Rotational motion is ubiquitous in nature, from astronomical systems to household devices in everyday life to elementary models of atoms. Unlike the tangential velocity vector that represents the instantaneous linear velocity (magnitude and direction), an angular velocity vector is conceptually more challenging for students to grasp. In physics classrooms, the direction of an angular velocity vector is taught by the right-hand rule, a mnemonic tool intended to aid memory. A setup constructed for instructional purposes may provide students with a more easily understood and concrete method to observe the direction of the angular velocity. This article attempts to demonstrate the angular velocity vector using the observable motion of a screw mounted to a remotely operated toy car.
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
NASA Technical Reports Server (NTRS)
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
Iterative free-energy optimization for recurrent neural networks (INFERNO).
Pitti, Alexandre; Gaussier, Philippe; Quoy, Mathias
2017-01-01
The intra-parietal lobe coupled with the Basal Ganglia forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models, however, often fail to control long-range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to see the problem of spikes' synchrony as an optimization problem of the neurons' sub-threshold activity for the generation of long neuronal chains. Using a stochastic gradient descent, a reinforcement signal (presumably dopaminergic) evaluates the quality of one input vector to move the recurrent neural network to a desired activity; depending on the error made, this input vector is strengthened to hill-climb the gradient or elicited to search for another solution. This vector can then be learned by an associative memory, as a model of the basal ganglia, to control the recurrent neural network. Experiments on habit learning and on sequence retrieval demonstrate the capabilities of the dual system to generate very long and precise spatio-temporal sequences, above two hundred iterations. Its features are applied then to the sequential planning of arm movements. In line with neurobiological theories, we discuss its relevance for modeling the cortico-basal working memory to initiate flexible goal-directed neuronal chains of causation and its relation to novel architectures such as Deep Networks, Neural Turing Machines and the Free-Energy Principle.
Synapsin Determines Memory Strength after Punishment- and Relief-Learning
Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo
2015-01-01
Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: “negative” memories for stimuli preceding them and “positive” memories for stimuli experienced at the moment of “relief.” Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training (“forward conditioning” of the odor), whereas after shock-odor training (“backward conditioning” of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. PMID:25972175
Synapsin determines memory strength after punishment- and relief-learning.
Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo; Gerber, Bertram
2015-05-13
Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: "negative" memories for stimuli preceding them and "positive" memories for stimuli experienced at the moment of "relief." Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training ("forward conditioning" of the odor), whereas after shock-odor training ("backward conditioning" of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. Copyright © 2015 Niewalda et al.
Wu, Y; Ling, F; Hou, J; Guo, S; Wang, J; Gong, Z
2016-07-01
Vector-borne diseases are one of the world's major public health threats and annually responsible for 30-50% of deaths reported to the national notifiable disease system in China. To control vector-borne diseases, a unified, effective and economic surveillance system is urgently needed; all of the current surveillance systems in China waste resources and/or information. Here, we review some current surveillance systems and present a concept for an integrated surveillance system combining existing vector and vector-borne disease monitoring systems. The integrated surveillance system has been tested in pilot programmes in China and led to a 21·6% cost saving in rodent-borne disease surveillance. We share some experiences gained from these programmes.
Execute-Only Attacks against Execute-Only Defenses
2015-11-13
attacks that have been widely used to bypass randomization-based memory corruption defenses. A recent technique, Readactor, provides one of the... corruption defenses with various impacts. We analyze the prevalence of opportunities for such attacks in popular code bases and build two proof-of-concept...our countermeasures introduce only a modest additional overhead. I. INTRODUCTION Memory corruption has been a primary vector of attacks against
Audience tuning effects in the context of situated and embodied processes.
Semin, Gün R
2018-03-05
This review provides an overview of the research on communication and the 'Saying is Believing' paradigm in the context of different perspectives on communication. The process of 'audience tuning' is shaped by a variety of situated factors in contexts that affect the communicators' confidence in their message. The overwhelming common denominator is that the combination of features that create ambiguity yields the optimal condition for the formation of shared realities. I conclude with an argument that the implied invariance of memory processes in shared reality work needs to be more attentive to the regulatory function of memories driving the expression of shared realities. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka
A novel 300MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables a reduced access pitch and therefore reduces the performance penalty due to conflicts between shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with a two-cycle latency and one-cycle pitch as a result of a shortened sense time of 0.63ns. The combination of the pipeline architecture and proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances performance of 32-bit RISC dual-core microcontrollers by 30%.
A general model for memory interference in a multiprocessor system with memory hierarchy
NASA Technical Reports Server (NTRS)
Taha, Badie A.; Standley, Hilda M.
1989-01-01
The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel
2015-07-15
Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating evidence that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing an increased involvement as cognitive load increases and do so in a domain-general manner. When correlating the fMRI signal with the approximated cognitive load as defined by the TBRS model, it was shown that the main focus of the cognitive load-related activation is located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory and allowed us to generate the novel hypothesis that the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.
Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H
2018-04-01
The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals-reflecting the degree to which tasks rely on shared cognitive processes-and within items-reflecting the degree to which tasks rely on the same information conveyed by the item. Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
Habiby, Sarry F.; Collins, Stuart A., Jr.
1987-01-01
The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. A Hughes liquid crystal light valve, the residue arithmetic representation, and a holographic optical memory are used to construct position coded optical look-up tables. All operations are performed in effectively one light valve response time with a potential for a high information density.
Habiby, S F; Collins, S A
1987-11-01
The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. A Hughes liquid crystal light valve, the residue arithmetic representation, and a holographic optical memory are used to construct position coded optical look-up tables. All operations are performed in effectively one light valve response time with a potential for a high information density.
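The role of residue arithmetic in such designs can be illustrated with a small electronic sketch: the matrix-vector product is computed independently in each modulus channel and the result decoded at the end. In the optical systems each channel's modular multiply-add is a position-coded table lookup driven by the light valve and holographic memory; here it is ordinary modular arithmetic, and the moduli and problem size are assumed for illustration only.

    #include <stdio.h>

    /* Pairwise-coprime moduli; their product (3*5*7*11 = 1155) bounds the
       dynamic range of the residue representation (assumed choice).       */
    static const int M[4] = {3, 5, 7, 11};
    #define NM 4
    #define N  3               /* matrix/vector dimension (assumed)         */

    /* Matrix-vector multiply carried out independently in each residue
       channel; each channel only ever handles small modular values.       */
    int main(void) {
        int A[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        int x[N]    = {2, 1, 3};
        int y_res[N][NM];

        for (int i = 0; i < N; i++)
            for (int m = 0; m < NM; m++) {
                int acc = 0;
                for (int j = 0; j < N; j++)
                    acc = (acc + (A[i][j] % M[m]) * (x[j] % M[m])) % M[m];
                y_res[i][m] = acc;
            }

        /* decode each result by searching [0, 1155) for the value matching
           all residues (Chinese-remainder decoding by brute force, for brevity) */
        for (int i = 0; i < N; i++) {
            for (int v = 0; v < 3 * 5 * 7 * 11; v++) {
                int ok = 1;
                for (int m = 0; m < NM; m++)
                    if (v % M[m] != y_res[i][m]) { ok = 0; break; }
                if (ok) { printf("y[%d] = %d\n", i, v); break; }
            }
        }
        return 0;
    }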
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Multiprocessing MCNP on an IBM RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-01-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major beneficiaries of vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law: S(f, P) = 1 / ((1 - f) + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.
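A quick numeric illustration of the speedup bound just quoted (with an arbitrary parallel fraction) shows how the serial term dominates as processors are added:

    #include <stdio.h>

    /* Amdahl's Law: S(f, P) = 1 / ((1 - f) + f / P), where f is the
       fraction of the run that multiprocesses and P the processor count. */
    static double amdahl(double f, int p) {
        return 1.0 / ((1.0 - f) + f / p);
    }

    int main(void) {
        double f = 0.95;                       /* illustrative parallel fraction */
        int procs[] = {1, 2, 4, 8, 16, 64};
        for (unsigned i = 0; i < sizeof procs / sizeof procs[0]; i++)
            printf("P = %2d  ->  S = %5.2f\n", procs[i], amdahl(f, procs[i]));
        /* even with f = 0.95 the speedup saturates near 1/(1-f) = 20       */
        return 0;
    }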
Poulter, Steven L.; Austen, Joe M.
2015-01-01
In three experiments, the nature of the interaction between multiple memory systems in rats solving a variation of a spatial task in the water maze was investigated. Throughout training rats were able to find a submerged platform at a fixed distance and direction from an intramaze landmark by learning a landmark-goal vector. Extramaze cues were also available for standard place learning, or “cognitive mapping,” but these cues were valid only within each session, as the position of the platform moved around the pool between sessions together with the intramaze landmark. Animals could therefore learn the position of the platform by taking the consistent vector from the landmark across sessions or by rapidly encoding the new platform position on each session with reference to the extramaze cues. Excitotoxic lesions of the dorsolateral striatum impaired vector-based learning but facilitated cognitive map-based rapid place learning when the extramaze cues were relatively poor (Experiment 1) but not when they were more salient (Experiments 2 and 3). The way the lesion effects interacted with cue availability is consistent with the idea that the memory systems involved in the current navigation task are functionally cooperative yet associatively competitive in nature. PMID:25691518
ERIC Educational Resources Information Center
Olivers, Christian N. L.
2009-01-01
An important question is whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. Some past research has indicated that they do: Singleton distractors interfered more strongly with a visual search task when they…
Discrete Resource Allocation in Visual Working Memory
ERIC Educational Resources Information Center
Barton, Brian; Ester, Edward F.; Awh, Edward
2009-01-01
Are resources in visual working memory allocated in a continuous or a discrete fashion? On one hand, flexible resource models suggest that capacity is determined by a central resource pool that can be flexibly divided such that items of greater complexity receive a larger share of resources. On the other hand, if capacity in working memory is…
Principe, Gabrielle F.; Schindewolf, Erica
2012-01-01
Research on factors that can affect the accuracy of children’s autobiographical remembering has important implications for understanding the abilities of young witnesses to provide legal testimony. In this article, we review our own recent research on one factor that has much potential to induce errors in children’s event recall, namely natural memory sharing conversations with peers and parents. Our studies provide compelling evidence that not only can the content of conversations about the past intrude into later memory but that such exchanges can prompt the generation of entirely false narratives that are more detailed than true accounts of experienced events. Further, our work shows that deeper and more creative participation in memory sharing dialogues can boost the damaging effects of conversationally conveyed misinformation. Implications of this collection of findings for children’s testimony are discussed. PMID:23129880
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Blocksome, Michael A; Mamidala, Amith R
2014-02-11
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
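A schematic contrast of the two programming models, reduced to a toy state-update loop rather than the dynamic simulation equations of the paper: the shared-memory version splits one loop across threads with a directive, while the distributed-memory version gives each rank a block of the state and exchanges global quantities explicitly. The two fragments are placed back-to-back in one program purely for compactness.

    #include <stdio.h>
    #include <omp.h>
    #include <mpi.h>

    #define N 1000000

    /* Toy state update x[i] += dt * f(x[i]); the real simulation solves
       coupled differential-algebraic equations, which are not shown.    */
    static double f(double x) { return -0.5 * x; }

    int main(int argc, char **argv) {
        static double x[N];
        double dt = 0.01;

        /* shared-memory version: one process, threads split the loop    */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            x[i] += dt * f(x[i]);

        /* distributed-memory version: each rank owns a block of x       */
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int chunk = N / size;
        int lo = rank * chunk;
        int hi = (rank == size - 1) ? N : lo + chunk;
        for (int i = lo; i < hi; i++)
            x[i] += dt * f(x[i]);

        /* ranks must exchange coupling data explicitly; a global norm is
           one such collective exchange                                   */
        double local = 0.0, global = 0.0;
        for (int i = lo; i < hi; i++) local += x[i] * x[i];
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("norm^2 = %f\n", global);
        MPI_Finalize();
        return 0;
    }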
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
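The kind of loop-level, directive-based parallelization described above can be illustrated with a single annotated loop; this is a generic OpenMP-style example, not code taken from the NAS benchmarks or from the SGI-specific multiprocessing directives.

    #include <stdio.h>
    #include <omp.h>

    #define N 4096

    /* Loop-level parallelization with compiler directives: the directive
       alone tells the compiler which loop to split across threads, with
       the reduction variable declared explicitly.                        */
    int main(void) {
        static double a[N], b[N];
        double dot = 0.0;

        for (int i = 0; i < N; i++) { a[i] = i * 0.5; b[i] = 2.0; }

        #pragma omp parallel for reduction(+:dot) schedule(static)
        for (int i = 0; i < N; i++)
            dot += a[i] * b[i];

        printf("dot = %f (threads available: %d)\n", dot, omp_get_max_threads());
        return 0;
    }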
A Neural Network Architecture For Rapid Model Indexing In Computer Vision Systems
NASA Astrophysics Data System (ADS)
Pawlicki, Ted
1988-03-01
Models of objects stored in memory have been shown to be useful for guiding the processing of computer vision systems. A major consideration in such systems, however, is how stored models are initially accessed and indexed by the system. As the number of stored models increases, the time required to search memory for the correct model becomes high. Parallel, distributed, connectionist neural networks have been shown to have appealing content addressable memory properties. This paper discusses an architecture for efficient storage and reference of model memories stored as stable patterns of activity in a parallel, distributed, connectionist, neural network. The emergent properties of content addressability and resistance to noise are exploited to perform indexing of the appropriate object centered model from image centered primitives. The system consists of three network modules, each of which represents information relative to a different frame of reference. The model memory network is a large state space vector where fields in the vector correspond to ordered component objects and relative, object based spatial relationships between the component objects. The component assertion network represents evidence about the existence of object primitives in the input image. It establishes local frames of reference for object primitives relative to the image based frame of reference. The spatial relationship constraint network is an intermediate representation which enables the association between the object based and the image based frames of reference. This intermediate level represents information about possible object orderings and establishes relative spatial relationships from the image based information in the component assertion network below. It is also constrained by the lawful object orderings in the model memory network above. The system design is consistent with current psychological theories of recognition-by-components. It also seems to support Marr's notions of hierarchical indexing (i.e., the specificity, adjunct, and parent indices). It supports the notion that multiple canonical views of an object may have to be stored in memory to enable its efficient identification. The use of variable fields in the state space vectors appears to keep the number of required nodes in the network down to a tractable number while imposing a semantic value on different areas of the state space. This semantic imposition supports an interface between the analogical aspects of neural networks and the propositional paradigms of symbolic processing.
Wide-Range Motion Estimation Architecture with Dual Search Windows for High Resolution Video Coding
NASA Astrophysics Data System (ADS)
Dung, Lan-Rong; Lin, Meng-Chun
This paper presents a memory-efficient motion estimation (ME) technique for high-resolution video compression. The main objective is to reduce external memory access, especially when local memory resources are limited. Reducing memory access in turn lowers the notoriously high power consumption. The key to reducing memory accesses is a center-biased algorithm that performs the motion vector (MV) search with the minimum amount of search data. To improve data reusability, the proposed dual-search-windowing (DSW) approach uses a secondary search window as an option when the search requires it. Doing so alleviates the loading of search windows and hence reduces the required external memory bandwidth. The proposed techniques can save up to 81% of external memory bandwidth and require only 135 MBytes/sec, while the quality degradation is less than 0.2 dB for 720p HDTV clips coded at 8 Mbits/sec.
2015-09-28
The performance of log-and-replay can degrade significantly for VMs configured with multiple virtual CPUs, since the shared memory communication... Whether based on checkpoint replication or log-and-replay, existing HA approaches use in-memory backups. The backup VM sits in the memory of a... efficiently. SUBJECT TERMS: High-availability virtual machines, live migration, memory and traffic overheads, application suspension, Java.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Agerskov, Claus
2016-04-01
A neural network model is presented of novelty detection in the CA1 subdomain of the hippocampal formation from the perspective of information flow. This computational model is restricted on several levels by both anatomical information about hippocampal circuitry and behavioral data from studies done in rats. Several studies report that the CA1 area broadcasts a generalized novelty signal in response to changes in the environment. Using the neural engineering framework developed by Eliasmith et al., a spiking neural network architecture is created that is able to compare high-dimensional vectors, symbolizing semantic information, according to the semantic pointer hypothesis. This model then computes the similarity between the vectors, as both direct inputs and a recalled memory from a long-term memory network by performing the dot-product operation in a novelty neural network architecture. The developed CA1 model agrees with available neuroanatomical data, as well as the presented behavioral data, and so it is a biologically realistic model of novelty detection in the hippocampus, which can provide a feasible explanation for experimentally observed dynamics.
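The comparison step described above, taking a dot product between an input vector and a vector recalled from long-term memory, can be sketched in a few lines. This is only an illustration of the operation; the vector dimension, normalization, and novelty measure below are assumptions, not details of the Agerskov model.

```python
# Sketch: novelty as low similarity between a current input vector and a
# recalled memory vector, computed with a dot product of unit vectors.
import numpy as np

def novelty_signal(input_vec, recalled_vec):
    a = input_vec / np.linalg.norm(input_vec)
    b = recalled_vec / np.linalg.norm(recalled_vec)
    similarity = float(np.dot(a, b))   # near 1 for familiar, near 0 for novel
    return 1.0 - similarity

rng = np.random.default_rng(0)
memory = rng.normal(size=512)
print(novelty_signal(memory, memory))                # familiar input, ~0
print(novelty_signal(rng.normal(size=512), memory))  # novel input, ~1
```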
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND, and assesses the performance of the parallel method implemented in MCBEND.
Diffusion theory of decision making in continuous report.
Smith, Philip L
2016-07-01
I present a diffusion model for decision making in continuous report tasks, in which a continuous, circularly distributed stimulus attribute in working memory is matched to a representation of the attribute in the stimulus display. Memory retrieval is modeled as a 2-dimensional diffusion process with vector-valued drift on a disk, whose bounding circle represents the decision criterion. The direction and magnitude of the drift vector describe the identity of the stimulus and the quality of its representation in memory, respectively. The point at which the diffusion exits the disk determines the reported value of the attribute and the time to exit the disk determines the decision time. Expressions for the joint distribution of decision times and report outcomes are obtained by means of the Girsanov change-of-measure theorem, which allows the properties of the nonzero-drift diffusion process to be characterized as a function of a Euclidean-distance Bessel process. Predicted report precision is equal to the product of the decision criterion and the drift magnitude and follows a von Mises distribution, in agreement with the treatment of precision in the working memory literature. Trial-to-trial variability in criterion and drift rate leads, respectively, to direct and inverse relationships between report accuracy and decision times, in agreement with, and generalizing, the standard diffusion model of 2-choice decisions. The 2-dimensional model provides a process account of working memory precision and its relationship with the diffusion model, and a new way to investigate the properties of working memory, via the distributions of decision times. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
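A simple simulation conveys the mechanics of the model: a two-dimensional random walk with drift runs until it crosses a circular criterion, and the exit angle and exit time give the reported value and the decision time. The drift direction and magnitude, criterion radius, noise level, and step size below are arbitrary illustrative values, not parameters fit in the paper.

```python
# Sketch: 2-D drift-diffusion on a disk; exit angle = reported attribute,
# exit time = decision time. Parameter values are illustrative assumptions.
import numpy as np

def simulate_trial(drift_angle=0.5, drift_mag=1.5, criterion=1.0,
                   sigma=1.0, dt=0.001, rng=None):
    rng = rng or np.random.default_rng()
    drift = drift_mag * np.array([np.cos(drift_angle), np.sin(drift_angle)])
    x = np.zeros(2)
    t = 0.0
    while np.linalg.norm(x) < criterion:
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal(size=2)
        t += dt
    return np.arctan2(x[1], x[0]), t   # (reported angle, decision time)

angle, rt = simulate_trial()
```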
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Fangzhen; Wang, Huanhuan; Raghothamachar, Balaji
A new method has been developed to determine the fault vectors associated with stacking faults in 4H-SiC from their stacking sequences observed on high resolution TEM images. This method, analogous to the Burgers circuit technique for determination of the dislocation Burgers vector, involves determination of the vectors required in the projection of the perfect lattice to correct the deviated path constructed in the faulted material. Results for several different stacking faults were compared with fault vectors determined from X-ray topographic contrast analysis and were found to be consistent. This technique is expected to be applicable to all structures comprising corner-shared tetrahedra.
Image coding using entropy-constrained residual vector quantization
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Smith, Mark J. T.; Barnes, Christopher F.
1993-01-01
The residual vector quantization (RVQ) structure is exploited to produce a variable length codeword RVQ. Necessary conditions for the optimality of this RVQ are presented, and a new entropy-constrained RVQ (EC-RVQ) design algorithm is shown to be very effective in designing RVQ codebooks over a wide range of bit rates and vector sizes. The new EC-RVQ has several important advantages. It can outperform entropy-constrained VQ (ECVQ) in terms of peak signal-to-noise ratio (PSNR), memory, and computation requirements. It can also be used to design high rate codebooks and codebooks with relatively large vector sizes. Experimental results indicate that when the new EC-RVQ is applied to image coding, very high quality is achieved at relatively low bit rates.
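The residual structure itself is easy to sketch: each stage quantizes the residual left by the previous stage, and decoding sums the selected code vectors. The random codebooks below are placeholders; the EC-RVQ design procedure for the actual codebooks is not reproduced here.

```python
# Sketch of the residual VQ structure: stage-by-stage quantization of the
# residual. Codebooks are random stand-ins, not entropy-constrained designs.
import numpy as np

def rvq_encode(x, codebooks):
    indices, residual = [], x.copy()
    for cb in codebooks:
        i = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        indices.append(i)
        residual = residual - cb[i]
    return indices

def rvq_decode(indices, codebooks):
    return sum(cb[i] for cb, i in zip(codebooks, indices))

rng = np.random.default_rng(1)
codebooks = [rng.normal(size=(16, 8)) for _ in range(2)]  # two stages
x = rng.normal(size=8)
x_hat = rvq_decode(rvq_encode(x, codebooks), codebooks)
```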
Effects of Aging on True and False Memory Formation: An fMRI Study
ERIC Educational Resources Information Center
Dennis, Nancy A.; Kim, Hongkeun; Cabeza, Roberto
2007-01-01
Compared to young, older adults are more likely to forget events that occurred in the past as well as remember events that never happened. Previous studies examining false memories and aging have shown that these memories are more likely to occur when new items share perceptual or semantic similarities with those presented during encoding. It is…
Ad Hoc Categories and False Memories: Memory Illusions for Categories Created On-The-Spot
ERIC Educational Resources Information Center
Soro, Jerônimo C.; Ferreira, Mário B.; Semin, Gün R.; Mata, André; Carneiro, Paula
2017-01-01
Three experiments were designed to test whether experimentally created ad hoc associative networks evoke false memories. We used the DRM (Deese, Roediger, McDermott) paradigm with lists of ad hoc categories composed of exemplars aggregated toward specific goals (e.g., going for a picnic) that do not share any consistent set of features. Experiment…
Austin, John R
2003-10-01
Previous research on transactive memory has found a positive relationship between transactive memory system development and group performance in single project laboratory and ad hoc groups. Closely related research on shared mental models and expertise recognition supports these findings. In this study, the author examined the relationship between transactive memory systems and performance in mature, continuing groups. A group's transactive memory system, measured as a combination of knowledge stock, knowledge specialization, transactive memory consensus, and transactive memory accuracy, is positively related to group goal performance, external group evaluations, and internal group evaluations. The positive relationship with group performance was found to hold for both task and external relationship transactive memory systems.
Virtual memory support for distributed computing environments using a shared data object model
NASA Astrophysics Data System (ADS)
Huang, F.; Bacon, J.; Mapp, G.
1995-12-01
Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
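Memory mapping, the core technique the abstract builds on, can be illustrated with a short sketch: a file-backed object is mapped into the address space and read and written through the same view. The file name and record layout are invented for the example; no coherence protocol or distribution is modeled.

```python
# Sketch: a memory-mapped file gives a process a unified, in-memory view of
# a storage object. Illustrative only; layout and naming are assumptions.
import mmap
import os
import struct

path = "shared_object.bin"
with open(path, "wb") as f:
    f.write(b"\x00" * 8)                     # back the mapping with 8 bytes

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 8) as view:
        view[0:8] = struct.pack("<q", 42)    # write through the mapped view
        value, = struct.unpack("<q", view[0:8])
        print(value)                         # 42

os.remove(path)
```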
Social Transmission of False Memory in Small Groups and Large Networks.
Maswood, Raeya; Rajaram, Suparna
2018-05-21
Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
MULTI: a shared memory approach to cooperative molecular modeling.
Darden, T; Johnson, P; Smith, H
1991-03-01
A general purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.
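The style of interprocess communication described, a shared memory segment guarded by a semaphore, can be sketched with Python's standard library as an analogue of the UNIX facilities MULTI uses. The coordinate array, its shape, and the update below are invented for the example and do not reflect MULTI's data layout.

```python
# Sketch: two processes cooperating through a shared coordinate array, with
# a semaphore serializing updates. Analogous in spirit to UNIX shared memory
# and semaphores; the data layout here is an illustrative assumption.
import numpy as np
from multiprocessing import Process, Semaphore, shared_memory

def translate_atoms(name, lock, shift):
    shm = shared_memory.SharedMemory(name=name)
    coords = np.ndarray((100, 3), dtype=np.float64, buffer=shm.buf)
    with lock:                      # only one process updates at a time
        coords += shift             # manipulate the shared conformation
    del coords                      # drop the view before closing the segment
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=100 * 3 * 8)
    coords = np.ndarray((100, 3), dtype=np.float64, buffer=shm.buf)
    coords[:] = 0.0
    lock = Semaphore(1)
    p = Process(target=translate_atoms, args=(shm.name, lock, 1.5))
    p.start()
    p.join()
    print(coords[0])                # [1.5 1.5 1.5]
    del coords
    shm.close()
    shm.unlink()
```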
SMT-Aware Instantaneous Footprint Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Probir; Liu, Xu; Song, Shuaiwen
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
Cache-based error recovery for shared memory multiprocessor systems
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1989-01-01
A multiprocessor cache-based checkpointing and recovery scheme for recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.
In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphic Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different level of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and only relies on the high memory bandwidth of the GPU. The first and second solution only support partial pivoting, the third one easily supports partial and full pivoting, making it attractive to problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
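For contrast with the GPU kernels described above, a batched solve is easy to express on the CPU: factor and solve many small independent systems in one call. The sketch below uses NumPy's stacked solve as a stand-in for the batched LU kernels; the matrix sizes and conditioning are illustrative assumptions.

```python
# Sketch: CPU analogue of a batched small-matrix solve. NumPy's stacked
# linalg.solve plays the role of the GPU batched LU kernels discussed above.
import numpy as np

rng = np.random.default_rng(0)
batch, n = 1000, 32
A = rng.normal(size=(batch, n, n)) + n * np.eye(n)   # keep systems well conditioned
b = rng.normal(size=(batch, n))
x = np.linalg.solve(A, b)                            # one solve per matrix in the batch
residual = np.max(np.abs(np.einsum("bij,bj->bi", A, x) - b))
print(residual)
```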
A neuropsychological comparison of obsessive-compulsive disorder and trichotillomania.
Chamberlain, Samuel R; Fineberg, Naomi A; Blackwell, Andrew D; Clark, Luke; Robbins, Trevor W; Sahakian, Barbara J
2007-03-02
Obsessive-compulsive disorder (OCD) and trichotillomania (compulsive hair-pulling) share overlapping co-morbidity, familial transmission, and phenomenology. However, the extent to which these disorders share a common cognitive phenotype has yet to be elucidated using patients without confounding co-morbidities. To compare neurocognitive functioning in co-morbidity-free patients with OCD and trichotillomania, focusing on domains of learning and memory, executive function, affective processing, reflection-impulsivity and decision-making. Twenty patients with OCD, 20 patients with trichotillomania, and 20 matched controls undertook neuropsychological assessment after meeting stringent inclusion criteria. Groups were matched for age, education, verbal IQ, and gender. The OCD and trichotillomania groups were impaired on spatial working memory. Only OCD patients showed additional impairments on executive planning and visual pattern recognition memory, and missed more responses to sad target words than other groups on an affective go/no-go task. Furthermore, OCD patients failed to modulate their behaviour between conditions on the reflection-impulsivity test, suggestive of cognitive inflexibility. Both clinical groups showed intact decision-making and probabilistic reversal learning. OCD and trichotillomania shared overlapping spatial working memory problems, but neuropsychological dysfunction in OCD spanned additional domains that were intact in trichotillomania. Findings are discussed in relation to likely fronto-striatal neural substrates and future research directions.
Comparison between sparsely distributed memory and Hopfield-type neural network models
NASA Technical Reports Server (NTRS)
Keeler, James D.
1986-01-01
The Sparsely Distributed Memory (SDM) model (Kanerva, 1984) is compared to Hopfield-type neural-network models. A mathematical framework for comparing the two is developed, and the capacity of each model is investigated. The capacity of the SDM can be increased independently of the dimension of the stored vectors, whereas the Hopfield capacity is limited to a fraction of this dimension. However, the total number of stored bits per matrix element is the same in the two models, as well as for extended models with higher order interactions. The models are also compared in their ability to store sequences of patterns. The SDM is extended to include time delays so that contextual information can be used to cover sequences. Finally, it is shown how a generalization of the SDM allows storage of correlated input pattern vectors.
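To make the capacity comparison concrete, the Hopfield side of the comparison can be sketched with the standard outer-product (Hebbian) storage rule and one-step recall. The dimension, pattern count, and corruption level below are arbitrary illustrative choices.

```python
# Sketch: Hopfield-type storage with the outer-product rule and one-step
# recall from a corrupted probe. Sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, P = 256, 20                              # dimension, number of stored patterns
patterns = rng.choice([-1, 1], size=(P, N))
W = (patterns.T @ patterns) / N             # Hebbian weight matrix
np.fill_diagonal(W, 0.0)

probe = patterns[0].copy()
probe[:20] *= -1                            # corrupt a few bits
recalled = np.sign(W @ probe)
print(np.mean(recalled == patterns[0]))     # fraction of bits recovered
```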
A Theoretical Understanding of Circular Polarization Memory in Random Media
NASA Astrophysics Data System (ADS)
Dark, Julia
Radiative transport theory describes the propagation of light in random media that absorb, scatter, and emit radiation. To describe the propagation of light, the full polarization state is quantified using the Stokes parameters. For the sake of mathematical convenience, the polarization state of light is often neglected, leading to the scalar radiative transport equation for the intensity only. For scalar transport theory, there is a well-established body of literature on numerical and analytic approximations to the radiative transport equation. We extend the scalar theory to the vector radiative transport equation (vRTE). In particular, we are interested in the theoretical basis for a phenomenon called circular polarization memory. Circular polarization memory is the physical phenomenon whereby circular polarization retains its ellipticity and handedness when propagating in random media. This is in contrast to the propagation of linear polarization in random media, which depolarizes at a faster rate, and to specular reflection of circular polarization, whereby the circular polarization handedness flips. We investigate two limits that are of known interest in the phenomenon of circular polarization memory. The first limit we investigate is that of forward-peaked scattering, i.e., the limit where most scattering events occur in the forward or near-forward directions. The second limit we consider is that of strong scattering and weak absorption. In the forward-peaked scattering limit we approximate the vRTE by a system of partial differential equations motivated by the scalar Fokker-Planck approximation. We call the leading order approximation the vector Fokker-Planck approximation. The vector Fokker-Planck approximation predicts that strongly forward-peaked media exhibit circular polarization memory, where the strength of the effect can be calculated from the expansion of the scattering matrix in special functions. In addition, we find in this limit that total intensity, linear polarization, and circular polarization decouple. From this result we conclude that, in the Fokker-Planck limit, the scalar approximation is an appropriate leading order approximation. In the strongly scattering, weakly absorbing limit the vector radiative transport equation can be analyzed using boundary layer theory. In this case, the problem of light scattering in an optically thick medium is reduced to a 1D vRTE near the boundary and a 3D diffusion equation in the interior. We develop and implement a numerical solver for the boundary layer problem by using a discrete ordinate solver in the boundary layer and a spectral method to solve the diffusion approximation in the interior. We implement the method in Fortran 95 with external dependencies on BLAS, LAPACK, and FFTW. By analyzing the spectrum of the discretized vRTE in the boundary layer, we are able to predict the presence of circular polarization memory in a given medium.
Control of thumb force using surface functional electrical stimulation and muscle load sharing
2013-01-01
Background Stroke survivors often have difficulties in manipulating objects with their affected hand. Thumb control plays an important role in object manipulation. Surface functional electrical stimulation (FES) can assist movement. We aim to control the 2D thumb force by predicting the sum of individual muscle forces, described by a sigmoidal muscle recruitment curve and a single force direction. Methods Five able-bodied subjects and five stroke subjects were strapped in a custom-built setup. The forces perpendicular to the thumb in response to FES applied to three thumb muscles were measured. We evaluated the feasibility of using recruitment-curve-based force vector maps in predicting output forces. In addition, we developed a closed-loop force controller. Load sharing between the three muscles was used to solve the redundancy problem of having three actuators to control forces in two dimensions. The thumb force was controlled towards target forces of 0.5 N and 1.0 N in multiple directions within the individual's thumb workspace. Hereby, the possibilities to use these force vector maps and the load sharing approach in feedforward and feedback force control were explored. Results The force vector prediction of the obtained model had small RMS errors with respect to the actual measured force vectors (0.22±0.17 N for the healthy subjects; 0.17±0.13 N for the stroke subjects). The stroke subjects showed a limited work range due to limited force production of the individual muscles. Performance of feedforward control without feedback was better in healthy subjects than in stroke subjects. However, when feedback control was added, performances were similar between the two groups. Feedback force control led, especially for the stroke subjects, to a reduction in stationary errors, which improved performance. Conclusions Thumb muscle responses to FES can be described by a single force direction and a sigmoidal recruitment curve. Force in the desired direction can be generated through load sharing among redundant muscles. The force vector maps are subject specific and also suitable in feedforward and feedback control taking the individual's available workspace into account. With feedback, more accurate control of muscle force can be achieved. PMID:24103414
Human Episodic Memory Retrieval Is Accompanied by a Neural Contiguity Effect.
Folkerts, Sarah; Rutishauser, Ueli; Howard, Marc W
2018-04-25
Cognitive psychologists have long hypothesized that experiences are encoded in a temporal context that changes gradually over time. When an episodic memory is retrieved, the state of context is recovered: a jump back in time. We recorded from single units in the medial temporal lobe of epilepsy patients performing an item recognition task. The population vector changed gradually over minutes during presentation of the list. When a probe from the list was remembered with high confidence, the population vector reinstated the temporal context of the original presentation of that probe during study, a neural contiguity effect that provides a possible mechanism for behavioral contiguity effects. This pattern was only observed for well remembered probes; old probes that were not well remembered showed an anti-contiguity effect. These results constitute the first direct evidence that recovery of an episodic memory in humans is associated with retrieval of a gradually changing state of temporal context, a neural "jump back in time" that parallels the act of remembering. SIGNIFICANCE STATEMENT Episodic memory is the ability to relive a specific experience from one's life. For decades, researchers have hypothesized that, unlike other forms of memory that can be described as simple associations between stimuli, episodic memory depends on the recovery of a neural representation of spatiotemporal context. During study of a sequence of stimuli, the brain state of epilepsy patients changed slowly over at least a minute. When the participant remembered a particular event from the list, this gradually changing state was recovered. This provides direct confirmation of the prediction from computational models of episodic memory. The resolution of this point means that the study of episodic memory can focus on the mechanisms by which this representation of spatiotemporal context is maintained and sometimes recovered. Copyright © 2018 the authors.
Ho, ThienLuan; Oh, Seung-Rohk
2017-01-01
Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using the warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over a sequential algorithm on a CPU and a previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
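For reference, the underlying k-differences problem that the GPU algorithm parallelizes can be stated as a small dynamic program: report every text position at which the pattern matches with at most k edits. The sketch below is the plain sequential baseline, not the warp-shuffle kernel; the example strings are invented.

```python
# Sketch: sequential dynamic programming for approximate matching with at
# most k differences (Sellers-style formulation). CPU baseline only.
def k_difference_match(pattern, text, k):
    m = len(pattern)
    dp = list(range(m + 1))                 # column of edit distances
    ends = []
    for i, c in enumerate(text, start=1):
        prev, dp[0] = dp[0], 0              # a match may start at any text position
        for j in range(1, m + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # deletion in the text
                        dp[j - 1] + 1,      # insertion in the text
                        prev + (pattern[j - 1] != c))  # match or substitution
            prev = cur
        if dp[m] <= k:
            ends.append(i)                  # match ending at text position i
    return ends

print(k_difference_match("ACGT", "TTACGGTT", 1))
```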
Buhusi, Catalin V; Meck, Warren H
2009-07-12
Individuals time as if using a stopwatch that can be stopped or reset on command. Here, we review behavioural and neurobiological data supporting the time-sharing hypothesis that perceived time depends on the attentional and memory resources allocated to the timing process. Neuroimaging studies in humans suggest that timekeeping tasks engage brain circuits typically involved in attention and working memory. Behavioural, pharmacological, lesion and electrophysiological studies in lower animals support this time-sharing hypothesis. When subjects attend to a second task, or when intruder events are presented, estimated durations are shorter, presumably due to resources being taken away from timing. Here, we extend the time-sharing hypothesis by proposing that resource reallocation is proportional to the perceived contrast, both in temporal and non-temporal features, between intruders and the timed events. New findings support this extension by showing that the effect of an intruder event is dependent on the relative duration of the intruder to the intertrial interval. The conclusion is that the brain circuits engaged by timekeeping comprise not only those primarily involved in time accumulation, but also those involved in the maintenance of attentional and memory resources for timing, and in the monitoring and reallocation of those resources among tasks.
A vectorization of the Hess McDonnell Douglas potential flow program NUED for the STAR-100 computer
NASA Technical Reports Server (NTRS)
Boney, L. R.; Smith, R. E., Jr.
1979-01-01
The computer program NUED for analyzing potential flow about arbitrary three dimensional lifting bodies using the panel method was modified to use vector operations and run on the STAR-100 computer. A high speed of computation and ability to approximate the body surface with a large number of panels are characteristics of NUEDV. The new program shows that vector operations can be readily implemented in programs of this type to increase the computational speed on the STAR-100 computer. The virtual memory architecture of the STAR-100 facilitates the use of large numbers of panels to approximate the body surface.
1991-01-01
A Connectionist Simulation of Attention and Vector Comparison: The Need for Serial Processing in Parallel Hardware (Technical Report AIP, AD-A242 225). Recoverable fragments of the record: a three-layer connectionist network whose input layer of memory processing is serial; selective attention gates visual processing in the extrastriate cortex (Science, 229:782-784); Treisman, A. M. (1985), preattentive processing.
Multiprocessing MCNP on an IBM RS/6000 cluster
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinney, G.W.; West, J.T.
1993-03-01
The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major beneficiaries of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization. Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law, S(f,P) = 1/((1 - f) + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.
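The speedup bound quoted in the abstract is a one-line function: with f the fraction of the task that multiprocesses and P the number of processors, the bound is S(f,P) = 1/((1 - f) + f/P). A minimal sketch:

```python
# Amdahl's-law speedup bound: f = parallelizable fraction, P = processors.
def amdahl_speedup(f, P):
    return 1.0 / ((1.0 - f) + f / P)

print(amdahl_speedup(0.95, 8))   # about 5.9x, well short of the 8x ideal
```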
Extending the length and time scales of Gram-Schmidt Lyapunov vector computations
NASA Astrophysics Data System (ADS)
Costa, Anthony B.; Green, Jason R.
2013-08-01
Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram-Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N^2 with the particle count N. This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram-Schmidt vectors. The first is a distributed-memory message-passing method using ScaLAPACK. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard-Jones fluids from N=100 to 1300 between Intel Nehalem/InfiniBand DDR and NVIDIA C2050 architectures. To the best of our knowledge, these are the largest systems for which the Gram-Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.
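The computational core the abstract refers to, repeated QR re-orthogonalization of a set of propagated tangent vectors with accumulation of the logarithms of the diagonal of R, can be sketched compactly. The fixed random matrix below stands in for the system's tangent map; it is an assumption for illustration, not a Lennard-Jones Jacobian.

```python
# Sketch: Gram-Schmidt (QR) reorthogonalization loop used when estimating
# Lyapunov exponents and vectors. The tangent map is a random stand-in.
import numpy as np

rng = np.random.default_rng(0)
n, k, steps = 50, 10, 200
J = np.eye(n) + 0.01 * rng.normal(size=(n, n))   # stand-in tangent map
Q = np.linalg.qr(rng.normal(size=(n, k)))[0]     # initial orthonormal vectors
log_growth = np.zeros(k)

for _ in range(steps):
    Q, R = np.linalg.qr(J @ Q)                   # propagate, then reorthogonalize
    log_growth += np.log(np.abs(np.diag(R)))     # accumulate stretching factors

lyapunov_estimates = log_growth / steps
print(lyapunov_estimates[:3])
```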
NASA Technical Reports Server (NTRS)
Mejzak, R. S.
1980-01-01
The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DFT) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis, which are independent 8-bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
NASA Astrophysics Data System (ADS)
Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter
2015-12-01
AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Continuing the search for the engram: examining the mechanism of fear memories.
Josselyn, Sheena A
2010-07-01
The goal of my research is to use rodent models to gain insight into the fundamental molecular, cellular, and systems-level processes that underlie memory formation. My work focuses on fear memories. Aberrant fear and/or anxiety may be at the heart of many psychiatric disorders. In this article, I review the results of my research group; these results show that particular neurons in the lateral amygdala, a brain region important for fear, are specifically involved in particular fear memories. We started by showing that the transcription factor CREB (cAMP/Ca(2+) response element binding protein) plays a key role in the formation of fear memories. Next, we used viral vectors to overexpress CREB in a subset of lateral amygdala neurons. This not only facilitated fear memory formation but also "drove" the memory into the neurons with relatively increased CREB function. Finally, we showed that selective ablation of the neurons overexpressing CREB in the lateral amygdala selectively erased the fear memory. These findings are the first to show disruption of a specific memory by disrupting select neurons within a distributed network.
NASA Technical Reports Server (NTRS)
Muellerschoen, R. J.
1988-01-01
A unified method to permute vector-stored upper-triangular diagonal factorized covariance (UD) and vector stored upper-triangular square-root information filter (SRIF) arrays is presented. The method involves cyclical permutation of the rows and columns of the arrays and retriangularization with appropriate square-root-free fast Givens rotations or elementary slow Givens reflections. A minimal amount of computation is performed and only one scratch vector of size N is required, where N is the column dimension of the arrays. To make the method efficient for large SRIF arrays on a virtual memory machine, three additional scratch vectors each of size N are used to avoid expensive paging faults. The method discussed is compared with the methods and routines of Bierman's Estimation Subroutine Library (ESL).
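The flavor of the operation, permuting the ordering of an upper-triangular factor and restoring triangularity with Givens rotations while preserving the underlying information matrix, can be sketched as follows. This is a generic column-permutation example, not Bierman's vector-stored UD/SRIF routine, and the matrix size and permutation are arbitrary assumptions.

```python
# Sketch: permute the columns of an upper-triangular factor R, then restore
# upper-triangular form with Givens rotations. Orthogonal rotations leave
# R.T @ R (the information matrix) unchanged. Generic illustration only.
import numpy as np

def retriangularize(A):
    R = A.copy()
    n = R.shape[0]
    for j in range(n):                       # zero below-diagonal entries, column by column
        for i in range(n - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # Givens rotation acting on rows i-1 and i
            R[i - 1:i + 1, :] = G @ R[i - 1:i + 1, :]
    return R

rng = np.random.default_rng(0)
R = np.triu(rng.normal(size=(5, 5)))
A = R[:, [1, 2, 3, 4, 0]]                    # cyclic permutation of the column ordering
Rp = retriangularize(A)
print(np.allclose(Rp.T @ Rp, A.T @ A))       # True: information matrix preserved
```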
Coman, Alin; Berry, Jessica N
2015-12-01
When speakers selectively retrieve previously learned information, listeners often concurrently, and covertly, retrieve their memories of that information. This concurrent retrieval typically enhances memory for mentioned information (the rehearsal effect) and impairs memory for unmentioned but related information (socially shared retrieval-induced forgetting, SSRIF), relative to memory for unmentioned and unrelated information. Building on research showing that anxiety leads to increased attention to threat-relevant information, we explored whether concurrent retrieval is facilitated in high-anxiety real-world contexts. Participants first learned category-exemplar facts about meningococcal disease. Following a manipulation of perceived risk of infection (low vs. high risk), they listened to a mock radio show in which some of the facts were selectively practiced. Final recall tests showed that the rehearsal effect was equivalent between the two risk conditions, but SSRIF was significantly larger in the high-risk than in the low-risk condition. Thus, the tendency to exaggerate consequences of news events was found to have deleterious consequences. © The Author(s) 2015.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blocksome, Michael A.; Mamidala, Amith R.
2013-09-03
Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
The Work's Not Over- Roll Up Your Sleeves and Make a Difference!
NASA Astrophysics Data System (ADS)
Sarquis, Mickey
1997-01-01
As my 17-year tenure as the first editor of the Secondary School Chemistry Section draws to a close, John Moore has invited me to share some reflections on my experiences. It's hard for me to believe that this many years have passed; in some ways, it seems like only yesterday that I took on this position. Looking back over my term as Section editor recalls wonderful memories, but it also stimulates me to seek out and take on new challenges as I move into a new phase of involvement in chemical education. In response to John's kind invitation, I'd like to share some of these memories and ideas with you who share my vision of quality chemical education, particularly at the secondary level.
NASA Astrophysics Data System (ADS)
Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.
2013-12-01
A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.
New computing systems and their impact on structural analysis and design
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1989-01-01
A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.
ERIC Educational Resources Information Center
Galilee-Belfer, Mika
2012-01-01
Though many programs for undecided students focus on the "developing purpose" vector, the author argues that putting purpose before competency is putting the cart before the horse. In this article, she shares practical strategies she has used to help her students at the University of Arizona reach competence in understanding the academic world.…
Arousal-biased competition in perception and memory
Mather, Mara; Sutherland, Matthew R.
2010-01-01
Our everyday surroundings besiege us with information. The battle is for a share of our limited attention and memory, with the brain selecting the winners and discarding the losers. Previous research shows that both bottom-up and top-down factors bias competition in favor of high priority stimuli. We propose that arousal during an event increases this bias both in perception and in long-term memory of the event. Arousal-biased competition theory provides specific predictions about when arousal will enhance and when it will impair memory for events, accounting for some puzzling contradictions in the emotional memory literature. PMID:21660127
Adaptive track scheduling to optimize concurrency and vectorization in GeantV
Apostolakis, J.; Bandieramonte, M.; Bitzes, G.; ...
2015-05-22
The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of application. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be changed to maintain efficient vectors while keeping an optimal level of concurrency. The model has complex dynamics requiring tuning of the thresholds to switch between the normal regime and special modes, i.e., prioritizing events to allow flushing memory, adding new events to the transport pipeline to boost locality, dynamically adjusting the particle vector size, or switching from vector to single-track mode when vectorization causes only overhead. Lastly, this work requires a comprehensive study to optimize these parameters and make the behaviour of the scheduler self-adapting; its initial results are presented here.
Barsegyan, Areg; Mackenzie, Scott M.; Kurose, Brian D.; McGaugh, James L.; Roozendaal, Benno
2010-01-01
It is well established that acute administration of adrenocortical hormones enhances the consolidation of memories of emotional experiences and, concurrently, impairs working memory. These different glucocorticoid effects on these two memory functions have generally been considered to be independently regulated processes. Here we report that a glucocorticoid receptor agonist administered into the medial prefrontal cortex (mPFC) of male Sprague-Dawley rats both enhances memory consolidation and impairs working memory. Both memory effects are mediated by activation of a membrane-bound steroid receptor and depend on noradrenergic activity within the mPFC to increase levels of cAMP-dependent protein kinase. These findings provide direct evidence that glucocorticoid effects on both memory consolidation and working memory share a common neural influence within the mPFC. PMID:20810923
Benefits and Costs of Context Reinstatement in Episodic Memory: An ERP Study.
Bramão, Inês; Johansson, Mikael
2017-01-01
This study investigated context-dependent episodic memory retrieval. An influential idea in the memory literature is that performance benefits when the retrieval context overlaps with the original encoding context. However, such memory facilitation may not be driven by the encoding-retrieval overlap per se but by the presence of diagnostic features in the reinstated context that discriminate the target episode from competing episodes. To test this prediction, the encoding-retrieval overlap and the diagnostic value of the context were manipulated in a novel associative recognition memory task. Participants were asked to memorize word pairs presented together with diagnostic (unique) and nondiagnostic (shared) background scenes. At test, participants recognized the word pairs in the presence and absence of the previously encoded contexts. Behavioral data show facilitated memory performance in the presence of the original context but, importantly, only when the context was diagnostic of the target episode. The electrophysiological data reveal an early anterior ERP encoding-retrieval overlap effect that tracks the cost associated with having nondiagnostic contexts present at retrieval, that is, shared by multiple previous episodes, and a later posterior encoding-retrieval overlap effect that reflects facilitated access to the target episode during retrieval in diagnostic contexts. Taken together, our results underscore the importance of the diagnostic value of the context and suggest that context-dependent episodic memory effects are multiple determined.
Brown, Thackery I.; Stern, Chantal E.
2014-01-01
Many life experiences share information with other memories. In order to make decisions based on overlapping memories, we need to distinguish between experiences to determine the appropriate behavior for the current situation. Previous work suggests that the medial temporal lobe (MTL) and medial caudate interact to support the retrieval of overlapping navigational memories in different contexts. The present study used functional magnetic resonance imaging (fMRI) in humans to test the prediction that the MTL and medial caudate play complementary roles in learning novel mazes that cross paths with, and must be distinguished from, previously learned routes. During fMRI scanning, participants navigated virtual routes that were well learned from prior training while also learning new mazes. Critically, some routes learned during scanning shared hallways with those learned during pre-scan training. Overlap between mazes required participants to use contextual cues to select between alternative behaviors. Results demonstrated parahippocampal cortex activity specific for novel spatial cues that distinguish between overlapping routes. The hippocampus and medial caudate were active for learning overlapping spatial memories, and increased their activity for previously learned routes when they became context dependent. Our findings provide novel evidence that the MTL and medial caudate play complementary roles in the learning, updating, and execution of context-dependent navigational behaviors. PMID:23448868
Symbiosis of executive and selective attention in working memory
Vandierendonck, André
2014-01-01
The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved. PMID:25152723
A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
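The DPP-SVM protocol itself is specified in the paper; as a rough, hypothetical illustration of the general pattern it describes (each institution sends only aggregate, "privacy-insensitive" quantities, and a server combines them so that the result equals training on the pooled data), the sketch below trains a linear SVM by subgradient descent in which each site contributes only the subgradient of its own hinge-loss term. All function names, parameters, and data are invented for illustration and are not taken from the paper.

import numpy as np

def local_subgradient(w, X, y):
    # Computed at each site: subgradient of that site's own hinge-loss term.
    # Only this aggregate vector leaves the site, never the raw rows of X or y.
    margins = y * (X @ w)
    active = margins < 1.0
    return -(y[active, None] * X[active]).sum(axis=0)

def collaborative_linear_svm(sites, dim, lam=0.1, lr=0.01, epochs=300):
    # "Server" side: summing the per-site subgradients reproduces exactly the
    # subgradient of the pooled hinge loss, so the global model matches one
    # trained on the combined data.
    w = np.zeros(dim)
    n_total = sum(len(y) for _, y in sites)
    for _ in range(epochs):
        g = sum(local_subgradient(w, X, y) for X, y in sites)
        w -= lr * (lam * w + g / n_total)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
sites = [(X[:100], y[:100]), (X[100:], y[100:])]   # two "institutions"
w = collaborative_linear_svm(sites, dim=3)
print("training accuracy:", np.mean(np.sign(X @ w) == y))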
Dekhtiarenko, Iryna; Ratts, Robert B; Blatnik, Renata; Lee, Lian N; Fischer, Sonja; Borkner, Lisa; Oduro, Jennifer D; Marandu, Thomas F; Hoppe, Stephanie; Ruzsics, Zsolt; Sonnemann, Julia K; Mansouri, Mandana; Meyer, Christine; Lemmermann, Niels A W; Holtappels, Rafaela; Arens, Ramon; Klenerman, Paul; Früh, Klaus; Reddehase, Matthias J; Riemer, Angelika B; Cicin-Sain, Luka
2016-12-01
Cytomegalovirus (CMV) elicits long-term T-cell immunity of unparalleled strength, which has allowed the development of highly protective CMV-based vaccine vectors. Counterintuitively, experimental vaccines encoding a single MHC-I restricted epitope offered better immune protection than those expressing entire proteins, including the same epitope. To clarify this conundrum, we generated recombinant murine CMVs (MCMVs) encoding well-characterized MHC-I epitopes at different positions within viral genes and observed strong immune responses and protection against viruses and tumor growth when the epitopes were expressed at the protein C-terminus. We used the M45-encoded conventional epitope HGIRNASFI to dissect this phenomenon at the molecular level. A recombinant MCMV expressing HGIRNASFI on the C-terminus of M45, in contrast to wild-type MCMV, enabled peptide processing by the constitutive proteasome, direct antigen presentation, and an inflation of antigen-specific effector memory cells. Consequently, our results indicate that constitutive proteasome processing of antigenic epitopes in latently infected cells is required for robust inflationary responses. This insight allows utilizing the epitope positioning in the design of CMV-based vectors as a novel strategy for enhancing their efficacy.
Blatnik, Renata; Lee, Lian N.; Fischer, Sonja; Borkner, Lisa; Oduro, Jennifer D.; Marandu, Thomas F.; Hoppe, Stephanie; Ruzsics, Zsolt; Sonnemann, Julia K.; Meyer, Christine; Holtappels, Rafaela; Arens, Ramon; Früh, Klaus; Reddehase, Matthias J.; Riemer, Angelika B.; Cicin-Sain, Luka
2016-01-01
Cytomegalovirus (CMV) elicits long-term T-cell immunity of unparalleled strength, which has allowed the development of highly protective CMV-based vaccine vectors. Counterintuitively, experimental vaccines encoding a single MHC-I restricted epitope offered better immune protection than those expressing entire proteins, including the same epitope. To clarify this conundrum, we generated recombinant murine CMVs (MCMVs) encoding well-characterized MHC-I epitopes at different positions within viral genes and observed strong immune responses and protection against viruses and tumor growth when the epitopes were expressed at the protein C-terminus. We used the M45-encoded conventional epitope HGIRNASFI to dissect this phenomenon at the molecular level. A recombinant MCMV expressing HGIRNASFI on the C-terminus of M45, in contrast to wild-type MCMV, enabled peptide processing by the constitutive proteasome, direct antigen presentation, and an inflation of antigen-specific effector memory cells. Consequently, our results indicate that constitutive proteasome processing of antigenic epitopes in latently infected cells is required for robust inflationary responses. This insight allows utilizing the epitope positioning in the design of CMV-based vectors as a novel strategy for enhancing their efficacy. PMID:27977791
Alea, Nicole
2010-02-01
Two separate studies examined the prevalence and quality of silent (infrequently recalled), socially silent (i.e., recalled but not shared), and disclosed autobiographical memories. In Study 1 young and older men and women remembered positive events. Positive memories were more likely to be disclosed than to be kept socially silent or completely silent. However, socially silent and disclosed memories did not differ in memory quality: the memories were equally vivid, significant, and emotional. Silent memories were less qualitatively rich. This pattern of results was generally replicated in Study 2 with a lifespan sample for both positive and negative memories, and with additional qualitative variables. The exception was that negative memories were kept silent more often. Age differences were minimal. Women disclosed their autobiographical memories more, but men told a greater variety of people. Results are discussed in terms of the functions that memory telling and silences might serve for individuals.
Accelerated Modeling and New Ferroelectric Materials for Naval SONAR
2004-06-01
... substitution into BZ leads to the development of a small polarization ... The polarization is due to a combination of large Ag off-centering and small displacements by the other cations. The large Ag displacements are due to a ... As expected, proper vectorization and optimal memory usage were ... Once the code was fully vectorized, a speed-up of 9.2 times over the Pentium 4 Xeon and 6.6 times over the SGI O3K was achieved. We are currently using the X1 in production ...
An Update on Canine Adenovirus Type 2 and Its Vectors
Bru, Thierry; Salinas, Sara; Kremer, Eric J.
2010-01-01
Adenovirus vectors have significant potential for long- or short-term gene transfer. Preclinical and clinical studies using human derived adenoviruses (HAd) have demonstrated the feasibility of flexible hybrid vector designs, robust expression and induction of protective immunity. However, clinical use of HAd vectors can, under some conditions, be limited by pre-existing vector immunity. Pre-existing humoral and cellular anti-capsid immunity limits the efficacy and duration of transgene expression and is poorly circumvented by injections of larger doses and immuno-suppressing drugs. This review updates canine adenovirus serotype 2 (CAV-2, also known as CAdV-2) biology and gives an overview of the generation of early region 1 (E1)-deleted to helper-dependent (HD) CAV-2 vectors. We also summarize the essential characteristics concerning their interaction with the anti-HAd memory immune responses in humans, the preferential transduction of neurons, and its high level of retrograde axonal transport in the central and peripheral nervous system. CAV-2 vectors are particularly interesting tools to study the pathophysiology and potential treatment of neurodegenerative diseases, as anti-tumoral and anti-viral vaccines, tracer of synaptic junctions, oncolytic virus and as a platform to generate chimeric vectors. PMID:21994722
Personal semantics: at the crossroads of semantic and episodic memory.
Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian
2012-11-01
Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.
Computerized scoring algorithms for the Autobiographical Memory Test.
Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip
2018-02-01
Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive responding format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performances in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
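The published Dutch and English scoring models are not reproduced here; as a minimal sketch of the general pipeline the abstract describes (text features feeding a support vector classifier whose decision values grade memory specificity), the following uses scikit-learn with entirely made-up toy responses standing in for expert-coded AMT data.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy training examples (hypothetical); the real work used expert-coded AMT responses.
responses = [
    "the day my sister got married in June",       # specific (a single day)
    "whenever we went to the beach in summer",     # nonspecific (categoric)
    "my first driving lesson last Tuesday",        # specific
    "feeling happy about school in general",       # nonspecific
]
labels = ["specific", "nonspecific", "specific", "nonspecific"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(responses, labels)

# decision_function gives a graded "specificity" score, analogous to the SVM
# decision values the abstract refers to.
print(model.predict(["the morning I broke my arm"]))
print(model.decision_function(["the morning I broke my arm"]))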
Etiological Distinction of Working Memory Components in Relation to Mathematics
Lukowski, Sarah L.; Soden, Brooke; Hart, Sara A.; Thompson, Lee A.; Kovas, Yulia; Petrill, Stephen A.
2014-01-01
Working memory has been consistently associated with mathematics achievement, although the etiology of these relations remains poorly understood. The present study examined the genetic and environmental underpinnings of math story problem solving, timed calculation, and untimed calculation alongside working memory components in 12-year-old monozygotic (n = 105) and same-sex dizygotic (n = 143) twin pairs. Results indicated significant phenotypic correlation between each working memory component and all mathematics outcomes (r = 0.18 – 0.33). Additive genetic influences shared between the visuo-spatial sketchpad and mathematics achievement were significant, accounting for roughly 89% of the observed correlation. In addition, genetic covariance was found between the phonological loop and math story problem solving. In contrast, despite there being a significant observed relationship between phonological loop and timed and untimed calculation, there was no significant genetic or environmental covariance between the phonological loop and timed or untimed calculation skills. Further analyses indicated that genetic overlap between the visuo-spatial sketchpad and math story problem solving and math fluency was distinct from general genetic factors, whereas g, phonological loop, and mathematics shared generalist genes. Thus, although each working memory component was related to mathematics, the etiology of their relationships may be distinct. PMID:25477699
Characteristics of 5M modulated martensite in Ni-Mn-Ga magnetic shape memory alloys
NASA Astrophysics Data System (ADS)
Ćakır, A.; Acet, M.; Righi, L.; Albertini, F.; Farle, M.
2015-09-01
The applicability of the magnetic shape memory effect in Ni-Mn-based martensitic Heusler alloys is closely related to the nature of the crystallographically modulated martensite phase in these materials. We study the properties of modulated phases as a function of temperature and composition in three magnetic shape memory alloys Ni49.8Mn25.0Ga25.2, Ni49.8Mn27.1Ga23.1 and Ni49.5Mn28.6Ga21.9. The effect of substituting Ga for Mn leads to an anisotropic expansion of the lattice, where the b-parameter of the 5M modulated structure increases and the a and c-parameters decrease with increasing Ga concentration. The modulation vector is found to be both temperature and composition dependent. The size of the modulation vector corresponds to an incommensurate structure for Ni49.8Mn25.0Ga25.2 at all temperatures. For the other samples the modulation is incommensurate at low temperatures but reaches a commensurate value of q ≈ 0.400 close to room temperature. The results show that commensurateness of the 5M modulated structure is a special case of incommensurate 5M at a particular temperature.
Test Sequence Priming in Recognition Memory
ERIC Educational Resources Information Center
Johns, Elizabeth E.; Mewhort, D. J. K.
2009-01-01
The authors examined priming within the test sequence in 3 recognition memory experiments. A probe primed its successor whenever both probes shared a feature with the same studied item ("interjacent priming"), indicating that the study item like the probe is central to the decision. Interjacent priming occurred even when the 2 probes did…
Two Maintenance Mechanisms of Verbal Information in Working Memory
ERIC Educational Resources Information Center
Camos, V.; Lagner, P.; Barrouillet, P.
2009-01-01
The present study evaluated the interplay between two mechanisms of maintenance of verbal information in working memory, namely articulatory rehearsal as described in Baddeley's model, and attentional refreshing as postulated in Barrouillet and Camos's Time-Based Resource-Sharing (TBRS) model. In four experiments using complex span paradigm, we…
Down Memory Lane: Recollections of Lamaze International's First 50 Years
Zwelling, Elaine
2010-01-01
The 42-year involvement of one member of Lamaze International is chronicled through a decade-by-decade review of personal memories. The history of Lamaze International is shared through the recollections of her roles as a childbirth educator, faculty member, and member of the board of directors. PMID:21629385
Paul Ricoeur, Memory, and the Historical Gaze: Implications for Education Histories
ERIC Educational Resources Information Center
Colby, Sherri Rae
2012-01-01
In this article, the author shares the potential applications of Paul Ricoeur's philosophies of history, memory, and narrative to the interpretation of educational histories, and those histories' life spans: moving cyclically from early conception, to evidentiary construction, to published dissemination; and ultimately to death or immortality. Her…
1988-02-29
... by memory copying will degrade system performance on shared-memory multiprocessors. Virtual memory (VM) remapping, as opposed to memory copying, ...
Machine Learning Feature Selection for Tuning Memory Page Swapping
2013-09-01
Figure 4.1: Updated Feature Vector List. Features we added to the kernel are annotated with "(MLVM ...".
Bartonellae are Prevalent and Diverse in Costa Rican Bats and Bat Flies.
Judson, S D; Frank, H K; Hadly, E A
2015-12-01
Species in the bacterial genus, Bartonella, can cause disease in both humans and animals. Previous reports of Bartonella in bats and ectoparasitic bat flies suggest that bats could serve as mammalian hosts and bat flies as arthropod vectors. We compared the prevalence and genetic similarity of bartonellae in individual Costa Rican bats and their bat flies using molecular and sequencing methods targeting the citrate synthase gene (gltA). Bartonellae were more prevalent in bat flies than in bats, and genetic variants were sometimes, but not always, shared between bats and their bat flies. The detected bartonellae genetic variants were diverse, and some were similar to species known to cause disease in humans and other mammals. The high prevalence and sharing of bartonellae in bat flies and bats support a role for bat flies as a potential vector for Bartonella, while the genetic diversity and similarity to known species suggest that bartonellae could spill over into humans and animals sharing the landscape. © 2015 Blackwell Verlag GmbH.
NASA Astrophysics Data System (ADS)
Shi, X.; Utada, H.; Jiaying, W.
2009-12-01
The vector finite-element method combined with divergence corrections based on the magnetic field H, referred to as the VFEH++ method, is developed to simulate the magnetotelluric (MT) responses of 3-D conductivity models. The advantages of the new VFEH++ method are the use of edge elements to eliminate vector parasites and of divergence corrections to explicitly guarantee the divergence-free conditions in the whole modeling domain. 3-D MT topographic responses are modeled using the new VFEH++ method and are compared with those calculated by other numerical methods. The results show that MT responses can be modeled highly accurately using the VFEH++ method. The VFEH++ algorithm is also employed for 3-D MT data inversion incorporating topography. The 3-D MT inverse problem is formulated as minimization of a regularized misfit function. To avoid the huge memory requirement and very long run time of computing the Jacobian sensitivity matrix for the Gauss-Newton method, we employ the conjugate gradient (CG) approach to solve the inversion equations. In each CG iteration, the costly computations are the product of the Jacobian sensitivity matrix with a model vector x and of its transpose with a data vector y, which can be transformed into two pseudo-forward modeling computations. This avoids explicit calculation and storage of the full Jacobian matrix, which leads to considerable savings in the memory required by the inversion program on a PC. The performance of the CG algorithm is illustrated with several typical 3-D models with a horizontal earth surface and with topographic surfaces. The results show that the VFEH++ and CG algorithms can be effectively applied to 3-D MT field data inversion.
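The key computational point above is that CG never needs the Jacobian explicitly, only the action of J and its transpose on vectors. The sketch below is not the VFEH++ code; a toy 1-D convolution operator stands in for the MT sensitivity, but it shows the matrix-free pattern: each application of the Gauss-Newton normal operator costs two operator applications (playing the role of the pseudo-forward modelings), and no matrix is ever formed. The regularization weight and problem size are arbitrary.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

kernel = np.array([0.25, 0.5, 0.25])   # symmetric smoothing kernel (toy "sensitivity")
n = 200

def J_matvec(x):                        # J @ x, applied as a function call only
    return np.convolve(x, kernel, mode="same")

def JT_matvec(y):                       # J.T @ y (adjoint: correlation with the kernel)
    return np.convolve(y, kernel[::-1], mode="same")

lam = 0.05                              # regularization weight (illustrative)

def normal_matvec(x):                   # (J.T J + lam I) @ x: two operator applications
    return JT_matvec(J_matvec(x)) + lam * x

A = LinearOperator((n, n), matvec=normal_matvec, dtype=float)

# Synthetic data from a known model, then recover it with matrix-free CG.
m_true = np.zeros(n); m_true[80:120] = 1.0
data = J_matvec(m_true)
rhs = JT_matvec(data)                   # J.T d
m_est, info = cg(A, rhs, maxiter=200)
print("CG converged:", info == 0, " data misfit:", np.linalg.norm(J_matvec(m_est) - data))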
Multiclass Reduced-Set Support Vector Machines
NASA Technical Reports Server (NTRS)
Tang, Benyang; Mazzoni, Dominic
2006-01-01
There are well-established methods for reducing the number of support vectors in a trained binary support vector machine, often with minimal impact on accuracy. We show how reduced-set methods can be applied to multiclass SVMs made up of several binary SVMs, with significantly better results than reducing each binary SVM independently. Our approach is based on Burges' approach that constructs each reduced-set vector as the pre-image of a vector in kernel space, but we extend this by recomputing the SVM weights and bias optimally using the original SVM objective function. This leads to greater accuracy for a binary reduced-set SVM, and also allows vectors to be 'shared' between multiple binary SVMs for greater multiclass accuracy with fewer reduced-set vectors. We also propose computing pre-images using differential evolution, which we have found to be more robust than gradient descent alone. We show experimental results on a variety of problems and find that this new approach is consistently better than previous multiclass reduced-set methods, sometimes with a dramatic difference.
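As a hedged illustration of the reduced-set idea only (not the authors' full method, which additionally re-optimizes the original SVM objective and computes pre-images by differential evolution): for a fixed set of candidate reduced-set vectors z, the kernel-space least-squares weights follow Burges' construction in closed form, beta = Kzz^-1 Kzs alpha. All data, kernel parameters, and the choice of z below are synthetic.

import numpy as np

def rbf(A, B, gamma=0.5):
    # Gaussian kernel matrix between the row sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def reduced_set_weights(sv, alpha, z, gamma=0.5):
    # Coefficients beta so that sum_j beta_j phi(z_j) best approximates
    # sum_i alpha_i phi(sv_i) in feature space: beta = Kzz^-1 Kzs alpha.
    Kzz = rbf(z, z, gamma)
    Kzs = rbf(z, sv, gamma)
    return np.linalg.solve(Kzz + 1e-10 * np.eye(len(z)), Kzs @ alpha)

rng = np.random.default_rng(1)
sv = rng.normal(size=(50, 2)); alpha = rng.normal(size=50)   # toy "support vectors"
z = sv[:5].copy()              # here simply a subset; pre-image optimization in general
beta = reduced_set_weights(sv, alpha, z)

# Compare decision values of the full and reduced expansions on test points.
X = rng.normal(size=(10, 2))
f_full = rbf(X, sv) @ alpha
f_red = rbf(X, z) @ beta
print(np.c_[f_full, f_red])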
Obstacle-avoiding navigation system
Borenstein, Johann; Koren, Yoram; Levine, Simon P.
1991-01-01
A system for guiding an autonomous or semi-autonomous vehicle through a field of operation having obstacles thereon to be avoided employs a memory for containing data which defines an array of grid cells which correspond to respective subfields in the field of operation of the vehicle. Each grid cell in the memory contains a value which is indicative of the likelihood, or probability, that an obstacle is present in the respectively associated subfield. The values in the grid cells are incremented individually in response to each scan of the subfields, and precomputation and use of a look-up table avoids complex trigonometric functions. A further array of grid cells is fixed with respect to the vehicle to form a conceptual active window which overlies the incremented grid cells. Thus, when the cells in the active window overlie grid cells having values which are indicative of the presence of obstacles, the value therein is used as a multiplier of the precomputed vectorial values. The resulting plurality of vectorial values are summed vectorially in one embodiment of the invention to produce a virtual composite repulsive vector which is then summed vectorially with a target-directed vector for producing a resultant vector for guiding the vehicle. In an alternative embodiment, a plurality of vectors surrounding the vehicle are computed, each having a value corresponding to obstacle density. In such an embodiment, target location information is used to select between alternative directions of travel having low associated obstacle densities.
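A much-simplified sketch of the first embodiment's summation (certainty-grid cells inside an active window contribute repulsive vectors weighted by their values, and a target-directed attractive vector is added) follows; the gains, window size, and inverse-cube falloff are arbitrary illustrative choices, not the patented implementation.

import numpy as np

def steering_vector(grid, vehicle, target, window=5, kr=1.0, kt=1.0):
    # grid[i, j] holds the certainty that cell (i, j) contains an obstacle.
    vi, vj = vehicle
    half = window // 2
    repulse = np.zeros(2)
    for i in range(max(0, vi - half), min(grid.shape[0], vi + half + 1)):
        for j in range(max(0, vj - half), min(grid.shape[1], vj + half + 1)):
            c = grid[i, j]
            if c == 0 or (i, j) == (vi, vj):
                continue
            away = np.array([vi - i, vj - j], float)   # points away from the obstacle cell
            dist = np.linalg.norm(away)
            repulse += kr * c * away / dist**3         # near, certain cells push hardest
    to_target = np.array([target[0] - vi, target[1] - vj], float)
    attract = kt * to_target / np.linalg.norm(to_target)
    resultant = repulse + attract
    return resultant / np.linalg.norm(resultant)       # unit steering direction

grid = np.zeros((20, 20))
grid[8:12, 10] = 0.9                                   # a short obstacle wall
print(steering_vector(grid, vehicle=(10, 9), target=(10, 18)))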
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
NASA Technical Reports Server (NTRS)
Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2016-01-01
In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
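The VULCAN mini-apps themselves are not shown here; as a rough sketch of the "Shared MPI" idea (ranks on one node reading and writing a single array through an MPI-3 shared-memory window instead of exchanging copies), the mpi4py fragment below could be launched with something like mpirun -n 4 python smpi_demo.py. The file name, array size, and work split are invented for illustration.

import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED, key=world.rank)  # ranks sharing a node

n = 1_000_000
itemsize = MPI.DOUBLE.Get_size()
# Only node-rank 0 allocates the backing memory; the other ranks attach to it.
size = n * itemsize if node.rank == 0 else 0
win = MPI.Win.Allocate_shared(size, itemsize, comm=node)
buf, _ = win.Shared_query(0)
shared = np.ndarray(buffer=buf, dtype="d", shape=(n,))

# Each node-local rank fills its own slice of the shared array in place.
lo = node.rank * n // node.size
hi = (node.rank + 1) * n // node.size
shared[lo:hi] = node.rank
node.Barrier()

if node.rank == 0:
    print("node-local sum of the shared array:", shared.sum())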
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE-OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines including shared-memory systems (an Alliant FX/8, a Sequent, and a MultiMax) and a non-shared-memory system (the Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
Parallel discrete event simulation using shared memory
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1988-01-01
With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments using the Chandy-Misra distributed-simulation algorithm to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, J.P.; Bangs, A.L.; Butler, P.L.
Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,000 lines of code. 25 refs., 6 figs.
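Hetero Helix generated its translation code automatically from the shared data-structure declarations; as a hand-written, purely illustrative analogue, the sketch below marshals a tiny "shared" record into one fixed, machine-independent wire format (explicit big-endian byte order and fixed field widths), so hosts with different native number representations decode the same values. The record layout is invented.

import struct

# Machine-independent wire format for a toy shared record: one 32-bit int and
# three 64-bit floats, always big-endian ('>'). Each host converts between its
# native representation and this common format, the role played by the
# generated translation code described above.
WIRE = struct.Struct(">i3d")

def to_wire(record):
    joint_id, pose = record
    return WIRE.pack(joint_id, *pose)

def from_wire(blob):
    joint_id, x, y, z = WIRE.unpack(blob)
    return joint_id, (x, y, z)

blob = to_wire((7, (0.5, -1.25, 3.0)))
print(len(blob), "bytes on the wire:", blob.hex())
print("decoded on any host:", from_wire(blob))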
Extending the length and time scales of Gram–Schmidt Lyapunov vector computations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Costa, Anthony B., E-mail: acosta@northwestern.edu; Green, Jason R., E-mail: jason.green@umb.edu; Department of Chemistry, University of Massachusetts Boston, Boston, MA 02125
Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N^2 (with N the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using ScaLAPACK. The second uses the newly released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 between Intel Nehalem/InfiniBand DDR and NVIDIA C2050 architectures. To the best of our knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.
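For context, the standard procedure behind such calculations (commonly attributed to Benettin and co-workers) evolves a set of tangent vectors with the system Jacobian and periodically re-orthonormalizes them with a QR factorization, which is the Gram–Schmidt step whose cost the paper attacks with ScaLAPACK and MAGMA. A small, self-contained sketch for the two-dimensional Hénon map (not the Lennard–Jones systems of the paper) is:

import numpy as np

def henon_step(state, a=1.4, b=0.3):
    x, y = state
    return np.array([1.0 - a * x * x + y, b * x])

def henon_jacobian(state, a=1.4, b=0.3):
    x, _ = state
    return np.array([[-2.0 * a * x, 1.0],
                     [b, 0.0]])

def lyapunov_spectrum(steps=50_000):
    state = np.array([0.1, 0.1])
    Q = np.eye(2)                       # current set of orthonormal (Gram-Schmidt) vectors
    log_r = np.zeros(2)
    for _ in range(steps):
        Q = henon_jacobian(state) @ Q   # evolve the tangent vectors
        Q, R = np.linalg.qr(Q)          # re-orthonormalize: the QR / Gram-Schmidt step
        log_r += np.log(np.abs(np.diag(R)))
        state = henon_step(state)
    return log_r / steps

print(lyapunov_spectrum())              # roughly [0.42, -1.62] for the Henon map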
Olivers, Christian N L; Meijer, Frank; Theeuwes, Jan
2006-10-01
In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by an additional memory task. Singleton distractors interfered even more when they were identical or related to the object held in memory, but only when it was difficult to verbalize the memory content. Furthermore, this content-specific interaction occurred for features that were relevant to the memory task but not for irrelevant features of the same object or for once-remembered objects that could be forgotten. Finally, memory-related distractors attracted more eye movements but did not result in longer fixations. The results demonstrate memory-driven attentional capture on the basis of content-specific representations. Copyright 2006 APA.
Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M
2013-12-01
An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.
Working Memory in Children: A Time-Constrained Functioning Similar to Adults
ERIC Educational Resources Information Center
Portrat, Sophie; Camos, Valerie; Barrouillet, Pierre
2009-01-01
Within the time-based resource-sharing (TBRS) model, we tested a new conception of the relationships between processing and storage in which the core mechanisms of working memory (WM) are time constrained. However, our previous studies were restricted to adults. The current study aimed at demonstrating that these mechanisms are present and…
Close Associations and Memory in Brainwriting Groups
ERIC Educational Resources Information Center
Coskun, Hamit
2011-01-01
The present experiment examined whether or not the type of associations (close (e.g. apple-pear) and distant (e.g. apple-fish) word associations) and memory instruction (paying attention to the ideas of others) had effects on the idea generation performances in the brainwriting paradigm in which all participants shared their ideas by using paper…
NASA Technical Reports Server (NTRS)
Byrne, F.
1981-01-01
Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Accumulating Evidence about What Prospective Memory Costs Actually Reveal
ERIC Educational Resources Information Center
Strickland, Luke; Heathcote, Andrew; Remington, Roger W.; Loft, Shayne
2017-01-01
Event-based prospective memory (PM) tasks require participants to substitute an atypical PM response for an ongoing task response when presented with PM targets. Responses to ongoing tasks are often slower with the addition of PM demands ("PM costs"). Prominent PM theories attribute costs to capacity-sharing between the ongoing and PM…
How communication goals determine when audience tuning biases memory.
Echterhoff, Gerald; Higgins, E Tory; Kopietz, René; Groll, Stephan
2008-02-01
After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated whether this bias depends on the goals driving audience tuning. In 4 experiments, the memory bias was found to a greater extent when audience tuning served the creation of a shared reality than when it served alternative, nonshared reality goals (being polite toward a stigmatized-group audience; obtaining incentives; being entertaining; complying with a blatant demand). In addition, the authors found that these effects were mediated by the epistemic trust in the audience-congruent view but not by the rehearsal or accurate retrieval of the original input information, the ability to discriminate between the original and the message information, or a contrast away from extremely tuned messages. The central role of epistemic trust, a measure of the communicators' experience of shared reality, was supported in meta-analyses across the experiments. PsycINFO Database Record (c) 2008 APA, all rights reserved.
Viruses vector control proposal: genus Aedes emphasis.
Reis, Nelson Nogueira; Silva, Alcino Lázaro da; Reis, Elma Pereira Guedes; Silva, Flávia Chaves E; Reis, Igor Guedes Nogueira
Dengue fever is a major public health problem worldwide. In Brazil, in 2015, there were 1,534,932 cases, of which 20,320 were severe, and 811 deaths related to this disease. The distribution of Aedes aegypti, the vector, is extensive. Recently, the Zika and Chikungunya viruses have emerged, sharing the same vector as dengue, and have become a huge public health issue. With no specific treatment available, effective vector control is urgently required. This article reviews vector control strategies, their effectiveness, viability, and economic impact. Among them, the Sterile Insect Technique is highlighted as the best option to be adopted in Brazil, since it has been used effectively on a large scale in the USA and Mexico against agricultural pests. Copyright © 2017 Sociedade Brasileira de Infectologia. Published by Elsevier Editora Ltda. All rights reserved.
Gillard, Geoffrey O.; Bivas-Benita, Maytal; Hovav, Avi-Hai; Grandpre, Lauren E.; Panas, Michael W.; Seaman, Michael S.; Haynes, Barton F.; Letvin, Norman L.
2011-01-01
While immunological memory has long been considered the province of T- and B- lymphocytes, it has recently been reported that innate cell populations are capable of mediating memory responses. We now show that an innate memory immune response is generated in mice following infection with vaccinia virus, a poxvirus for which no cognate germline-encoded receptor has been identified. This immune response results in viral clearance in the absence of classical adaptive T and B lymphocyte populations, and is mediated by a Thy1+ subset of natural killer (NK) cells. We demonstrate that immune protection against infection from a lethal dose of virus can be adoptively transferred with memory hepatic Thy1+ NK cells that were primed with live virus. Our results also indicate that, like classical immunological memory, stronger innate memory responses form in response to priming with live virus than a highly attenuated vector. These results demonstrate that a defined innate memory cell population alone can provide host protection against a lethal systemic infection through viral clearance. PMID:21829360
Shared filtering processes link attentional and visual short-term memory capacity limits.
Bettencourt, Katherine C; Michalka, Samantha W; Somers, David C
2011-09-30
Both visual attention and visual short-term memory (VSTM) have been shown to have capacity limits of 4 ± 1 objects, driving the hypothesis that they share a visual processing buffer. However, these capacity limitations also show strong individual differences, making the degree to which these capacities are related unclear. Moreover, other research has suggested a distinction between attention and VSTM buffers. To explore the degree to which capacity limitations reflect the use of a shared visual processing buffer, we compared individual subject's capacities on attentional and VSTM tasks completed in the same testing session. We used a multiple object tracking (MOT) and a VSTM change detection task, with varying levels of distractors, to measure capacity. Significant correlations in capacity were not observed between the MOT and VSTM tasks when distractor filtering demands differed between the tasks. Instead, significant correlations were seen when the tasks shared spatial filtering demands. Moreover, these filtering demands impacted capacity similarly in both attention and VSTM tasks. These observations fail to support the view that visual attention and VSTM capacity limits result from a shared buffer but instead highlight the role of the resource demands of underlying processes in limiting capacity.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
Vincenti, H.; Lobet, M.; Lehe, R.; ...
2016-09-19
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute node that will have a reduced clock speed to allow for efficient cooling. To compensate for the frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performance on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2, 256-bit wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factors of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include the AVX-512 instruction set with 512-bit register lengths (8 doubles/16 singles).
Program summary
Program Title: vec_deposition
Program Files doi: http://dx.doi.org/10.17632/nh77fv9k8c.1
Licensing provisions: BSD 3-Clause
Programming language: Fortran 90
External routines/libraries: OpenMP > 4.0
Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years, and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition, for which there is no efficient and portable vector algorithm.
Solution method: Here we provide an efficient and portable vector algorithm for current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives, which ensures portability across different architectures.
Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performance out of the vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performance. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g. Intel, GNU, or Cray) and provide proper alignment flags and compiler alignment directives (more details in the README file).
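The PICSAR routines are in Fortran and rely on the tiled particle data structure described above; as a language-neutral illustration of why deposition resists naive vectorization, the NumPy sketch below shows the same scatter-collision problem in one dimension: several particles deposit into the same cell, so a buffered fancy-indexed "+=" silently drops colliding contributions, while an unbuffered scatter-add accumulates them all. The grid size, weights, and nearest-grid-point shape factor are arbitrary.

import numpy as np

nx, dx = 64, 0.1
rng = np.random.default_rng(0)
x = rng.uniform(0, nx * dx, size=10_000)      # particle positions
w = np.full(x.size, 1e-3)                     # particle charges/weights
cells = np.floor(x / dx).astype(int)          # nearest-grid-point deposition target

rho_wrong = np.zeros(nx)
rho_wrong[cells] += w                         # buffered: colliding writes overwrite each other

rho = np.zeros(nx)
np.add.at(rho, cells, w)                      # unbuffered scatter-add: correct accumulation

print(rho_wrong.sum(), rho.sum(), w.sum())    # only the last two agree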
A class Hierarchical, object-oriented approach to virtual memory management
NASA Technical Reports Server (NTRS)
Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.
1989-01-01
The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
Finkel, Deborah; Pedersen, Nancy L
2014-01-01
Intraindividual variability (IIV) in reaction time has been related to cognitive decline, but questions remain about the nature of this relationship. Mean and range in movement and decision time for simple reaction time were available from 241 individuals aged 51-86 years at the fifth testing wave of the Swedish Adoption/Twin Study of Aging. Cognitive performance on four factors was also available: verbal, spatial, memory, and speed. Analyses indicated that range in reaction time could be used as an indicator of IIV. Heritability estimates were 35% for mean reaction time and 20% for range in reaction time. Multivariate analysis indicated that the genetic variance on the memory, speed, and spatial factors is shared with genetic variance for mean or range in reaction time. IIV shares significant genetic variance with fluid ability in late adulthood, over and above the genetic variance shared with mean reaction time.
A Study of Shared-Memory Mutual Exclusion Protocols Using CADP
NASA Astrophysics Data System (ADS)
Mateescu, Radu; Serwe, Wendelin
Mutual exclusion protocols are an essential building block of concurrent systems: indeed, such a protocol is required whenever a shared resource has to be protected against concurrent non-atomic accesses. Hence, many variants of mutual exclusion protocols exist in the shared-memory setting, such as Peterson's or Dekker's well-known protocols. Although the functional correctness of these protocols has been studied extensively, relatively little attention has been paid to their non-functional aspects, such as their performance in the long run. In this paper, we report on experiments with the performance evaluation of mutual exclusion protocols using Interactive Markov Chains. Steady-state analysis provides an additional criterion for comparing protocols, which complements the verification of their functional properties. We also carefully re-examined the functional properties, whose accurate formulation as temporal logic formulas in the action-based setting turns out to be quite involved.
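For reference, Peterson's protocol mentioned above can be rendered in a few lines. The sketch below checks only the functional property (mutually exclusive updates of a shared counter), not the steady-state performance studied in the paper, and it leans on CPython's global interpreter lock to provide sequentially consistent simple loads and stores; on a weaker memory model the protocol would need explicit fences.

import threading
import time

# Peterson's two-process mutual exclusion: a thread may enter its critical
# section only when the other thread is not interested or it is the other's turn.
flag = [False, False]
turn = 0
counter = 0

def worker(me, rounds=10_000):
    global turn, counter
    other = 1 - me
    for _ in range(rounds):
        flag[me] = True                 # announce interest
        turn = other                    # give priority to the other thread
        while flag[other] and turn == other:
            time.sleep(0)               # busy-wait; yield so the peer can run under the GIL
        counter += 1                    # critical section
        flag[me] = False                # exit protocol

threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("counter =", counter)             # 20000 when mutual exclusion holds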
Handling debugger breakpoints in a shared instruction system
Gooding, Thomas Michael; Shok, Richard Michael
2014-01-21
A debugger debugs processes that execute shared instructions so that a breakpoint set for one process will not cause a breakpoint to occur in the other processes. A breakpoint is set by recording the original instruction at the desired location and writing a trap instruction to the shared instructions at that location. When a process encounters the breakpoint, the process passes control to the debugger for breakpoint processing if the breakpoint was set at that location for that process. If the trap was not set at that location for that process, the cacheline containing the trap is copied to a small scratchpad memory, and the virtual memory mappings are changed to translate the virtual address of the cacheline to the scratchpad. The original instruction is then written to replace the trap instruction in the scratchpad, so that the process can execute the instructions in the scratchpad, thereby avoiding the trap instruction.
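A toy model of the bookkeeping this patent abstract describes (the real implementation manipulates actual page mappings and hardware trap instructions) might look like the following; the trap byte, line size, and class name are invented for illustration.

TRAP = 0xCC  # an int3-style trap opcode, used here only as a marker byte

class SharedTextDebugger:
    # One shared instruction stream, per-process breakpoints, and a scratchpad
    # copy for processes that should not stop at a given trap.
    def __init__(self, text):
        self.text = bytearray(text)          # shared instruction bytes
        self.saved = {}                      # address -> original byte
        self.owners = {}                     # address -> set of pids the breakpoint applies to

    def set_breakpoint(self, addr, pid):
        if addr not in self.saved:
            self.saved[addr] = self.text[addr]
            self.text[addr] = TRAP
        self.owners.setdefault(addr, set()).add(pid)

    def on_trap(self, addr, pid, line=8):
        if pid in self.owners.get(addr, ()):
            return ("stop-in-debugger", None)
        # Not this process's breakpoint: execute from a scratchpad copy of the
        # enclosing "cacheline" with the original instruction restored.
        start = addr - addr % line
        scratch = bytearray(self.text[start:start + line])
        scratch[addr - start] = self.saved[addr]
        return ("run-from-scratchpad", bytes(scratch))

dbg = SharedTextDebugger(bytes(range(32)))
dbg.set_breakpoint(10, pid=1)
print(dbg.on_trap(10, pid=1))   # this process stops in the debugger
print(dbg.on_trap(10, pid=2))   # another process runs from the patched scratchpad copy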
An Army dentist in the combat zone during WWII.
Orden, C Q
2001-11-01
It is 60 years since the bombing of Pearl Harbor and the outbreak of World War II for the United States. Some of the men and women who served in the armed forces at the time are willing to share some of their reminiscences with those of us who could not serve for one reason or another or who may not even have been born at the time. One of the dentists who is willing to share some of his memories is Dr. Charles Q. Orden of New York. Unless these people share their memories much history will be lost forever in the next years, and we will all be poorer for it. We sincerely thank Dr. Orden for his offer of information and for allowing us to reproduce Fig. 1 in which he is seen as the dentist using a field dental chair, a foot-powered drill, and with a black dental corpsman as his assistant.
Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.
2009-01-01
There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052
Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N
2012-01-01
Transposition errors during the reproduction of a hand movement sequence make it possible to obtain important information on the internal representation of this sequence in motor working memory. Analysis of such errors showed that learning to reproduce sequences of left-hand movements improves the system of positional coding (coding of positions), while learning of right-hand movements improves the system of vector coding (coding of movements). Learning of right-hand movements after left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of left-hand movements after right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by a neural network using either vector coding or both vector and positional coding.
Conditional Entropy-Constrained Residual VQ with Application to Image Coding
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith, Mark J. T.
1996-01-01
This paper introduces an extension of entropy-constrained residual vector quantization (VQ) where intervector dependencies are exploited. The method, which we call conditional entropy-constrained residual VQ, employs a high-order entropy conditioning strategy that captures local information in the neighboring vectors. When applied to coding images, the proposed method is shown to achieve better rate-distortion performance than that of entropy-constrained residual vector quantization with less computational complexity and lower memory requirements. Moreover, it can be designed to support progressive transmission in a natural way. It is also shown to outperform some of the best predictive and finite-state VQ techniques reported in the literature. This is due partly to the joint optimization between the residual vector quantizer and a high-order conditional entropy coder as well as the efficiency of the multistage residual VQ structure and the dynamic nature of the prediction.
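The entropy-conditioning strategy itself is beyond a short sketch, but the multistage residual VQ structure it builds on is easy to illustrate: each stage quantizes the residual left by the previous stages, and the decoder sums one codeword per stage. The NumPy sketch below uses a toy k-means codebook per stage; the stage count, codebook sizes, and data are synthetic, and this is not the authors' coder.

import numpy as np

def kmeans(data, k, iters=25, seed=0):
    # Tiny k-means used to build each stage codebook.
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - codebook[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = data[labels == j].mean(axis=0)
    return codebook

def train_residual_vq(data, stages=3, k=8):
    # Each stage is trained on the quantization residual left by earlier stages.
    books, residual = [], data.astype(float)
    for s in range(stages):
        cb = kmeans(residual, k, seed=s)
        idx = np.argmin(((residual[:, None] - cb[None]) ** 2).sum(-1), axis=1)
        residual = residual - cb[idx]
        books.append(cb)
    return books

def encode(x, books):
    indices, residual = [], x.astype(float)
    for cb in books:
        i = int(np.argmin(((residual - cb) ** 2).sum(-1)))
        indices.append(i)
        residual = residual - cb[i]
    return indices

def decode(indices, books):
    return sum(cb[i] for cb, i in zip(books, indices))   # one codeword per stage

rng = np.random.default_rng(42)
blocks = rng.normal(size=(2000, 4))          # e.g. 2x2 image blocks, flattened
books = train_residual_vq(blocks, stages=3, k=8)
x = blocks[0]
print(x, decode(encode(x, books), books))    # original block and its 3-stage reconstruction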
Stanley, Clayton; Byrne, Michael D
2016-12-01
The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Zhou, Hong-Jing; Zeng, Chen-Ye; Yang, Ting-Ting; Long, Fang-Yi; Kuang, Xi; Du, Jun-Rong
2018-05-01
Oxidative stress caused by aging aggravates neuropathological changes and cognitive deficits. Klotho, an anti-aging protein, shows an anti-oxidative effect. The aims of the present study were to determine the potential therapeutic effect of klotho in aging-related neuropathological changes and memory impairments in senescence-accelerated mouse prone-8 (SAMP8) mice, and identify the potential mechanism of these neuroprotective effects. A lentivirus was used to deliver and sustain the expression of klotho. The lentiviral vectors were injected into the bilateral lateral ventricles of 7-month-old SAMP8 mice or age-matched SAMR1 mice. Three months later, the Y-maze alternation task and passive avoidance task were used to assess the memory deficits of the mice. In situ hybridization, immunohistochemistry, immunofluorescence, Nissl staining, quantitative real-time PCR and Western blot assays were applied in the following research. Our results showed that 3 months after injection of the lentiviral vectors encoding the full-length klotho gene, the expression of klotho in the brain was significantly increased in 10-month-old SAMP8 mice. This treatment reduced memory deficits, neuronal loss, synaptic damage and 4-HNE levels but increased mitochondrial manganese-superoxide dismutase (Mn-SOD) and catalase (CAT) expression. Moreover, the up-regulation of klotho expression decreased Akt and Forkhead box class O1 (FoxO1) phosphorylation. The present study provides a novel approach for klotho gene therapy and demonstrates that direct up-regulation of klotho in the brain might improve aging-related memory impairments and decrease oxidative stress. The underlying mechanism of this effect likely involves the inhibition of the Akt/FoxO1 pathway. Copyright © 2018 Elsevier Inc. All rights reserved.
Lattice QCD calculation using VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Seyong; Ohta, Shigemi
1995-02-01
A new vector parallel supercomputer, Fujitsu VPP500, was installed at RIKEN earlier this year. It consists of 30 vector computers, each with 1.6 GFLOPS peak speed and 256 MB memory, connected by a crossbar switch with 400 MB/s peak data transfer rate each way between any pair of nodes. The authors developed a Fortran lattice QCD simulation code for it. It runs at about 1.1 GFLOPS sustained per node for Metropolis pure-gauge update, and about 0.8 GFLOPS sustained per node for conjugate gradient inversion of staggered fermion matrix.
The ASC Sequoia Programming Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seager, M
2008-08-06
In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in the mid 1970's first for the CDC 7600 and later extended from stack based vector operation to memory-to-memory operations for the Cray 1s, lasted approximately 20 years (See Slide 5). The Cray vector era was deemed an extremely long lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalar mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the Operating System (LTSS and later NLTSS), communications protocols (LINCS), Compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine optimized libraries (e.g., SLATEC and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed the COS and UNICOS operating systems and environment on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above, including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low cost CMOS microprocessors and their attendant departmental and mini-computers and later workstations and personal computers. With the widespread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period is that systems were being developed with multiple 'cores' in them, called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coining the 'Killer Micro' term. After several generations of SMP platforms (e.g., the Sequent Balance 8000 with 8 33MHz MC32032s, the Alliant FX/8 with 8 MC68020s and FPGA-based Vector Units, and finally the BBN Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed take over Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully many--generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach.
In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC). We also had a major emphasis on doing everything in a completely standards-based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature rich and, in particular, offer more cores. Thus, we focused on OpenMP and POSIX Pthreads for programming SMP parallelism. These code porting efforts were led by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved around removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries, and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.
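As a rough illustration of the layered parallelism the Livermore Model settled on (MPI between SMP nodes, OpenMP or Pthreads within a node), the sketch below sums a distributed array with an OpenMP reduction inside each rank and an MPI_Allreduce across ranks. It is a generic hybrid MPI+OpenMP example under our own assumptions, not code from the Integrated Design Codes.

```cpp
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

// Generic sketch of the layered parallelism described above: MPI distributes the
// domain across SMP nodes, while OpenMP threads share each node's portion.
// The array size and the reduction are illustrative, not taken from the IDCs.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long nLocal = 1000000;               // this rank's share of the domain
    std::vector<double> cell(nLocal, 1.0);

    double localSum = 0.0;
    // SMP parallelism inside the node, as the Livermore Model anticipated.
    #pragma omp parallel for reduction(+ : localSum)
    for (long i = 0; i < nLocal; ++i)
        localSum += cell[i];

    double globalSum = 0.0;                    // message passing between nodes
    MPI_Allreduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads/rank=%d sum=%g\n",
                    size, omp_get_max_threads(), globalSum);
    MPI_Finalize();
    return 0;
}
```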
Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian
2016-03-01
The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein in each phase only a subset of operations is used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector, and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel's TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 times more than the TBB vector. In a scenario designed to simulate the filling of a vector, the performance improvement increases to 13.38 and 1.16 times, respectively. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked, and elements are stored contiguously in memory.
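To make the wait-free idea concrete, the fixed-capacity sketch below reserves a slot for each push with a single atomic fetch-and-add, so every operation finishes in a bounded number of steps regardless of other threads. It is only an assumption-level analogue: the published design is dynamically resizable and supports a much richer API.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal illustration of the wait-free idea only: with a preallocated buffer, a
// single atomic fetch_add reserves a unique slot, so every push completes in a
// bounded number of steps regardless of other threads. The published design goes
// much further (dynamic resizing, full API, alternative operation variants); this
// fixed-capacity sketch is an assumption-level analogue, not that algorithm.
template <typename T, std::size_t Capacity>
class BoundedWaitFreeVector {
public:
    // Returns the index of the stored element, or std::nullopt if full.
    std::optional<std::size_t> push_back(const T& value) {
        std::size_t idx = size_.fetch_add(1, std::memory_order_acq_rel);
        if (idx >= Capacity) return std::nullopt;   // bounds-checked
        data_[idx] = value;                          // slot owned exclusively by this thread
        return idx;
    }
    T& operator[](std::size_t i) { return data_[i]; }

private:
    std::atomic<std::size_t> size_{0};
    T data_[Capacity];                               // elements stored contiguously
};
```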
Leclerc, Arnaud; Carrington, Tucker
2014-05-07
We propose an iterative method for computing vibrational spectra that significantly reduces the memory cost of calculations. It uses a direct product primitive basis, but does not require storing vectors with as many components as there are product basis functions. Wavefunctions are represented in a basis each of whose functions is a sum of products (SOP), and the factorizable structure of the Hamiltonian is exploited. If the factors of the SOP basis functions are properly chosen, wavefunctions are linear combinations of a small number of SOP basis functions. The SOP basis functions are generated using a shifted block power method. The factors are refined with a rank reduction algorithm to cap the number of terms in a SOP basis function. The ideas are tested on a 20-D model Hamiltonian and a realistic CH3CN (12-dimensional) potential. For the 20-D problem, to use a standard direct product iterative approach one would need to store vectors with about 10^20 components and would hence require about 8 × 10^11 GB. With the approach of this paper only 1 GB of memory is necessary. Results for CH3CN agree well with those of a previous calculation on the same potential.
Gonneaud, Julie; Kalpouzos, Grégoria; Bon, Laetitia; Viader, Fausto; Eustache, Francis; Desgranges, Béatrice
2011-01-01
Prospective memory (PM) is the ability to remember to perform an action at a specific point in the future. Regarded as multidimensional, PM involves several cognitive functions that are known to be impaired in normal aging. In the present study, we set out to investigate the cognitive correlates of PM impairment in normal aging. Manipulating cognitive load, we assessed event- and time-based PM, as well as several cognitive functions, including executive functions, working memory and retrospective episodic memory, in healthy subjects spanning the entire adult lifespan. We found that normal aging was characterized by PM decline in all conditions and that event-based PM was more sensitive to the effects of aging than time-based PM. Whatever the conditions, PM was linked to inhibition and processing speed. However, while event-based PM was mainly mediated by binding and retrospective memory processes, time-based PM was mainly related to inhibition. The only distinction between high- and low-load PM cognitive correlates lies in an additional, but marginal, correlation between updating and the high-load PM condition. The association of distinct cognitive functions, as well as shared mechanisms, with event- and time-based PM confirms that each type of PM relies on a different set of processes. PMID:21678154
de la Serna, Elena; Sugranyes, Gisela; Sanchez-Gistau, Vanessa; Rodriguez-Toscano, Elisa; Baeza, Immaculada; Vila, Montserrat; Romero, Soledad; Sanchez-Gutierrez, Teresa; Penzol, Mª José; Moreno, Dolores; Castro-Fornieles, Josefina
2017-05-01
Schizophrenia (SZ) and bipolar disorder (BD) are considered neurobiological disorders which share some clinical, cognitive and neuroimaging characteristics. Studying child and adolescent offspring of patients diagnosed with bipolar disorder (BDoff) or schizophrenia (SZoff) is regarded as a reliable method for investigating early alterations and vulnerability factors for these disorders. This study compares the neuropsychological characteristics of SZoff, BDoff and a community control offspring group (CC) with the aim of examining shared and differential cognitive characteristics among groups. 41 SZoff, 90 BDoff and 107 CC were recruited. They were all assessed with a complete neuropsychological battery which included intelligence quotient, working memory (WM), processing speed, verbal memory and learning, visual memory, executive functions and sustained attention. SZoff and BDoff showed worse performance in some cognitive areas compared with CC. Some of these difficulties (visual memory) were common to both offspring groups, whereas others, such as verbal learning and WM in SZoff or PSI in BDoff, were group-specific. The cognitive difficulties in visual memory shown by both the SZoff and BDoff groups might point to a common endophenotype in the two disorders. Difficulties in other cognitive functions would be specific depending on the family diagnosis. Copyright © 2016 Elsevier B.V. All rights reserved.
Working memory costs of task switching.
Liefooghe, Baptist; Barrouillet, Pierre; Vandierendonck, André; Camos, Valérie
2008-05-01
Although many accounts of task switching emphasize the importance of working memory as a substantial source of the switch cost, there is a lack of evidence demonstrating that task switching actually places additional demands on working memory. The present study addressed this issue by implementing task switching in continuous complex span tasks with strictly controlled time parameters. A series of 4 experiments demonstrate that recall performance decreased as a function of the number of task switches and that the concurrent load of item maintenance had no influence on task switching. These results indicate that task switching induces a cost on working memory functioning. Implications for theories of task switching, working memory, and resource sharing are addressed.
Multithreaded implicitly dealiased convolutions
NASA Astrophysics Data System (ADS)
Roberts, Malcolm; Bowman, John C.
2018-03-01
Implicit dealiasing is a method for computing in-place linear convolutions via fast Fourier transforms that decouples work memory from input data. It offers easier memory management and, for long one-dimensional input sequences, greater efficiency than conventional zero-padding. Furthermore, for convolutions of multidimensional data, the segregation of data and work buffers can be exploited to reduce memory usage and execution time significantly. This is accomplished by processing and discarding data as it is generated, allowing work memory to be reused, for greater data locality and performance. A multithreaded implementation of implicit dealiasing that accepts an arbitrary number of input and output vectors and a general multiplication operator is presented, along with an improved one-dimensional Hermitian convolution that avoids the loop dependency inherent in previous work. An alternate data format that can accommodate a Nyquist mode and enhance cache efficiency is also proposed.
A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
Stark, Felicity C.; McCluskie, Michael J.; Krishnan, Lakshmi
2016-01-01
Homologous prime-boost vaccinations with live vectors typically fail to induce repeated strong CD8+ T cell responses due to the induction of anti-vector immunity, highlighting the need for alternative delivery vehicles. The unique ether lipids of archaea may be constituted into liposomes, archaeosomes, which do not induce anti-carrier responses, making them an ideal candidate for use in repeat vaccination systems. Herein, we evaluated in mice the maximum threshold of antigen-specific CD8+ T cell responses that may be induced by multiple homologous immunizations with ovalbumin (OVA) entrapped in archaeosomes derived from the ether glycerolipids of the archaeon Methanobrevibacter smithii (MS-OVA). Up to three immunizations with MS-OVA administered in optimized intervals (to allow for sufficient resting of the primed cells prior to boosting), induced a potent anti-OVA CD8+ T cell response of up to 45% of all circulating CD8+ T cells. Additional MS-OVA injections did not add any further benefit in increasing the memory of CD8+ T cell frequency. In contrast, OVA expressed by Listeria monocytogenes (LM-OVA), an intracellular bacterial vector failed to evoke a boosting effect after the second injection, resulting in significantly reduced antigen-specific CD8+ T cell frequencies. Furthermore, repeated vaccination with MS-OVA skewed the response increasingly towards an effector memory (CD62low) phenotype. Vaccinated animals were challenged with B16-OVA at late time points after vaccination (+7 months) and were afforded protection compared to control. Therefore, archaeosomes constituted a robust particulate delivery system to unravel the kinetics of CD8+ T cell response induction and memory maintenance and constitute an efficient vaccination regimen optimized for tumor protection. PMID:27869670
Consciousness of Unification: The Mind-Matter Phallacy Bites the Dust
NASA Astrophysics Data System (ADS)
Beichler, James E.
A complete theoretical model of how consciousness arises in neural nets can be developed based on a mixed quantum/classical basis. Both mind and consciousness are multi-leveled scalar and vector electromagnetic complexity patterns, respectively, which emerge within all living organisms through the process of evolution. Like life, the mind and consciousness patterns extend throughout living organisms (bodies), but the neural nets and higher level groupings that distinguish higher levels of consciousness only exist in the brain so mind and consciousness have been traditionally associated with the brain alone. A close study of neurons and neural nets in the brain shows that the microtubules within axons are classical bio-magnetic inductors that emit and absorb electromagnetic pulses from each other. These pulses establish interference patterns that influence the quantized vector potential patterns of interstitial water molecules within the neurons as well as create the coherence within neurons and neural nets that scientists normally associate with more complex memories, thought processes and streams of thought. Memory storage and recall are guided by the microtubules and the actual memory patterns are stored as magnetic vector potential complexity patterns in the points of space at the quantum level occupied by the water molecules. This model also accounts for the plasticity of the brain and implies that mind and consciousness, like life itself, are the result of evolutionary processes. However, consciousness can evolve independent of an organism's birth genetics once it has evolved by normal bottom-up genetic processes and thus force a new type of top-down evolution on living organisms and species as a whole that can be explained by expanding the laws of thermodynamics to include orderly systems.
A comparison of the Cray-2 performance before and after the installation of memory pseudo-banking
NASA Technical Reports Server (NTRS)
Schmickley, Ronald D.; Bailey, David H.
1987-01-01
A suite of 13 large Fortran benchmark codes was run on a Cray-2 configured with memory pseudo-banking circuits, and floating point operation rates were measured for each under a variety of system load configurations. These were compared with similar flop measurements taken on the same system before installation of the pseudo-banking. A useful memory access efficiency parameter was defined and calculated for both sets of performance rates, allowing a crude quantitative measure of the improvement in efficiency due to pseudo-banking. Programs were categorized as either highly scalar (S) or highly vectorized (V) and either memory-intensive or register-intensive, giving 4 categories: S-memory, S-register, V-memory, and V-register. Using flop rates as a simple quantifier of these 4 categories, a scatter plot of efficiency gain vs Mflops roughly illustrates the improvement in floating point processing speed due to pseudo-banking. On the Cray-2 system tested, this improvement ranged from 1 percent for S-memory codes to about 12 percent for V-memory codes. No significant gains were made for V-register codes, which was to be expected.
Method of up-front load balancing for local memory parallel processors
NASA Technical Reports Server (NTRS)
Baffes, Paul Thomas (Inventor)
1990-01-01
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of from sixty to seventy five percent.
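The iterative merging of process sets can be pictured with a simple greedy sketch: starting from many small sets with known loads, repeatedly merge the two lightest until only as many sets remain as there are processing units. The partition-threshold test of the patented method is not reproduced; the function below is an illustration under our own assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Generic sketch of the merge-down idea: start with many small "process sets"
// (each with a known computational load), then repeatedly merge the two lightest
// sets until only as many sets remain as there are processing units. The greedy
// pairing below keeps the final loads close to balanced; the patent's partition
// threshold test is not reproduced here (assumption-level illustration only).
std::vector<double> mergeProcessSets(std::vector<double> setLoads,
                                     std::size_t numProcessors) {
    std::priority_queue<double, std::vector<double>, std::greater<double>> heap(
        setLoads.begin(), setLoads.end());
    while (heap.size() > numProcessors) {
        double a = heap.top(); heap.pop();   // lightest set
        double b = heap.top(); heap.pop();   // next lightest
        heap.push(a + b);                    // merged set carries the combined load
    }
    std::vector<double> result;
    while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
    return result;                           // one aggregate load per processing unit
}
```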
Zeithamova, Dagmar; Dominick, April L; Preston, Alison R
2012-07-12
Memory enables flexible use of past experience to inform new behaviors. Although leading theories hypothesize that this fundamental flexibility results from the formation of integrated memory networks relating multiple experiences, the neural mechanisms that support memory integration are not well understood. Here, we demonstrate that retrieval-mediated learning, whereby prior event details are reinstated during encoding of related experiences, supports participants' ability to infer relationships between distinct events that share content. Furthermore, we show that activation changes in a functionally coupled hippocampal and ventral medial prefrontal cortical circuit track the formation of integrated memories and successful inferential memory performance. These findings characterize the respective roles of these regions in retrieval-mediated learning processes that support relational memory network formation and inferential memory in the human brain. More broadly, these data reveal fundamental mechanisms through which memory representations are constructed into prospectively useful formats. Copyright © 2012 Elsevier Inc. All rights reserved.
Jim Thomas: A Collection of Memories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wong, Pak C.
Jim Thomas, a guest editor and a long-time associate editor of Information Visualization (IVS), died in Richland, WA, on August 6, 2010 due to complications from a brain tumor. His friends and colleagues from around the world have since expressed their sadness and paid tribute to a visionary scientist in multiple public forums. For those who didn't get the chance to know Jim, I share a collection of my own memories of Jim Thomas and memories from some of his colleagues.
Effects of motor congruence on visual working memory.
Quak, Michel; Pecher, Diane; Zeelenberg, Rene
2014-10-01
Grounded-cognition theories suggest that memory shares processing resources with perception and action. The motor system could be used to help memorize visual objects. In two experiments, we tested the hypothesis that people use motor affordances to maintain object representations in working memory. Participants performed a working memory task on photographs of manipulable and nonmanipulable objects. The manipulable objects were objects that required either a precision grip (i.e., small items) or a power grip (i.e., large items) to use. A concurrent motor task that could be congruent or incongruent with the manipulable objects caused no difference in working memory performance relative to nonmanipulable objects. Moreover, the precision- or power-grip motor task did not affect memory performance on small and large items differently. These findings suggest that the motor system plays no part in visual working memory.
NASA Technical Reports Server (NTRS)
Hom, K. W.
1994-01-01
The EM-ANIMATE program is a specialized visualization program that displays and animates the near-field and surface-current solutions obtained from an electromagnetics program, in particular, that from MOM3D (LAR-15074). The EM-ANIMATE program is windows based and contains a user-friendly, graphical interface for setting viewing options, case selection, file manipulation, etc. EM-ANIMATE displays the field and surface-current magnitude as smooth shaded color fields (color contours) ranging from a minimum contour value to a maximum contour value for the fields and surface currents. The program can display either the total electric field or the scattered electric field in either time-harmonic animation mode or in the root mean square (RMS) average mode. The default setting is initially set to the minimum and maximum values within the field and surface current data and can be optionally set by the user. The field and surface-current value are animated by calculating and viewing the solution at user selectable radian time increments between 0 and 2pi. The surface currents can also be displayed in either time-harmonic animation mode or in RMS average mode. In RMS mode, the color contours do not vary with time, but show the constant time averaged field and surface-current magnitude solution. The electric field and surface-current directions can be displayed as scaled vector arrows which have a length proportional to the magnitude at each field grid point or surface node point. These vector properties can be viewed separately or concurrently with the field or surface-current magnitudes. Animation speed is improved by turning off the display of the vector arrows. In RMS modes, the direction vectors are still displayed as varying with time since the time averaged direction vectors would be zero length vectors. Other surface properties can optionally be viewed. These include the surface grid, the resistance value assigned to each element of the grid, and the power dissipation of each element which has an assigned resistance value. The EM-ANIMATE program will accept up to 10 different surface current cases each consisting of up to 20,000 node points and 10,000 triangle definitions and will animate one of these cases. The capability is used to compare surface-current distribution due to various initial excitation directions or electric field orientations. The program can accept up to 50 planes of field data consisting of a grid of 100 by 100 field points. These planes of data are user selectable and can be viewed individually or concurrently. With these preset limits, the program requires 55 megabytes of core memory to run. These limits can be changed in the header files to accommodate the available core memory of an individual workstation. An estimate of memory required can be made as follows: approximate memory in bytes equals (number of nodes times number of surfaces times 14 variables times bytes per word, typically 4 bytes per floating point) plus (number of field planes times number of nodes per plane times 21 variables times bytes per word). This gives the approximate memory size required to store the field and surface-current data. The total memory size is approximately 400,000 bytes plus the data memory size. The animation calculations are performed in real time at any user set time step. For Silicon Graphics Workstations that have multiple processors, this program has been optimized to perform these calculations on multiple processors to increase animation rates. 
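The memory estimate described above can be written out directly; the helper below plugs in the preset limits (10 cases of 20,000 nodes, 50 planes of 100 by 100 points, 4 bytes per word) and lands near the quoted 55 megabytes. Interpreting "number of surfaces" as the number of surface-current cases is our assumption.

```cpp
#include <cstdio>

// Direct transcription of the memory estimate described above: data memory is
// (nodes x surface cases x 14 variables x bytes/word) plus
// (field planes x points/plane x 21 variables x bytes/word), and total memory is
// roughly 400,000 bytes on top of that. Variable names are ours; the numbers are
// the preset limits quoted in the description.
long estimateBytes(long nodes, long surfaceCases,
                   long fieldPlanes, long pointsPerPlane,
                   long bytesPerWord = 4) {
    long dataBytes = nodes * surfaceCases * 14 * bytesPerWord
                   + fieldPlanes * pointsPerPlane * 21 * bytesPerWord;
    return 400000L + dataBytes;
}

int main() {
    // Preset limits: 10 cases of 20,000 nodes and 50 planes of 100 x 100 points.
    long bytes = estimateBytes(20000, 10, 50, 100 * 100);
    std::printf("estimated memory: %.1f MB\n", bytes / 1.0e6);  // about 53.6 MB, close to the quoted 55 MB
    return 0;
}
```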
The optimized program uses the SGI PFA (Power FORTRAN Accelerator) library. On single processor machines, the parallelization directives are seen as comments to the program and will have no effect on compilation or execution. EM-ANIMATE is written in FORTRAN 77 for implementation on SGI IRIS workstations running IRIX 3.0 or later. A minimum of 55Mb of RAM is required for execution of this program; however, the code may be modified to accommodate the available memory of an individual workstation. For program execution, twenty-four bit, double-buffered color capability is suggested, but not required. Sample input and output files and a sample executable are provided on the distribution medium. Electronic documentation is provided in PostScript format and in the form of IRIX man pages. The standard distribution medium for EM-ANIMATE is a .25 inch streaming magnetic IRIX tape cartridge in UNIX tar format. EM-ANIMATE is also available as part of a bundled package, COS-10048 that includes MOM3D, an IRIS program that produces electromagnetic near field and surface current solutions. This program was developed in 1993.
ERIC Educational Resources Information Center
Lucas, Heather D.; Taylor, Jason R.; Henson, Richard N.; Paller, Ken A.
2012-01-01
The neural mechanisms that underlie familiarity memory have been extensively investigated, but a consensus understanding remains elusive. Behavioral evidence suggests that familiarity sometimes shares sources with instances of implicit memory known as priming, in that the same increases in processing fluency that give rise to priming can engender…
Forward Association, Backward Association, and the False-Memory Illusion
ERIC Educational Resources Information Center
Brainerd, C. J.; Wright, Ron
2005-01-01
In the Deese-Roediger-McDermott false-memory illusion, forward associative strength (FAS) is unrelated to the strength of the illusion; this is puzzling, because high-FAS lists ought to share more semantic features with critical unpresented words than should low-FAS lists. The authors show that this null result is probably a truncated range…
The Impact of Storage on Processing: How Is Information Maintained in Working Memory?
ERIC Educational Resources Information Center
Vergauwe, Evie; Camos, Valérie; Barrouillet, Pierre
2014-01-01
Working memory is typically defined as a system devoted to the simultaneous maintenance and processing of information. However, the interplay between these 2 functions is still a matter of debate in the literature, with views ranging from complete independence to complete dependence. The time-based resource-sharing model assumes that a central…
ERIC Educational Resources Information Center
Corbalan, Gemma; Kester, Liesbeth; van Merrienboer, Jeroen J. G.
2008-01-01
Complex skill acquisition by performing authentic learning tasks is constrained by limited working memory capacity [Baddeley, A. D. (1992). Working memory. "Science, 255", 556-559]. To prevent cognitive overload, task difficulty and support of each newly selected learning task can be adapted to the learner's competence level and perceived task…
Android Protection Mechanism: A Signed Code Security Mechanism for Smartphone Applications
2011-03-01
status registers, exceptions, endian support, unaligned access support, synchronization primitives, the Jazelle Extension, and saturated integer... supports comprehensive non-blocking shared-memory synchronization primitives that scale for multiple-processor system designs. This is an improvement... synchronization. Memory semaphores can be loaded and altered without interruption because the load and store operations are atomic. Processor
Senior Citizens' Personal Stories...Literacy through Narrative...Sharing the Richness of the Past.
ERIC Educational Resources Information Center
Lineberry, Colleen
Using simple writing strategies, senior citizens at an elder camp workshop collected memories in journals. In some cases, readings were used to trigger memories. The exercise enabled students to make connections between their own life experiences and the life experiences of others. Workshops encouraging participants to tell their own stories for…
Processes and Content of Narrative Identity Development in Adolescence: Gender and Well-Being
ERIC Educational Resources Information Center
McLean, Kate C.; Breen, Andrea V.
2009-01-01
The present study examined narrative identity in adolescence (14-18 years) in terms of narrative content and processes of identity development. Age- and gender-related differences in narrative patterns in turning point memories and gender differences in the content and functions for sharing those memories were examined, as was the relationship…
ERIC Educational Resources Information Center
Burdette, Kimberly
2007-01-01
In this article, the author recalls and shares the first half of her college journey. Her memories do not play back to her in bursts of sounds or colors; friends or lovers; feelings, touches, tastes, or ideas. They play, rather, as silent images of herself that flicker disjointedly across her mind, the lens of her memory having recorded her…
Schools of the Past: A Treasury of Photographs. Fastback 80.
ERIC Educational Resources Information Center
Davis, O. L., Jr.
The experience of schooling in America is recalled through a memory-sharing essay and an album of photographs. The intent of the article is to prompt readers to remember their personal schooling experiences and relate them to the larger framework of national memories. The essay, focusing on schools at the turn of the 20th century, discusses…
Shared Etiology of Phonological Memory and Vocabulary Deficits in School-Age Children
ERIC Educational Resources Information Center
Peterson, Robin L.; Pennington, Bruce F.; Samuelsson, Stefan; Byrne, Brian; Olson, Richard K.
2013-01-01
Purpose: The goal of this study was to investigate the etiologic basis for the association between deficits in phonological memory (PM) and vocabulary in school-age children. Method: Children with deficits in PM or vocabulary were identified within the International Longitudinal Twin Study (ILTS; Samuelsson et al., 2005). The ILTS includes 1,045…
The CA3 Network as a Memory Store for Spatial Representations
ERIC Educational Resources Information Center
Papp, Gergely; Witter, Menno P.; Treves, Alessandro
2007-01-01
Comparative neuroanatomy suggests that the CA3 region of the mammalian hippocampus is directly homologous with the medio-dorsal pallium in birds and reptiles, with which it largely shares the basic organization of primitive cortex. Autoassociative memory models, which are generically applicable to cortical networks, then help assess how well CA3…
ERIC Educational Resources Information Center
Chang, Christine
2010-01-01
In this article, the author shares her memories of Sally Smith, the founder of The Lab School of Washington, where she works as the director of the Occupational Therapy. When the author first met Smith, Smith asked her what brought her to The Lab School at that point in her career. She told Smith that her background was rather eclectic, since she…
ERIC Educational Resources Information Center
Razook, Nim
2009-01-01
The author began teaching at the University of Oklahoma in the late 1970s. In this article, the author shares two memories of those times on campus. The first was looking out his office window and seeing Iranian students marching on campus, shouting, "The Shah is a Fascist Pig." The second memory provoked this paper. It made the author…
ERIC Educational Resources Information Center
Gross, Gwen E.
2008-01-01
In this article, the author shares her experience when she was still a student until she became a superintendent. In her 17th year in the superintendency, the author finds the joys of her work all around her, grateful to be bestowed with the gift of leadership. She shares with colleagues a few especially meaningful moments from her professional…
Neonatal MRI is associated with future cognition and academic achievement in preterm children
Spencer-Smith, Megan; Thompson, Deanne K.; Doyle, Lex W.; Inder, Terrie E.; Anderson, Peter J.; Klingberg, Torkel
2015-01-01
School-age children born preterm are particularly at risk for low mathematical achievement, associated with reduced working memory and number skills. Early identification of preterm children at risk for future impairments using brain markers might assist in referral for early intervention. This study aimed to examine the use of neonatal magnetic resonance imaging measures derived from automated methods (Jacobian maps from deformation-based morphometry; fractional anisotropy maps from diffusion tensor images) to predict skills important for mathematical achievement (working memory, early mathematical skills) at 5 and 7 years in a cohort of preterm children using both univariable (general linear model) and multivariable models (support vector regression). Participants were preterm children born <30 weeks’ gestational age and healthy control children born ≥37 weeks’ gestational age at the Royal Women’s Hospital in Melbourne, Australia between July 2001 and December 2003 and recruited into a prospective longitudinal cohort study. At term-equivalent age ( ±2 weeks) 224 preterm and 46 control infants were recruited for magnetic resonance imaging. Working memory and early mathematics skills were assessed at 5 years (n = 195 preterm; n = 40 controls) and 7 years (n = 197 preterm; n = 43 controls). In the preterm group, results identified localized regions around the insula and putamen in the neonatal Jacobian map that were positively associated with early mathematics at 5 and 7 years (both P < 0.05), even after covarying for important perinatal clinical factors using general linear model but not support vector regression. The neonatal Jacobian map showed the same trend for association with working memory at 7 years (models ranging from P = 0.07 to P = 0.05). Neonatal fractional anisotropy was positively associated with working memory and early mathematics at 5 years (both P < 0.001) even after covarying for clinical factors using support vector regression but not general linear model. These significant relationships were not observed in the control group. In summary, we identified, in the preterm brain, regions around the insula and putamen using neonatal deformation-based morphometry, and brain microstructural organization using neonatal diffusion tensor imaging, associated with skills important for childhood mathematical achievement. Results contribute to the growing evidence for the clinical utility of neonatal magnetic resonance imaging for early identification of preterm infants at risk for childhood cognitive and academic impairment. PMID:26329284
Performance Analysis of the NAS Y-MP Workload
NASA Technical Reports Server (NTRS)
Bergeron, Robert J.; Kutler, Paul (Technical Monitor)
1997-01-01
This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring a strong utilization of vector functional units, and an efficient memory organization. Introduction of the larger-memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to service I/O requests, efficient scheduling prevented excessive idle time due to I/O wait. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were quite sufficient for measuring the performance of both the machine and the operating system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sitaraman, Hariswaran; Grout, Ray W
This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero- and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via a large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine-grain parallelism, and extensive use of vendor-supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~3 speed-up using the Intel 2013 compiler and ~1.5 using the Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general-purpose computational fluid dynamics codes.
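As an example of the kind of memory-layout change such optimizations involve, the sketch below contrasts an array-of-structures cell record with a structure-of-arrays layout whose unit-stride loop a compiler can map onto wide SIMD units. The field names and the update are placeholders, not the solver's actual chemistry kernels.

```cpp
#include <cstddef>
#include <vector>

// Generic illustration of the kind of memory-layout change that helps wide-vector
// hardware such as the Xeon Phi: a structure-of-arrays (SoA) layout gives the
// update loop unit-stride accesses that the compiler can vectorize, unlike the
// array-of-structures (AoS) form. The fields and the update are placeholders,
// not the solver's actual chemistry integrator.
struct CellAoS { double temperature, pressure, massFraction; };   // AoS: strided access

struct CellsSoA {                                                 // SoA: contiguous per field
    std::vector<double> temperature, pressure, massFraction;
};

void updateSoA(CellsSoA& cells, double dt) {
    const std::size_t n = cells.temperature.size();
    // Unit-stride loads and stores over each array; a vectorizing compiler can
    // map this loop onto the wide SIMD units directly.
    for (std::size_t i = 0; i < n; ++i)
        cells.massFraction[i] += dt * cells.temperature[i] * cells.pressure[i];
}
```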
In-situ, In-Memory Stateful Vector Logic Operations based on Voltage Controlled Magnetic Anisotropy.
Jaiswal, Akhilesh; Agrawal, Amogh; Roy, Kaushik
2018-04-10
Recently, the exponential increase in compute requirements demanded by emerging applications like artificial intelligence, the Internet of Things, etc. has rendered state-of-the-art von Neumann machines inefficient in terms of energy and throughput owing to the well-known von Neumann bottleneck. A promising approach to mitigate the bottleneck is to do computations as close to the memory units as possible. One extreme possibility is to do in-situ Boolean logic computations by using stateful devices. Stateful devices are those that can act both as a compute engine and a storage device, simultaneously. We propose such stateful, vector, in-memory operations using the voltage controlled magnetic anisotropy (VCMA) effect in magnetic tunnel junctions (MTJ). Our proposal is based on the well-known manufacturable 1-transistor-1-MTJ bit-cell and does not require any modifications in the bit-cell circuit or the magnetic device. Instead, we leverage the very physics of the VCMA effect to enable stateful computations. Specifically, we exploit the voltage asymmetry of the VCMA effect to construct a stateful IMP (implication) gate and use the precessional switching dynamics of the VCMA devices to propose a massively parallel NOT operation. Further, we show that other gates like AND, OR, NAND, NOR, NIMP (complement of implication) can be implemented using multi-cycle operations.
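At the Boolean level, the compositions claimed above follow from ordinary logic identities once IMP and a parallel NOT are available; the bitwise sketch below spells them out, with 64-bit words standing in for a row of bit-cells operated on in parallel. Nothing in it models the VCMA device physics itself.

```cpp
#include <cstdint>
#include <cstdio>

// Plain Boolean-algebra illustration of the claim above: once IMP (a -> b) and a
// parallel NOT are available, the other gates follow by composition. The 64-bit
// words stand in for a row of bit-cells operated on in parallel; nothing here
// models the VCMA device physics itself.
static uint64_t IMP(uint64_t a, uint64_t b)  { return ~a | b; }   // material implication
static uint64_t NOT(uint64_t a)              { return ~a; }

static uint64_t NIMP(uint64_t a, uint64_t b) { return NOT(IMP(a, b)); }      // a AND (NOT b)
static uint64_t OR  (uint64_t a, uint64_t b) { return IMP(NOT(a), b); }      // (NOT a) -> b
static uint64_t AND (uint64_t a, uint64_t b) { return NOT(IMP(a, NOT(b))); } // NOT(a -> NOT b)
static uint64_t NAND(uint64_t a, uint64_t b) { return IMP(a, NOT(b)); }
static uint64_t NOR (uint64_t a, uint64_t b) { return NOT(OR(a, b)); }

int main() {
    uint64_t a = 0xC, b = 0xA;   // two 4-bit test patterns, 1100 and 1010
    std::printf("AND=%llx OR=%llx NAND=%llx NOR=%llx NIMP=%llx\n",
                (unsigned long long)AND(a, b), (unsigned long long)OR(a, b),
                (unsigned long long)NAND(a, b), (unsigned long long)NOR(a, b),
                (unsigned long long)NIMP(a, b));
    return 0;
}
```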
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter
2016-06-30
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
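The randomized-sampling compression can be illustrated on a single low-rank block with a basic randomized range finder; the Eigen-based sketch below builds a basis Q from random samples of the block and stores the factors Q and B = Q^T A, reducing storage from m*n to roughly k(m + n) numbers. The sizes and the synthetic block are illustrative assumptions; this is not the STRUMPACK code.

```cpp
#include <Eigen/Dense>
#include <iostream>

// Assumption-level sketch (using Eigen, not the STRUMPACK code) of the randomized
// sampling idea behind HSS compression: multiply the block by a random test matrix,
// orthonormalize the samples to get a basis Q for its column space, and keep the
// small factor B = Q^T A, so an m-by-n block is stored as k(m + n) numbers instead
// of m*n. Sizes and the synthetic low-rank block are illustrative only.
int main() {
    const int m = 600, n = 500, r = 15, k = 20;        // true rank r, sampled rank k
    Eigen::MatrixXd A = Eigen::MatrixXd::Random(m, r)  // synthetic low-rank "off-diagonal block"
                      * Eigen::MatrixXd::Random(r, n);

    Eigen::MatrixXd Omega = Eigen::MatrixXd::Random(n, k);   // random test matrix
    Eigen::MatrixXd Y = A * Omega;                            // sample the range of A

    Eigen::HouseholderQR<Eigen::MatrixXd> qr(Y);              // orthonormalize the samples
    Eigen::MatrixXd Q = qr.householderQ() * Eigen::MatrixXd::Identity(m, k);

    Eigen::MatrixXd B = Q.transpose() * A;                    // compressed factor, k x n
    std::cout << "relative error of Q*B: "
              << (A - Q * B).norm() / A.norm() << "\n";       // tiny, since rank(A) <= k
    return 0;
}
```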
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running over GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
Ageing-related stereotypes in memory: When the beliefs come true.
Bouazzaoui, Badiâa; Follenfant, Alice; Ric, François; Fay, Séverine; Croizet, Jean-Claude; Atzeni, Thierry; Taconnat, Laurence
2016-01-01
Age-related stereotype concerns culturally shared beliefs about the inevitable decline of memory with age. In this study, stereotype priming and stereotype threat manipulations were used to explore the impact of age-related stereotype on metamemory beliefs and episodic memory performance. Ninety-two older participants who reported the same perceived memory functioning were divided into two groups: a threatened group and a non-threatened group (control). First, the threatened group was primed with an ageing stereotype questionnaire. Then, both groups were administered memory complaints and memory self-efficacy questionnaires to measure metamemory beliefs. Finally, both groups were administered the Logical Memory task to measure episodic memory, for the threatened group the instructions were manipulated to enhance the stereotype threat. Results indicated that the threatened individuals reported more memory complaints and less memory efficacy, and had lower scores than the control group on the logical memory task. A multiple mediation analysis revealed that the stereotype threat effect on the episodic memory performance was mediated by both memory complaints and memory self-efficacy. This study revealed that stereotype threat impacts belief in one's own memory functioning, which in turn impairs episodic memory performance.
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
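A minimal conceptual analogue of passing messages through shared memory is a mailbox protected by a mutex and condition variable, as sketched below in C++. It does not reproduce the MPF conversation model or its C primitives; it only shows the basic send/receive pattern such a library layers on shared memory.

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

// Conceptual analogue only: a tiny mailbox showing how message passing can be
// layered on shared memory, in the spirit of (but not reproducing) the MPF
// library's C primitives and its conversation model.
template <typename Msg>
class Mailbox {
public:
    void send(Msg m) {
        { std::lock_guard<std::mutex> lk(mu_); box_.push(std::move(m)); }
        cv_.notify_one();
    }
    Msg receive() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return !box_.empty(); });
        Msg m = std::move(box_.front());
        box_.pop();
        return m;
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Msg> box_;
};

int main() {
    Mailbox<std::string> mbox;
    std::thread producer([&] { mbox.send("hello from another processor"); });
    std::cout << mbox.receive() << "\n";     // blocks until the message arrives
    producer.join();
    return 0;
}
```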
Vadnais, Sarah A; Kibby, Michelle Y; Jagger-Rickels, Audreyana C
2018-01-01
We identified statistical predictors of four processing speed (PS) components in a sample of 151 children with and without attention-deficit/hyperactivity disorder (ADHD). Performance on perceptual speed was predicted by visual attention/short-term memory, whereas incidental learning/psychomotor speed was predicted by verbal working memory. Rapid naming was predictive of each PS component assessed, and inhibition predicted all but one task, suggesting a shared need to identify/retrieve stimuli rapidly and inhibit incorrect responding across PS components. Hence, we found both shared and unique predictors of perceptual, cognitive, and output speed, suggesting more specific terminology should be used in future research on PS in ADHD.
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
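For reference, the exact seeding rule that the paper parallelizes is the textbook k-means++ procedure: pick the first center uniformly at random, then pick each further center with probability proportional to its squared distance from the nearest center chosen so far. The serial C++ sketch below implements that rule; it is not the paper's parallel code.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

static double sqDist(const Point& a, const Point& b) {
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { double t = a[i] - b[i]; d += t * t; }
    return d;
}

// Serial sketch of exact k-means++ seeding (the step the paper parallelizes):
// pick the first center uniformly, then pick each subsequent center with
// probability proportional to its squared distance from the nearest center
// chosen so far. This is the textbook algorithm, not the paper's parallel code.
std::vector<Point> kmeansppSeed(const std::vector<Point>& data, std::size_t k,
                                std::mt19937& rng) {
    std::vector<Point> centers;
    std::uniform_int_distribution<std::size_t> uni(0, data.size() - 1);
    centers.push_back(data[uni(rng)]);

    std::vector<double> d2(data.size(), std::numeric_limits<double>::max());
    while (centers.size() < k) {
        for (std::size_t i = 0; i < data.size(); ++i)           // distance to nearest chosen center
            d2[i] = std::min(d2[i], sqDist(data[i], centers.back()));
        std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
        centers.push_back(data[pick(rng)]);                      // D^2-weighted choice
    }
    return centers;
}
```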
A multiarchitecture parallel-processing development environment
NASA Technical Reports Server (NTRS)
Townsend, Scott; Blech, Richard; Cole, Gary
1993-01-01
A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Shared virtual memory and generalized speedup
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1994-01-01
Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
Error recovery in shared memory multiprocessors using private caches
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1990-01-01
The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
Reducing Interprocessor Dependence in Recoverable Distributed Shared Memory
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. Kent
1994-01-01
Checkpointing techniques in parallel systems use dependency tracking and/or message logging to ensure that a system rolls back to a consistent state. Traditional dependency tracking in distributed shared memory (DSM) systems is expensive because of high communication frequency. In this paper we show that, if designed correctly, a DSM system only needs to consider dependencies due to the transfer of blocks of data, resulting in reduced dependency tracking overhead and reduced potential for rollback propagation. We develop an ownership timestamp scheme to tolerate the loss of block state information and develop a passive server model of execution where interactions between processors are considered atomic. With our scheme, dependencies are significantly reduced compared to the traditional message-passing model.
Towards memory-aware services and browsing through lifelogging sensing.
Arcega, Lorena; Font, Jaime; Cetina, Carlos
2013-11-05
Every day we receive lots of information through our senses that is lost forever because it lacks the strength or the repetition needed to generate a lasting memory. Combining the emerging Internet of Things and lifelogging sensors, we believe it is possible to build up a Digital Memory (Dig-Mem) in order to complement the fallible memory of people. This work shows how to realize the Dig-Mem in terms of interactions, affinities, activities, goals and protocols. We also complement this Dig-Mem with memory-aware services and a Dig-Mem browser. Furthermore, we propose an RFID Tag-Sharing technique to speed up the adoption of Dig-Mem. Experimentation reveals an improvement in users' understanding of Dig-Mem as time passes, compared to natural memories, where the level of detail decreases over time.
NASA Astrophysics Data System (ADS)
Suharsono, Agus; Aziza, Auliya; Pramesti, Wara
2017-12-01
Capital markets can be an indicator of the development of a country's economy. The presence of capital markets also encourages investors to trade; therefore investors need information and knowledge about which shares are better. One way of supporting decisions for short-term investments is modeling to forecast stock prices in the coming period. The issue of ASEAN stock market integration is very important. The problem is that ASEAN does not have much time to implement a single market in the economy, so it is of great interest whether the capital markets in the ASEAN region, especially those of Indonesia, Malaysia, the Philippines, Singapore and Thailand, deserve to be integrated or remain segmented. Furthermore, it should also be established what kind of integration is occurring: whether a capital market only affects other capital markets, is only influenced by other capital markets, or both affects and is influenced by other capital markets within the ASEAN region. This study compares forecasts of the Indonesian share price index (IHSG) with those of neighboring ASEAN countries, both developed and developing, namely Malaysia (KLSE), Singapore (SGE), Thailand (SETI) and the Philippines (PSE), to find out which country's stock index is the most dominant and influential. These countries are the founders of ASEAN and owners of share price indices that have close relations with Indonesia in terms of trade, especially exports and imports. Stock price modeling in this research uses multivariate time series analysis, namely VAR (Vector Autoregressive) and VECM (Vector Error Correction Model) approaches. VAR and VECM models not only forecast more than one variable but also capture the interrelations between variables. If the white noise assumption is not met in the VAR modeling, the presence of outliers can be assumed to be the cause. With this modeling it is possible to identify the pattern of relationships or linkages among the share prices of the ASEAN countries. The comparison shows that the best model of the ASEAN stock price indices is the VAR.
A Metric to Quantify Shared Visual Attention in Two-Person Teams
NASA Technical Reports Server (NTRS)
Gontar, Patrick; Mulligan, Jeffrey B.
2015-01-01
1) Introduction: Critical tasks in high-risk environments are often performed by teams, the members of which must work together efficiently. In some situations, the team members may have to work together to solve a particular problem, while in others it may be better for them to divide the work into separate tasks that can be completed in parallel. We hypothesize that these two team strategies can be differentiated on the basis of shared visual attention, measured by gaze tracking. 2) Methods: Gaze recordings were obtained for two-person flight crews flying a high-fidelity simulator (Gontar & Hoermann, 2014). Gaze was categorized with respect to 12 areas of interest (AOIs). We used these data to construct time series of 12-dimensional vectors, with each vector component representing one of the AOIs. At each time step, each vector component was set to 0, except for the one corresponding to the currently fixated AOI, which was set to 1. This time series could then be averaged in time, with the averaging window time (t) as a variable parameter. For example, when we average with a t of one minute, each vector component represents the proportion of time that the corresponding AOI was fixated within the corresponding one-minute interval. We then computed the Pearson product-moment correlation coefficient between the gaze proportion vectors for each of the two crew members, at each point in time, resulting in a signal representing the time-varying correlation between gaze behaviors. We determined criteria for concluding correlated gaze behavior using two methods: first, a permutation test was applied to the subjects' data. When one crew member's gaze proportion vector is correlated with a random time sample from the other crew member's data, a distribution of correlation values is obtained that differs markedly from the distribution obtained from temporally aligned samples. In addition to validating that the gaze tracker was functioning reasonably well, this also allows us to compute probabilities of coordinated behavior for each value of the correlation. As an alternative, we also tabulated distributions of correlation coefficients for synthetic data sets, in which the behavior was modeled as a first-order Markov process, and compared correlation distributions for identical processes with those for disparate processes, allowing us to choose criteria and estimate error rates. 3) Discussion: Our method of gaze correlation is able to measure shared visual attention, and can distinguish between activities involving different instruments. We plan to analyze whether pilots' strategies of sharing visual attention can predict performance. Possible measurements of performance include expert ratings from instructors, fuel consumption, total task time, and failure rate. While developed for two-person crews, our approach can be applied to larger groups, using intra-class correlation coefficients instead of the Pearson product-moment correlation.
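The metric described above can be sketched numerically as follows. This is an illustrative reconstruction rather than the authors' code; the AOI count, window length, and synthetic gaze streams are assumptions.

```python
import numpy as np

N_AOI = 12

def gaze_proportions(aoi_sequence, window):
    """Turn a sequence of fixated AOI indices (0..11) into time-averaged
    gaze-proportion vectors using a sliding window of `window` samples."""
    one_hot = np.eye(N_AOI)[aoi_sequence]                    # T x 12, one 1 per row
    kernel = np.ones(window) / window
    # column-wise moving average: proportion of the window spent on each AOI
    return np.stack([np.convolve(one_hot[:, j], kernel, mode="valid")
                     for j in range(N_AOI)], axis=1)

def shared_attention(aoi_a, aoi_b, window=60):
    """Time-varying Pearson correlation between two crew members' vectors."""
    pa, pb = gaze_proportions(aoi_a, window), gaze_proportions(aoi_b, window)
    return np.array([np.corrcoef(x, y)[0, 1] for x, y in zip(pa, pb)])

rng = np.random.default_rng(1)
a = rng.integers(0, N_AOI, size=600)   # synthetic, independent gaze streams
b = rng.integers(0, N_AOI, size=600)
print(np.nanmean(shared_attention(a, b)))   # near 0 for unrelated behavior
```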
IoT security with one-time pad secure algorithm based on the double memory technique
NASA Astrophysics Data System (ADS)
Wiśniewski, Remigiusz; Grobelny, Michał; Grobelna, Iwona; Bazydło, Grzegorz
2017-11-01
Secure encryption of data in the Internet of Things is especially important, as a great deal of information is exchanged every day and the number of attack vectors on IoT elements continues to increase. In the paper a novel symmetric encryption method is proposed. The idea is based on the one-time pad technique. The proposed solution applies a double-memory concept to secure transmitted data. The presented algorithm is considered as a part of a communication protocol and has been initially validated against known security issues.
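The abstract does not spell out the double-memory construction, so the sketch below only illustrates the one-time pad primitive the scheme builds on: encryption and decryption are the same XOR of the message with an equally long, never-reused random pad. Where the pad comes from (here, a fresh random buffer) is an assumption; in the proposed scheme it would presumably be drawn from the pre-shared memory.

```python
import secrets

def otp_encrypt(plaintext: bytes, pad: bytes) -> bytes:
    """XOR the message with a pad of equal length; the pad must be random,
    secret, and never reused (one-time)."""
    assert len(pad) == len(plaintext)
    return bytes(p ^ k for p, k in zip(plaintext, pad))

# decryption is the same XOR operation
otp_decrypt = otp_encrypt

msg = b"sensor reading: 23.5 C"
pad = secrets.token_bytes(len(msg))   # stand-in for pad material from the shared store
ct = otp_encrypt(msg, pad)
assert otp_decrypt(ct, pad) == msg
```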
Quasi-dynamic Earthquake Cycle Simulation in a Viscoelastic Medium with Memory Variables
NASA Astrophysics Data System (ADS)
Hirahara, K.; Ohtani, M.; Shikakura, Y.
2011-12-01
Earthquake cycle simulations based on rate and state friction laws have successfully reproduced the observed complex earthquake cycles at subduction zones. Most simulations have assumed elastic media. The lower crust and the upper mantle have, however, viscoelastic properties, which cause postseismic stress relaxation. Hence the slip evolution on the plate interfaces or the faults in long earthquake cycles is different from that in elastic media. In particular, viscoelasticity plays an important role in the interactive occurrence of inland and great interplate earthquakes. In viscoelastic media, the stress is usually calculated by the temporal convolution of the slip response function matrix and the slip deficit rate vector, which needs the past history of slip rates at all cells. Even if the convolution is properly truncated, it requires huge computations. This is why few simulation studies have considered viscoelastic media so far. In this study, we examine the method using memory variables or anelastic functions, which has been developed for the time-domain finite-difference calculation of seismic waves in a dissipative medium (e.g., Emmerich and Korn, 1987; Moczo and Kristek, 2005). The procedure for stress calculation with memory variables is as follows. First, we approximate the time-domain slip response function calculated in a viscoelastic medium with a series of relaxation functions with coefficients and relaxation times derived from a generalized Maxwell body model. Then we can define the time-domain material-independent memory variable or anelastic function for each relaxation mechanism. Each time-domain memory variable satisfies a first-order differential equation. As a result, we can calculate the stress simply by the product of the unrelaxed modulus and the slip deficit subtracted from the sum of memory variables, without temporal convolution. With respect to computational cost, the situation can be summarized as follows. Dividing the plate interface into N cells, in elastic media the stress at all cells is calculated by the product of the slip response function matrix and the slip deficit vector. The computational cost is O(N**2). With the H-matrices method, we can reduce this to O(N)-O(NlogN) (Ohtani et al. 2011). The memory size is also reduced from O(N**2) to O(N). In viscoelastic media, the product of the unrelaxed modulus matrix and the vector of the slip deficit subtracted from the sum of memory variables costs O(N) with the H-matrices method, which is the same as in elastic ones. If we use m relaxation functions, m x N differential equations are additionally solved at a time. The increase in memory size is (4m+1) x N**2. For the approximation of the slip response function, we need to estimate coefficients and relaxation times for m relaxation functions non-linearly with constraints. Because it is difficult to execute the non-linear least squares estimation with constraints, we consider only m=2 while satisfying the constraints. Test calculations in a layered or 3-D heterogeneous viscoelastic structure show that this gives a satisfactory approximation. As an example, we report a 2-D earthquake cycle simulation for the 2011 giant Tohoku earthquake in a layered viscoelastic medium.
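The memory-variable idea can be illustrated with a scalar toy example: a Prony-series (generalized Maxwell) kernel is convolved with a slip-deficit rate either by brute force over the whole history, or by integrating one first-order differential equation per relaxation mechanism, which stores no history at all. The coefficients, relaxation times, and forcing below are made up for illustration; this is not the authors' code.

```python
import numpy as np

# Prony-series approximation of a viscoelastic relaxation kernel (toy values)
a   = np.array([0.6, 0.3])      # coefficients of the relaxation terms
tau = np.array([2.0, 20.0])     # relaxation times
dt  = 0.01
t   = np.arange(0.0, 50.0, dt)
v   = np.sin(0.3 * t)           # some slip-deficit rate history

# Brute-force convolution: O(N^2) work and needs the whole rate history
kernel = (a[:, None] * np.exp(-t[None, :] / tau[:, None])).sum(axis=0)
conv = np.convolve(kernel, v)[: len(t)] * dt

# Memory variables: each zeta_k obeys d(zeta_k)/dt = -zeta_k/tau_k + a_k * v(t),
# so only the current zeta_k values are stored -- O(N) work, O(1) history
zeta = np.zeros_like(a)
rec = np.empty_like(t)
decay = np.exp(-dt / tau)
for i, vi in enumerate(v):
    # exact update for v held constant over the step
    zeta = decay * zeta + a * tau * (1.0 - decay) * vi
    rec[i] = zeta.sum()

print(np.max(np.abs(rec - conv)))   # small discretization difference only
```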
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian
The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein in each phase only a subset of operations is used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector, and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel’s TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 times more than the TBB vector. In a scenario designed to simulate the filling of a vector, performance improvement increases to 13.38 and 1.16 times. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked and elements are stored contiguously in memory.
Replay of Episodic Memories in the Rat.
Panoz-Brown, Danielle; Iyer, Vishakh; Carey, Lawrence M; Sluka, Christina M; Rajic, Gabriela; Kestenman, Jesse; Gentry, Meredith; Brotheridge, Sydney; Somekh, Isaac; Corbin, Hannah E; Tucker, Kjersten G; Almeida, Bianca; Hex, Severine B; Garcia, Krysten D; Hohmann, Andrea G; Crystal, Jonathon D
2018-05-21
Vivid episodic memories in people have been characterized as the replay of multiple unique events in sequential order [1-3]. The hippocampus plays a critical role in episodic memories in both people and rodents [2, 4-6]. Although rats remember multiple unique episodes [7, 8], it is currently unknown if animals "replay" episodic memories. Therefore, we developed an animal model of episodic memory replay. Here, we show that rats can remember a trial-unique stream of multiple episodes and the order in which these events occurred by engaging hippocampal-dependent episodic memory replay. We document that rats rely on episodic memory replay to remember the order of events rather than relying on non-episodic memories. Replay of episodic memories survives a long retention-interval challenge and interference from the memory of other events, which documents that replay is part of long-term episodic memory. The chemogenetic activating drug clozapine N-oxide (CNO), but not vehicle, reversibly impairs episodic memory replay in rats previously injected bilaterally in the hippocampus with a recombinant viral vector containing an inhibitory designer receptor exclusively activated by a designer drug (DREADD; AAV8-hSyn-hM4Di-mCherry). By contrast, two non-episodic memory assessments are unaffected by CNO, showing selectivity of this hippocampal-dependent impairment. Our approach provides an animal model of episodic memory replay, a process by which the rat searches its representations in episodic memory in sequential order to find information. Our findings using rats suggest that the ability to replay a stream of episodic memories is quite old in the evolutionary timescale. Copyright © 2018 Elsevier Ltd. All rights reserved.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Multi-processor including data flow accelerator module
Davidson, George S.; Pierce, Paul E.
1990-01-01
An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
Early Experiences Writing Performance Portable OpenMP 4 Codes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Hernandez, Oscar R
In this paper, we evaluate the recently available directives in OpenMP 4 to parallelize a computational kernel using both the traditional shared memory approach and the newer accelerator targeting capabilities. In addition, we explore various transformations that attempt to increase application performance portability, and examine the expressiveness and performance implications of using these approaches. For example, we want to understand if the target map directives in OpenMP 4 improve data locality when mapped to a shared memory system, as opposed to the first-touch policy approach in traditional OpenMP. To that end, we use recent Cray and Intel compilers to measure the performance variations of a simple application kernel when executed on the OLCF's Titan supercomputer with NVIDIA GPUs and the Beacon system with Intel Xeon Phi accelerators attached. To better understand these trade-offs, we compare our results from traditional OpenMP shared memory implementations to the newer accelerator programming model when it is used to target both the CPU and an attached heterogeneous device. We believe the results and lessons learned as presented in this paper will be useful to the larger user community by providing guidelines that can assist programmers in the development of performance portable code.
NASA Technical Reports Server (NTRS)
Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)
2001-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited performance due to scalability issues have affected the take-up of this programming model approach. Significant progress has been made in hardware and software technologies; as a result, the performance of parallel programs with compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer-aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis that is carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work not only demonstrates the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message passing and directive-based parallelizations.
Pinaud, Silvain; Portela, Julien; Duval, David; Nowacki, Fanny C.; Olive, Marie-Aude; Allienne, Jean-François; Galinier, Richard; Dheilly, Nolwenn M.; Kieffer-Jaquinod, Sylvie; Mitta, Guillaume; Théron, André; Gourbal, Benjamin
2016-01-01
Discoveries made over the past ten years have provided evidence that invertebrate antiparasitic responses may be primed in a sustainable manner, leading to the failure of a secondary encounter with the same pathogen. To date, evidence for this phenomenon, called "immune priming" or "innate immune memory", has been mainly phenomenological. A definitive demonstration of this process remains to be obtained, and the underlying mechanisms remain to be discovered and exhaustively tested with rigorous functional and molecular methods, to eliminate all alternative explanations. In order to achieve this ambitious aim, the present study focuses on the Lophotrochozoan snail, Biomphalaria glabrata, in which innate immune memory was recently reported. We provide herein the first evidence that a shift from a cellular immune response (encapsulation) to a humoral immune response (biomphalysin) occurs during the development of innate memory. The molecular characterisation of this process in the Biomphalaria/Schistosoma system was undertaken to reconcile mechanisms with phenomena, opening the way to a better comprehension of innate immune memory in invertebrates. This prompted us to revisit the artificial dichotomy between innate and memory immunity in invertebrate systems. PMID:26735307
NASA Astrophysics Data System (ADS)
Akimov, D. A.; Fedotov, Andrei B.; Koroteev, Nikolai I.; Magnitskii, S. A.; Naumov, A. N.; Sidorov-Biryukov, Dmitri A.; Sokoluk, N. T.; Zheltikov, Alexei M.
1998-04-01
The possibilities of optimizing data writing and reading in devices of 3D optical memory using photochromic materials are discussed. We quantitatively analyze linear and nonlinear optical properties of induline spiropyran molecules, which allows us to estimate the efficiency of using such materials for implementing 3D optical-memory devices. It is demonstrated that, with an appropriate choice of polarization vectors of laser beams, one can considerably improve the efficiency of two-photon writing in photochromic materials. The problem of reading the data stored in a photochromic material is analyzed. The possibilities of data reading methods with the use of fluorescence and four-photon techniques are compared.
Programming distributed memory architectures using Kali
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
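Kali's annotation syntax is not reproduced here; as a hedged illustration of what a block data distribution annotation expresses, the sketch below computes the owner and local offset of a global array index under a 1-D block distribution. The function names and the ceiling-division block size are assumptions for illustration.

```python
def block_bounds(n, p, nproc):
    """Global index range [lo, hi) owned by processor p when n elements are
    block-distributed over nproc processors (the last block may be shorter)."""
    size = -(-n // nproc)            # ceil(n / nproc)
    lo = p * size
    hi = min(lo + size, n)
    return lo, hi

def owner(i, n, nproc):
    """Processor that owns global index i under the same block distribution."""
    size = -(-n // nproc)
    return i // size

def to_local(i, n, nproc):
    """Translate a global index into (owner, local offset)."""
    p = owner(i, n, nproc)
    lo, _ = block_bounds(n, p, nproc)
    return p, i - lo

n, nproc = 1000, 8
print(block_bounds(n, 7, nproc))   # (875, 1000) -- the shorter last block
print(to_local(876, n, nproc))     # (7, 1)
```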
Association of KIBRA and memory.
Bates, Timothy C; Price, Jackie F; Harris, Sarah E; Marioni, Riccardo E; Fowkes, F Gerry R; Stewart, Marlene C; Murray, Gordon D; Whalley, Lawrence J; Starr, John M; Deary, Ian J
2009-07-24
We report on the association of KIBRA with memory in two samples of older individuals assessed on either memory for semantically unrelated word stimuli (Rey Auditory Verbal Learning Test, n=2091), or a measure of semantically related material (the WAIS Logical Memory Test of prose-passage recall, n=542). SNP rs17070145 was associated with delayed recall of semantically unrelated items, but not with immediate recall for these stimuli, nor with either immediate or delayed recall for semantically related material. The pattern of results suggests a role for the T-->C substitution in intron 9 of KIBRA in a component of episodic memory involved in long-term storage but independent of processes shared with immediate recall, such as the rehearsal involved in acquisition and retrieval processes.
We Have Met Our Past and Our Future: Thanks for the Walk down Memory Lane
ERIC Educational Resources Information Center
Wiseman, Robert C.
2006-01-01
In this article, the author takes the readers for a walk down memory lane on the use of teaching aids. He shares his experience of the good old days of Audio Visual--opaque projector, motion pictures/films, recorders, and overhead projector. Computers have arrived, and now people can make graphics, pictures, motion pictures, and many different…
ERIC Educational Resources Information Center
Rummel, Jan; Smeekens, Bridget A.; Kane, Michael J.
2017-01-01
Prospective memory (PM) is the cognitive ability to remember to fulfill intended action plans at the appropriate future moment. Current theories assume that PM fulfillment draws on attentional processes. Accordingly, pending PM intentions interfere with other ongoing tasks to the extent to which both tasks rely on the same processes. How do people…
ERIC Educational Resources Information Center
Wiediger, Matthew D.; Fournier, Lisa R.
2008-01-01
Withholding an action plan in memory for later execution can delay execution of another action, if the actions share a similar (compatible) action feature (i.e., response hand). This phenomenon, termed compatibility interference (CI), was found for identity-based actions that do not require visual guidance. The authors examined whether CI can…
ERIC Educational Resources Information Center
Wang, Qi
2006-01-01
The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared…
Wang, Manjie; Saudino, Kimberly J
2013-12-01
This is the first study to explore genetic and environmental contributions to individual differences in emotion regulation in toddlers, and the first to examine the genetic and environmental etiology underlying the association between emotion regulation and working memory. In a sample of 304 same-sex twin pairs (140 MZ, 164 DZ) at age 3, emotion regulation was assessed using the Behavior Rating Scale of the Bayley Scales of Infant Development (BRS; Bayley, 1993), and working memory was measured by the visually cued recall (VCR) task (Zelazo, Jacques, Burack, & Frye, 2002) and several memory tasks from the Mental Scale of the BSID. Based on model-fitting analyses, both emotion regulation and working memory were significantly influenced by genetic and nonshared environmental factors. Shared environmental effects were significant for working memory, but not for emotion regulation. Only genetic factors significantly contributed to the covariation between emotion regulation and working memory.
Autobiographical Memory Sharing in Everyday Life: Characteristics of a Good Story
ERIC Educational Resources Information Center
Baron, Jacqueline M.; Bluck, Susan
2009-01-01
Storytelling is a ubiquitous human activity that occurs across the lifespan as part of everyday life. Studies from three disparate literatures suggest that older adults (as compared to younger adults) are (a) less likely to recall story details, (b) more likely to go off-target when sharing stories, and, in contrast, (c) more likely to receive…
ERIC Educational Resources Information Center
O'Toole, Catriona; Barnes-Holmes, Dermot
2009-01-01
The Implicit Association Test (IAT) examines the differential association of 2 target concepts with 2 attribute concepts. Responding is predicted to be faster on consistent trials, when concepts that are associated in memory share a response key, than on inconsistent trials, when less associated items share a key. In the current study,…
Shared Versus Distributed Memory Multiprocessors
1991-01-01
Whether multiprocessors should have shared or distributed memory has been widely debated; some researchers argue strongly for building distributed-memory machines. References cited include D. Gajski et al., "Cedar," Proc. Compcon, pp. 306-309 (Spring 1989), and S. Ahuja, N. Carriero and D. Gelernter, "Linda."
A predictive framework for evaluating models of semantic organization in free recall
Morton, Neal W; Polyn, Sean M.
2016-01-01
Research in free recall has demonstrated that semantic associations reliably influence the organization of search through episodic memory. However, the specific structure of these associations and the mechanisms by which they influence memory search remain unclear. We introduce a likelihood-based model-comparison technique, which embeds a model of semantic structure within the context maintenance and retrieval (CMR) model of human memory search. Within this framework, model variants are evaluated in terms of their ability to predict the specific sequence in which items are recalled. We compare three models of semantic structure, latent semantic analysis (LSA), global vectors (GloVe), and word association spaces (WAS), and find that models using WAS have the greatest predictive power. Furthermore, we find evidence that semantic and temporal organization is driven by distinct item and context cues, rather than a single context cue. This finding provides an important constraint for theories of memory search. PMID:28331243
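As a greatly simplified illustration of the likelihood-based comparison (ignoring the temporal-context component of CMR entirely), the sketch below scores a semantic space by the log-likelihood it assigns to an observed recall order, with the next recall modeled as a softmax over similarity to the just-recalled item. The vectors, recall sequence, and softmax temperature are made up; plugging in LSA, GloVe, or WAS vectors would take the place of the random stand-ins.

```python
import numpy as np

def recall_loglik(vectors, recalled, beta=3.0):
    """Log-likelihood of an observed recall order under a toy retrieval rule:
    P(next = j | last = i) is proportional to exp(beta * cosine(i, j)) over
    the not-yet-recalled items. `vectors` maps item index -> semantic vector."""
    V = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    remaining = set(range(len(V)))
    ll = 0.0
    for prev, nxt in zip(recalled[:-1], recalled[1:]):
        remaining.discard(prev)
        cand = sorted(remaining)
        sims = V[cand] @ V[prev]
        logp = beta * sims - np.log(np.sum(np.exp(beta * sims)))
        ll += logp[cand.index(nxt)]
    return ll

rng = np.random.default_rng(0)
space_a = rng.normal(size=(16, 50))      # stand-ins for two semantic spaces
space_b = rng.normal(size=(16, 50))
seq = [3, 7, 2, 11, 5]                   # an observed recall sequence
print(recall_loglik(space_a, seq), recall_loglik(space_b, seq))
```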
A decomposition approach to the design of a multiferroic memory bit
NASA Astrophysics Data System (ADS)
Acevedo, Ruben; Liang, Cheng-Yen; Carman, Gregory P.; Sepulveda, Abdon E.
2017-06-01
The objective of this paper is to present a methodology for the design of a memory bit to minimize the energy required to write data at the bit level. When a ferromagnetic nickel nano-dot is strained by means of a piezoelectric substrate, its magnetization vector rotates between two stable states, defined as the 1 and 0 of a digital memory. The memory bit geometry, actuation mechanism and voltage control law were used as design variables. The approach used was to decompose the overall design process into simpler sub-problems whose structure can be exploited for a more efficient solution. This method minimizes the number of fully dynamic coupled finite element analyses required to converge to a near optimal design, thus decreasing the computational time for the design process. An in-plane sample design problem is presented to illustrate the advantages and flexibility of the procedure.
Low-memory iterative density fitting.
Grajciar, Lukáš
2015-07-30
A new low-memory modification of the density fitting approximation based on a combination of a continuous fast multipole method (CFMM) and a preconditioned conjugate gradient solver is presented. The iterative conjugate gradient solver uses preconditioners formed from blocks of the Coulomb metric matrix that decrease the number of iterations needed for convergence by up to one order of magnitude. The matrix-vector products needed within the iterative algorithm are calculated using CFMM, which evaluates them with linear-scaling memory requirements only. Compared with the standard density fitting implementation, an up to 15-fold reduction of the memory requirements is achieved for the most efficient preconditioner, at a cost of only a 25% increase in computational time. The potential of the method is demonstrated by performing density functional theory calculations for a zeolite fragment with 2592 atoms and 121,248 auxiliary basis functions on a single 12-core CPU workstation. © 2015 Wiley Periodicals, Inc.
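The general pattern (a matrix-free preconditioned conjugate gradient solve with a block preconditioner) can be sketched as follows. The dense stand-in matrix replaces the CFMM-evaluated Coulomb-metric product of the paper, and the block size and names are illustrative assumptions, not the published implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n, nblk = 600, 50                      # auxiliary-basis size, preconditioner block size
rng = np.random.default_rng(0)
A = rng.normal(size=(n, n)); A = A @ A.T + n * np.eye(n)   # SPD stand-in for the metric
b = rng.normal(size=n)

# Matrix-free product: in the real code this would be evaluated by CFMM with
# linear-scaling memory, never forming the metric matrix explicitly.
A_op = LinearOperator((n, n), matvec=lambda x: A @ x)

# Block-Jacobi preconditioner built from diagonal blocks of the metric
blocks = [np.linalg.inv(A[i:i + nblk, i:i + nblk]) for i in range(0, n, nblk)]
def apply_prec(r):
    out = np.empty_like(r)
    for k, i in enumerate(range(0, n, nblk)):
        out[i:i + nblk] = blocks[k] @ r[i:i + nblk]
    return out
M = LinearOperator((n, n), matvec=apply_prec)

x, info = cg(A_op, b, M=M)
print(info, np.linalg.norm(A @ x - b))   # 0 means converged; residual is small
```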
The evolution of episodic memory
Allen, Timothy A.; Fortin, Norbert J.
2013-01-01
One prominent view holds that episodic memory emerged recently in humans and lacks a “(neo)Darwinian evolution” [Tulving E (2002) Annu Rev Psychol 53:1–25]. Here, we review evidence supporting the alternative perspective that episodic memory has a long evolutionary history. We show that fundamental features of episodic memory capacity are present in mammals and birds and that the major brain regions responsible for episodic memory in humans have anatomical and functional homologs in other species. We propose that episodic memory capacity depends on a fundamental neural circuit that is similar across mammalian and avian species, suggesting that protoepisodic memory systems exist across amniotes and, possibly, all vertebrates. The implication is that episodic memory in diverse species may primarily be due to a shared underlying neural ancestry, rather than the result of evolutionary convergence. We also discuss potential advantages that episodic memory may offer, as well as species-specific divergences that have developed on top of the fundamental episodic memory architecture. We conclude by identifying possible time points for the emergence of episodic memory in evolution, to help guide further research in this area. PMID:23754432
Clément, Nathalie; Velu, Thierry; Brandenburger, Annick
2002-09-01
The production of currently available vectors derived from autonomous parvoviruses requires the expression of capsid proteins in trans, from helper sequences. Cotransfection of a helper plasmid always generates significant amounts of replication-competent virus (RCV) that can be reduced by the integration of helper sequences into a packaging cell line. Although stocks of minute virus of mice (MVM)-based vectors with no detectable RCV could be produced by transfection into packaging cells, RCVs appear after one or two rounds of replication, precluding further amplification of the vector stock. Indeed, once RCVs become detectable, they are efficiently amplified and rapidly take over the culture. Theoretically, RCV-free vector stocks could be produced if all homology between vector and helper DNA is eliminated, thus preventing homologous recombination. We constructed new vectors based on the structure of spontaneously occurring defective particles of MVM. Based on published observations related to the size of vectors and the sequence of the viral origin of replication, these vectors were modified by the insertion of foreign DNA sequences downstream of the transgene and by the introduction of a consensus NS-1 nick site near the origin of replication to optimize their production. In one of the vectors the inserted fragment of mouse genomic DNA had a synergistic effect with the modified origin of replication in increasing vector production.
Vector-Borne Bacterial Plant Pathogens: Interactions with Hemipteran Insects and Plants
Perilla-Henao, Laura M.; Casteel, Clare L.
2016-01-01
Hemipteran insects are devastating pests of crops due to their wide host range, rapid reproduction, and ability to transmit numerous plant-infecting pathogens as vectors. While the field of plant–virus–vector interactions has flourished in recent years, plant–bacteria–vector interactions remain poorly understood. Leafhoppers and psyllids are by far the most important vectors of bacterial pathogens, yet there are still significant gaps in our understanding of their feeding behavior, salivary secretions, and plant responses as compared to important viral vectors, such as whiteflies and aphids. Even with an incomplete understanding of plant–bacteria–vector interactions, some common themes have emerged: (1) all known vector-borne bacteria share the ability to propagate in the plant and insect host; (2) particular hemipteran families appear to be incapable of transmitting vector-borne bacteria; (3) all known vector-borne bacteria have highly reduced genomes and coding capacity, resulting in host-dependence; and (4) vector-borne bacteria encode proteins that are essential for colonization of specific hosts, though only a few types of proteins have been investigated. Here, we review the current knowledge on important vector-borne bacterial pathogens, including Xylella fastidiosa, Spiroplasma spp., Liberibacter spp., and ‘Candidatus Phytoplasma spp.’. We then highlight recent approaches used in the study of vector-borne bacteria. Finally, we discuss the application of this knowledge for control and future directions that will need to be addressed in the field of vector–plant–bacteria interactions. PMID:27555855
NASA Astrophysics Data System (ADS)
Chase, Patrick; Vondran, Gary
2011-01-01
Tetrahedral interpolation is commonly used to implement continuous color space conversions from sparse 3D and 4D lookup tables. We investigate the implementation and optimization of tetrahedral interpolation algorithms for GPUs, and compare to the best known CPU implementations as well as to a well known GPU-based trilinear implementation. We show that a $500 NVIDIA GTX-580 GPU is 3x faster than a $1000 Intel Core i7 980X CPU for 3D interpolation, and 9x faster for 4D interpolation. Performance-relevant GPU attributes are explored including thread scheduling, local memory characteristics, global memory hierarchy, and cache behaviors. We consider existing tetrahedral interpolation algorithms and tune based on the structure and branching capabilities of current GPUs. Global memory performance is improved by reordering and expanding the lookup table to ensure optimal access behaviors. Per-multiprocessor local memory is exploited to implement optimally coalesced global memory accesses, and local memory addressing is optimized to minimize bank conflicts. We explore the impacts of lookup table density upon computation and memory access costs. Also presented are CPU-based 3D and 4D interpolators using SSE vector operations that are faster than any previously published solution.
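The core 3D tetrahedral interpolation step that CPU and GPU implementations share can be sketched as follows. This scalar NumPy version shows the cell lookup, the selection of one of the six tetrahedra from the ordering of the fractional coordinates, and the accumulation of vertex differences along the cell diagonal; none of the memory-layout or coalescing optimizations from the paper are represented.

```python
import numpy as np

def tetra_interp(lut, rgb):
    """Interpolate one RGB triple (components in [0, 1]) in a cubic 3D LUT of
    shape (N, N, N, C) using tetrahedral interpolation."""
    N = lut.shape[0]
    pos = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0) * (N - 1)
    base = np.minimum(pos.astype(int), N - 2)    # lower corner of the cell
    frac = pos - base                            # fractional position inside it
    order = np.argsort(-frac)                    # axes by decreasing fraction: picks 1 of 6 tetrahedra
    idx = base.copy()
    prev = lut[tuple(idx)].astype(float)
    out = prev.copy()
    for axis in order:
        idx[axis] += 1                           # walk along the cell diagonal
        cur = lut[tuple(idx)].astype(float)
        out += frac[axis] * (cur - prev)
        prev = cur
    return out

# Identity LUT: interpolation should reproduce the input color almost exactly.
N = 17
grid = np.linspace(0.0, 1.0, N)
identity = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
print(tetra_interp(identity, [0.2, 0.7, 0.33]))   # ~ [0.2, 0.7, 0.33]
```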
Explaining prompts children to privilege inductively rich properties.
Walker, Caren M; Lombrozo, Tania; Legare, Cristine H; Gopnik, Alison
2014-11-01
Four experiments with preschool-aged children test the hypothesis that engaging in explanation promotes inductive reasoning on the basis of shared causal properties as opposed to salient (but superficial) perceptual properties. In Experiments 1a and 1b, 3- to 5-year-old children prompted to explain during a causal learning task were more likely to override a tendency to generalize according to perceptual similarity and instead extend an internal feature to an object that shared a causal property. Experiment 2 replicated this effect of explanation in a case of label extension (i.e., categorization). Experiment 3 demonstrated that explanation improves memory for clusters of causally relevant (non-perceptual) features, but impairs memory for superficial (perceptual) features, providing evidence that effects of explanation are selective in scope and apply to memory as well as inference. In sum, our data support the proposal that engaging in explanation influences children's reasoning by privileging inductively rich, causal properties. Copyright © 2014 Elsevier B.V. All rights reserved.
Power and Performance Trade-offs for Space Time Adaptive Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino
Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core i7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP’s computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large data sets without an increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.
Parallel Navier-Stokes computations on shared and distributed memory architectures
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar
1995-01-01
We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SP1). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
CREB regulates memory allocation in the insular cortex
Sano, Yoshitake; Shobe, Justin L.; Zhou, Miou; Huang, Shan; Shuman, Tristan; Cai, Denise J.; Golshani, Peyman; Kamata, Masakazu; Silva, Alcino J.
2016-01-01
The molecular and cellular mechanisms of memory storage have attracted a great deal of attention. By comparison, little is known about memory allocation, the process that determines which specific neurons in a neural network will store a given memory [1, 2]. Previous studies demonstrated that memory allocation is not random in the amygdala; these studies showed that amygdala neurons with higher levels of the cAMP response element binding protein (CREB) are more likely to be recruited into encoding and storing fear memory [3–6]. To determine whether specific mechanisms also regulate memory allocation in other brain regions, and whether CREB also has a role in this process, we studied insular cortical memory representations for conditioned taste aversion (CTA). In this task, an animal learns to associate a taste (CS) with the experience of malaise (such as that induced by LiCl; US). The insular cortex is required for CTA memory formation and retrieval [7–12]. CTA learning activates a subpopulation of neurons in this structure [13–15], and the insular cortex and the basolateral amygdala (BLA) interact during CTA formation [16, 17]. Here, we used a combination of approaches, including viral vector transfections of insular cortex, arc Fluorescence In Situ Hybridization (FISH) and the Designer Receptors Exclusively Activated by Designer Drugs (DREADD) system, to show that CREB levels determine which insular cortical neurons go on to encode a given conditioned taste memory. PMID:25454591
Interference from mere thinking: mental rehearsal temporarily disrupts recall of motor memory.
Yin, Cong; Wei, Kunlin
2014-08-01
Interference between successively learned tasks is widely investigated to study motor memory. However, how simultaneously learned motor memories interact with each other has been rarely studied despite its prevalence in daily life. Assuming that motor memory shares common neural mechanisms with declarative memory system, we made unintuitive predictions that mental rehearsal, as opposed to further practice, of one motor memory will temporarily impair the recall of another simultaneously learned memory. Subjects simultaneously learned two sensorimotor tasks, i.e., visuomotor rotation and gain. They retrieved one memory by either practice or mental rehearsal and then had their memory evaluated. We found that mental rehearsal, instead of execution, impaired the recall of unretrieved memory. This impairment was content-independent, i.e., retrieving either gain or rotation impaired the other memory. Hence, conscious recollection of one motor memory interferes with the recall of another memory. This is analogous to retrieval-induced forgetting in declarative memory, suggesting a common neural process across memory systems. Our findings indicate that motor imagery is sufficient to induce interference between motor memories. Mental rehearsal, currently widely regarded as beneficial for motor performance, negatively affects memory recall when it is exercised for a subset of memorized items. Copyright © 2014 the American Physiological Society.
Optimization of the Brillouin operator on the KNL architecture
NASA Astrophysics Data System (ADS)
Dürr, Stephan
2018-03-01
Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32^3 × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
NASA Technical Reports Server (NTRS)
Muellerschoen, R. J.
1988-01-01
A unified method to permute vector stored Upper triangular Diagonal factorized covariance and vector stored upper triangular Square Root Information arrays is presented. The method involves cyclic permutation of the rows and columns of the arrays and retriangularization with fast (slow) Givens rotations (reflections). Minimal computation is performed, and a one dimensional scratch array is required. To make the method efficient for large arrays on a virtual memory machine, computations are arranged so as to avoid expensive paging faults. This method is potentially important for processing large volumes of radio metric data in the Deep Space Network.
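The vector-stored UD/SRIF details are specific to the report, but the underlying operation can be sketched on a dense array: cyclically permute the columns of an upper-triangular factor and restore triangularity with left-applied Givens rotations, which leaves the implied information matrix (for the permuted ordering) unchanged. The sketch below is an illustration under those assumptions rather than the reported algorithm, and it makes no attempt at the paging-friendly computation ordering discussed above.

```python
import numpy as np

def givens(a, b):
    """Rotation (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def retriangularize(M):
    """Zero the subdiagonal of M in place with left-applied Givens rotations."""
    n = M.shape[0]
    for j in range(n - 1):
        for i in range(n - 1, j, -1):
            c, s = givens(M[i - 1, j], M[i, j])
            top, bot = M[i - 1, :].copy(), M[i, :].copy()
            M[i - 1, :] = c * top + s * bot
            M[i, :] = -s * top + c * bot
    return M

rng = np.random.default_rng(0)
R = np.triu(rng.normal(size=(6, 6)) + 5 * np.eye(6))   # an upper-triangular factor

k = 2                                                   # cyclically move column k to the end
perm = list(range(k)) + list(range(k + 1, 6)) + [k]
Rp = retriangularize(R[:, perm].copy())

# The permuted information matrix is reproduced by the retriangularized factor.
print(np.allclose(Rp.T @ Rp, (R.T @ R)[np.ix_(perm, perm)]))   # True
```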
NASA Technical Reports Server (NTRS)
1975-01-01
The NASA structural analysis (NASTRAN) computer program is operational on three series of third generation computers. The problems and difficulties involved in adapting NASTRAN to a fourth generation computer, namely, the Control Data STAR-100, are discussed. The salient features which distinguish the Control Data STAR-100 from third generation computers are hardware vector processing capability and virtual memory. A feasible method is presented for transferring NASTRAN to the Control Data STAR-100 system while retaining much of the machine-independent code. Basic matrix operations are noted for optimization for vector processing.
Real-Time Symbol Extraction From Grey-Level Images
NASA Astrophysics Data System (ADS)
Massen, R.; Simnacher, M.; Rosch, J.; Herre, E.; Wuhrer, H. W.
1988-04-01
A VME-bus image pipeline processor for extracting vectorized contours from grey-level images in real-time is presented. This 3 Giga operation per second processor uses large kernel convolvers and new non-linear neighbourhood processing algorithms to compute true 1-pixel wide and noise-free contours without thresholding even from grey-level images with quite varying edge sharpness. The local edge orientation is used as an additional cue to compute a list of vectors describing the closed and open contours in real-time and to dump a CAD-like symbolic image description into a symbol memory at pixel clock rate.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
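A sequential sketch of the randomized round used in Luby-style MIS algorithms is given below: each live vertex draws a random priority, vertices that are local minima among their live neighbors join the set, and they and their neighbors are retired. This is a common random-priority variant rather than the exact Luby(A)/Luby(B) formulations, and the distributed-memory partitioning of vertices across ranks described in the paper is not shown.

```python
import random

def luby_mis(adj, seed=0):
    """Maximal independent set via randomized rounds.
    `adj` maps each vertex to a set of neighbors (undirected graph)."""
    rng = random.Random(seed)
    live = set(adj)
    mis = set()
    while live:
        val = {v: rng.random() for v in live}
        # a vertex enters the MIS if it beats all of its still-live neighbors
        winners = {v for v in live
                   if all((val[v], v) < (val[u], u) for u in adj[v] if u in live)}
        mis |= winners
        removed = winners | {u for v in winners for u in adj[v]}
        live -= removed
    return mis

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}, 5: set()}
ind = luby_mis(adj)
print(ind)
# independence: no two chosen vertices are adjacent
assert all(u not in adj[v] for v in ind for u in ind if u != v)
# maximality: every vertex is chosen or has a chosen neighbor
assert all(v in ind or adj[v] & ind for v in adj)
```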
A feature selection approach towards progressive vector transmission over the Internet
NASA Astrophysics Data System (ADS)
Miao, Ru; Song, Jia; Feng, Min
2017-09-01
WebGIS has become a popular way of visualizing and sharing geospatial information over the Internet. In order to improve the efficiency of client applications, a web-based progressive vector transmission approach is proposed. Important features should be selected and transferred first, and methods for measuring the importance of features therefore need to be considered in progressive transmission. However, studies on progressive transmission of large-volume vector data have mostly focused on map generalization in the field of cartography and have rarely discussed the quantitative selection of geographic features. This paper applies information theory to measuring the feature importance of vector maps. A measurement model for the amount of information carried by vector features is defined to deal with feature selection issues. The measurement model involves a geometry factor, a spatial distribution factor and a thematic attribute factor. Moreover, a real-time transport protocol (RTP)-based progressive transmission method is presented to improve the transmission of vector data. To clearly demonstrate the essential methodology and key techniques, a prototype for web-based progressive vector transmission is presented, and an experiment on progressive selection and transmission of vector features is conducted. The experimental results indicate that our approach clearly improves the performance and end-user experience of delivering and manipulating large vector data over the Internet.
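The abstract does not give the measurement model in full, so the sketch below only illustrates one plausible shape of such a score: a weighted combination of a geometry term, a spatial-distribution term, and a thematic term based on Shannon self-information, with the highest-scoring features transmitted first. The field names, weights, and the specific choice of factors are assumptions for illustration, not the paper's model.

```python
import math
from collections import Counter

def feature_importance(features, w_geom=0.4, w_dist=0.3, w_attr=0.3):
    """Rank vector features by a weighted score of three illustrative factors:
    geometry (vertex count), spatial distribution (distance from the map centre),
    and thematic attributes (self-information of the feature's class label)."""
    counts = Counter(f["cls"] for f in features)
    total = len(features)
    # rarer thematic classes carry more information (Shannon self-information)
    info = {c: -math.log2(n / total) for c, n in counts.items()}
    max_info = max(info.values()) or 1.0
    max_vtx = max(f["n_vertices"] for f in features)
    max_d = max(f["dist_to_centre"] for f in features) or 1.0

    scores = []
    for f in features:
        s = (w_geom * f["n_vertices"] / max_vtx
             + w_dist * (1.0 - f["dist_to_centre"] / max_d)
             + w_attr * info[f["cls"]] / max_info)
        scores.append((s, f["id"]))
    return sorted(scores, reverse=True)     # transmit highest-scoring features first

demo = [
    {"id": "road_1",  "n_vertices": 120, "dist_to_centre": 0.2, "cls": "road"},
    {"id": "road_2",  "n_vertices": 15,  "dist_to_centre": 0.9, "cls": "road"},
    {"id": "river_1", "n_vertices": 300, "dist_to_centre": 0.5, "cls": "river"},
]
print(feature_importance(demo))
```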
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1991-01-01
Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
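The sketch below (in Python rather than Vienna FORTRAN, purely for illustration) shows the kind of global-index-to-owner mapping a simple BLOCK distribution implies, i.e. the placement the language lets the programmer declare while still writing global references; the function names and the ceiling-block rule are assumptions for the sketch.

```python
# Sketch of the mapping implied by a 1-D BLOCK distribution over P processors.
def block_owner(i, n, p):
    """Processor owning global index i (0-based) of an n-element array on p processors."""
    block = -(-n // p)             # ceiling division: elements per processor
    return i // block

def local_index(i, n, p):
    """Index of global element i within its owner's local segment."""
    block = -(-n // p)
    return i % block

n, p = 10, 4
print([(i, block_owner(i, n, p), local_index(i, n, p)) for i in range(n)])
```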
Simulation Analysis of Data Sharing in Shared Memory Multiprocessors
1989-02-24
Parallel implementation of an adaptive and parameter-free N-body integrator
NASA Astrophysics Data System (ADS)
Pruett, C. David; Ingham, William H.; Herman, Ralph D.
2011-05-01
Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies.
Program summary
Program title: PNB.f90
Catalogue identifier: AEIK_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 3052
No. of bytes in distributed program, including test data, etc.: 68 600
Distribution format: tar.gz
Programming language: Fortran 90 and OpenMPI
Computer: All shared or distributed memory parallel processors
Operating system: Unix/Linux
Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized.
RAM: Dependent upon N
Classification: 4.3, 4.12, 6.5
Nature of problem: High-accuracy numerical evaluation of trajectories of N point masses, each subject to Newtonian gravitation.
Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree.
Running time: 5.1 s for the demo program supplied with the package.
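As a hedged illustration of the per-body work that such a code distributes across processors, the sketch below computes Newtonian accelerations for a contiguous slice of bodies; it is not PNB.f90's adaptive power-series integrator, and the array layout is an assumption.

```python
# Newtonian accelerations for the bodies assigned to one processor's slice.
import numpy as np

G = 6.674e-11  # gravitational constant, SI units

def accelerations(pos, mass, slice_=slice(None)):
    """Accelerations for the bodies in `slice_` due to all N bodies."""
    acc = np.zeros_like(pos[slice_])
    for k, i in enumerate(range(*slice_.indices(len(mass)))):
        d = pos - pos[i]                         # vectors from body i to every body
        r2 = np.einsum('ij,ij->i', d, d)
        r2[i] = np.inf                           # exclude self-interaction
        acc[k] = G * np.sum((mass / r2**1.5)[:, None] * d, axis=0)
    return acc

pos = np.random.rand(8, 3); mass = np.ones(8)
print(accelerations(pos, mass, slice(0, 4)).shape)   # this processor's 4 bodies
```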
Parallel discrete event simulation: A shared memory approach
NASA Technical Reports Server (NTRS)
Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.
1987-01-01
With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
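A toy sketch of the conservative rule underlying Chandy-Misra simulation is given below: a logical process consumes only those events that are no later than the earliest message waiting on every input channel, with null messages bounding future arrivals. The class and channel structure are illustrative assumptions, not the experiments' code.

```python
# Toy conservative (Chandy-Misra-style) logical process.
import heapq

class LogicalProcess:
    def __init__(self, name, inputs):
        self.name = name
        self.channels = {src: [] for src in inputs}   # per-source FIFO of (time, event)

    def receive(self, src, time, event):
        self.channels[src].append((time, event))

    def safe_events(self):
        """Events safe to process: those no later than every channel's earliest message."""
        if any(not ch for ch in self.channels.values()):
            return []                                  # blocked: must wait or receive a null message
        horizon = min(ch[0][0] for ch in self.channels.values())
        ready = []
        for ch in self.channels.values():
            while ch and ch[0][0] <= horizon:
                heapq.heappush(ready, ch.pop(0))
        return ready

lp = LogicalProcess("server", inputs=["src_a", "src_b"])
lp.receive("src_a", 3.0, "arrival")
lp.receive("src_b", 5.0, None)        # null message: src_b promises nothing before t=5
print(lp.safe_events())               # [(3.0, 'arrival')]
```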
Exploration versus exploitation in space, mind, and society
Hills, Thomas T.; Todd, Peter M.; Lazer, David; Redish, A. David; Couzin, Iain D.
2015-01-01
Search is a ubiquitous property of life. Although diverse domains have worked on search problems largely in isolation, recent trends across disciplines indicate that the formal properties of these problems share similar structures and, often, similar solutions. Moreover, internal search (e.g., memory search) shows similar characteristics to external search (e.g., spatial foraging), including shared neural mechanisms consistent with a common evolutionary origin across species. Search problems and their solutions also scale from individuals to societies, underlying and constraining problem solving, memory, information search, and scientific and cultural innovation. In summary, search represents a core feature of cognition, with a vast influence on its evolution and processes across contexts and requiring input from multiple domains to understand its implications and scope. PMID:25487706
Dynamic programming on a shared-memory multiprocessor
NASA Technical Reports Server (NTRS)
Edmonds, Phil; Chu, Eleanor; George, Alan
1993-01-01
Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n^3/6p) and that the synchronization cost is O(|log_C n|) if p ≪ n, where C = (2p − 1)/(2p + 1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
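To make the shared-memory pattern concrete, the sketch below fills a triangular dynamic-programming table (matrix-chain ordering, as a stand-in problem) one diagonal at a time, splitting each diagonal's entries across worker threads; the paper's three specific load-balancing algorithms are not reproduced, and the thread-pool scheme is an assumption.

```python
# Diagonal-parallel fill of a triangular DP table (matrix-chain ordering).
from concurrent.futures import ThreadPoolExecutor

def matrix_chain_cost(dims, p=4):
    n = len(dims) - 1                      # number of matrices
    cost = [[0] * n for _ in range(n)]     # shared DP table

    def fill(i, length):
        j = i + length
        cost[i][j] = min(cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                         for k in range(i, j))

    with ThreadPoolExecutor(max_workers=p) as pool:
        for length in range(1, n):                       # each diagonal is a synchronization point
            list(pool.map(lambda i: fill(i, length), range(n - length)))
    return cost[0][n - 1]

print(matrix_chain_cost([30, 35, 15, 5, 10, 20, 25]))    # classic example: 15125
```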
Haidar, M; Guèvremont, G; Zhang, C; Bathgate, R A D; Timofeeva, E; Smith, C M; Gundlach, A L
2017-05-01
Hippocampus is innervated by γ-aminobutyric acid (GABA) "projection" neurons of the nucleus incertus (NI), including a population expressing the neuropeptide, relaxin-3 (RLN3). In studies aimed at gaining an understanding of the role of RLN3 signaling in hippocampus via its Gi/o-protein-coupled receptor, RXFP3, we examined the distribution of RLN3-immunoreactive nerve fibres and RXFP3 mRNA-positive neurons in relation to hippocampal GABA neuron populations. RLN3-positive elements were detected in close apposition with a substantial population of somatostatin (SST)- and GABA-immunoreactive neurons, and a smaller population of parvalbumin- and calretinin-immunoreactive neurons in different hippocampal areas, consistent with the relative distribution patterns of RXFP3 mRNA and these marker transcripts. In light of the functional importance of the dentate gyrus (DG) hilus in learning and memory, and our anatomical data, we examined the possible influence of RLN3/RXFP3 signaling in this region on spatial memory. Using viral-based Cre/LoxP recombination methods and adult mice with a floxed Rxfp3 gene, we deleted Rxfp3 from DG hilar neurons and assessed spatial memory performance and affective behaviors. Following infusions of an AAV1/2-Cre-IRES-eGFP vector, Cre expression was observed in DG hilar neurons, including SST-positive cells, and in situ hybridization histochemistry for RXFP3 mRNA confirmed receptor depletion relative to levels in floxed-RXFP3 mice infused with an AAV1/2-eGFP (control) vector. RXFP3 depletion within the DG hilus impaired spatial reference memory in an appetitive T-maze task, reflected by a reduced percentage of correct choices and increased time to meet criteria, relative to control. In a continuous spontaneous alternation Y-maze task, RXFP3-depleted mice made fewer alternations in the first minute, suggesting impairment of spatial working memory. However, RXFP3-depleted and control mice displayed similar locomotor activity, anxiety-like behavior in light/dark box and elevated-plus maze tests, and learning and long-term memory retention in the Morris water maze. These data indicate endogenous RLN3/RXFP3 signaling can modulate hippocampal-dependent spatial reference and working memory via effects on SST interneurons, and further our knowledge of hippocampal cognitive processing. © 2017 Wiley Periodicals, Inc.
Flegal, Kristin E; Reuter-Lorenz, Patricia A
2014-07-01
Gist-based processing has been proposed to account for robust false memories in the converging-associates task. The deep-encoding processes known to enhance verbatim memory also strengthen gist memory and increase distortions of long-term memory (LTM). Recent research has demonstrated that compelling false memory illusions are relatively delay-invariant, also occurring under canonical short-term memory (STM) conditions. To investigate the contributions of gist to false memory at short and long delays, processing depth was manipulated as participants encoded lists of four semantically related words and were probed immediately, following a filled 3- to 4-s retention interval, or approximately 20 min later, in a surprise recognition test. In two experiments, the encoding manipulation dissociated STM and LTM on the frequency, but not the phenomenology, of false memory. Deep encoding at STM increases false recognition rates at LTM, but confidence ratings and remember/know judgments are similar across delays and do not differ as a function of processing depth. These results suggest that some shared and some unique processes underlie false memory illusions at short and long delays.
A theory of working memory without consciousness or sustained activity
Trübutschek, Darinka; Marti, Sébastien; Ojeda, Andrés; King, Jean-Rémi; Mi, Yuanyuan; Tsodyks, Misha; Dehaene, Stanislas
2017-01-01
Working memory and conscious perception are thought to share similar brain mechanisms, yet recent reports of non-conscious working memory challenge this view. Combining visual masking with magnetoencephalography, we investigate the reality of non-conscious working memory and dissect its neural mechanisms. In a spatial delayed-response task, participants reported the location of a subjectively unseen target above chance-level after several seconds. Conscious perception and conscious working memory were characterized by similar signatures: a sustained desynchronization in the alpha/beta band over frontal cortex, and a decodable representation of target location in posterior sensors. During non-conscious working memory, such activity vanished. Our findings contradict models that identify working memory with sustained neural firing, but are compatible with recent proposals of ‘activity-silent’ working memory. We present a theoretical framework and simulations showing how slowly decaying synaptic changes allow cell assemblies to go dormant during the delay, yet be retrieved above chance-level after several seconds. DOI: http://dx.doi.org/10.7554/eLife.23871.001 PMID:28718763
Visual Cortex Inspired CNN Model for Feature Construction in Text Analysis
Fu, Hongping; Niu, Zhendong; Zhang, Chunxia; Ma, Jing; Chen, Jie
2016-01-01
Recently, biologically inspired models have increasingly been proposed to solve problems in text analysis. Convolutional neural networks (CNNs) are hierarchical artificial neural networks composed of multiple layers of perceptrons. According to biological research, CNNs can be improved by incorporating the attention modulation and memory processing of the primate visual cortex. In this paper, we employ these properties of the primate visual cortex to improve CNNs and propose a biological-mechanism-driven feature-construction based answer recommendation method (BMFC-ARM), which recommends the best answer for a given question in community question answering. BMFC-ARM is an improved CNN with four channels representing, respectively, questions, answers, asker information and answerer information, and it consists of two stages: biological-mechanism-driven feature construction (BMFC) and answer ranking. BMFC imitates the attention modulation property by introducing the asker and answerer information of a given question and the similarity between them, and imitates the memory processing property by bringing in reputation information for answerers. The feature vector for answer ranking is then constructed by fusing the asker-answerer similarities, the answerer's reputation, and the corresponding vectors of question, answer, asker, and answerer. Finally, a softmax is applied at the answer-ranking stage to obtain the best answers from the feature vector. Experimental results for answer recommendation on the Stackexchange dataset show that BMFC-ARM exhibits better performance. PMID:27471460
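The fusion-and-ranking step the abstract describes might look roughly like the sketch below, where each candidate answer gets a feature vector built from question/answer/asker/answerer vectors plus the asker-answerer similarity and answerer reputation, and a softmax scores the candidates; the embeddings, dimensions, and weights are assumptions, not the trained BMFC-ARM model.

```python
# Hedged sketch of feature fusion and softmax-based answer ranking.
import numpy as np

def fuse(q_vec, a_vec, asker_vec, answerer_vec, reputation):
    similarity = np.dot(asker_vec, answerer_vec) / (
        np.linalg.norm(asker_vec) * np.linalg.norm(answerer_vec))
    return np.concatenate([q_vec, a_vec, asker_vec, answerer_vec,
                           [similarity, reputation]])

def rank_answers(features, weights):
    scores = features @ weights                    # one score per candidate answer
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                           # softmax over candidates
    return np.argsort(-probs), probs

rng = np.random.default_rng(0)
feats = np.stack([fuse(rng.normal(size=8), rng.normal(size=8),
                       rng.normal(size=4), rng.normal(size=4), rep)
                  for rep in (0.2, 0.9, 0.5)])
order, probs = rank_answers(feats, weights=rng.normal(size=feats.shape[1]))
print(order, probs.round(3))
```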
Quinn, Kylie M.; Costa, Andreia Da; Yamamoto, Ayako; Berry, Dana; Lindsay, Ross W.B.; Darrah, Patricia A.; Wang, Lingshu; Cheng, Cheng; Kong, Wing-Pui; Gall, Jason G.D.; Nicosia, Alfredo; Folgori, Antonella; Colloca, Stefano; Cortese, Riccardo; Gostick, Emma; Price, David A.; Gomez, Carmen E.; Esteban, Mariano; Wyatt, Linda S.; Moss, Bernard; Morgan, Cecilia; Roederer, Mario; Bailer, Robert T.; Nabel, Gary J.; Koup, Richard A.; Seder, Robert A.
2013-01-01
Recombinant adenoviral vectors (rAds) are the most potent recombinant vaccines for eliciting CD8+ T cell-mediated immunity in humans; however, prior exposure from natural adenoviral infection can decrease such responses. Here we show low seroreactivity in humans against simian- (sAd11, sAd16) or chimpanzee-derived (chAd3, chAd63) compared to human-derived (rAd5, rAd28, rAd35) vectors across multiple geographic regions. We then compared the magnitude, quality, phenotype and protective capacity of CD8+ T cell responses in mice vaccinated with rAds encoding SIV Gag. Using a dose range (1 × 10^7 to 10^9 PU), we defined a hierarchy among rAd vectors based on the magnitude and protective capacity of CD8+ T cell responses, from most to least as: rAd5 and chAd3, rAd28 and sAd11, chAd63, sAd16, and rAd35. Selection of rAd vector or dose could modulate the proportion and/or frequency of IFNγ+TNFα+IL-2+ and KLRG1+CD127- CD8+ T cells, but strikingly ~30–80% of memory CD8+ T cells co-expressed CD127 and KLRG1. To further optimise CD8+ T cell responses, we assessed rAds as part of prime-boost regimens. Mice primed with rAds and boosted with NYVAC generated Gag-specific responses that approached ~60% of total CD8+ T cells at peak. Alternatively, priming with DNA or rAd28 and boosting with rAd5 or chAd3 induced robust and equivalent CD8+ T cell responses compared to prime or boost alone. Collectively, these data provide the immunologic basis for using specific rAd vectors alone or as part of prime-boost regimens to induce CD8+ T cells for rapid effector function or robust long-term memory, respectively. PMID:23390298
Automated Change Detection for Synthetic Aperture Sonar
2014-01-01
channels, respectively. The canonical coordinates of x and y are defined as u = F^H R_xx^{-1/2} x and v = G^H R_yy^{-1/2} y, where F and G are the mapping matrices...containing the left and right singular vectors of the coherence matrix C, respectively. The canonical coordinate vectors u and v share the diagonal cross...feature set. The coherent change information between canonical coordinates v and u can be calculated using the residual, v − Ku, owing to the fact that
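A minimal numpy sketch of the canonical-coordinate construction described in the excerpt is given below: whiten x and y, take the SVD of the coherence matrix, and form u and v whose cross-correlation is the diagonal matrix K, with v − Ku as the change residual; the Cholesky-based whitening and the data shapes are assumptions.

```python
# Canonical coordinates of two zero-mean data matrices X, Y (samples in columns).
import numpy as np

def canonical_coordinates(X, Y):
    Rxx = X @ X.T / X.shape[1]
    Ryy = Y @ Y.T / Y.shape[1]
    Rxy = X @ Y.T / X.shape[1]
    Rxx_isqrt = np.linalg.inv(np.linalg.cholesky(Rxx))     # acts as R_xx^{-1/2} (whitener)
    Ryy_isqrt = np.linalg.inv(np.linalg.cholesky(Ryy))
    C = Rxx_isqrt @ Rxy @ Ryy_isqrt.T                       # coherence matrix
    F, k, GH = np.linalg.svd(C)
    U = F.T @ Rxx_isqrt @ X                                 # u = F^H R_xx^{-1/2} x
    V = GH @ Ryy_isqrt @ Y                                  # v = G^H R_yy^{-1/2} y
    residual = V - np.diag(k) @ U                           # coherent-change statistic v - Ku
    return U, V, k, residual

X = np.random.randn(3, 500); Y = 0.8 * X + 0.2 * np.random.randn(3, 500)
U, V, k, res = canonical_coordinates(X, Y)
print(k.round(3))                                           # canonical correlations
```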
Multiple-User, Multitasking, Virtual-Memory Computer System
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1993-01-01
Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
A warm and friendly memorial session for Helmut Oeschler
NASA Astrophysics Data System (ADS)
Cleymans, Jean; Hippolyte, Boris; Kalweit, Alexander; Müntz, Christian; Stroth, Joachim
2018-02-01
A full session was organized in memory of Helmut Oeschler during the 2017 edition of the Strangeness in Quark Matter Conference. It was heart-warming to discuss with the audience his main achievements and share anecdotes about this exceptionally praised and appreciated colleague, who was also a great friend for many at the conference. A brief summary of the session is provided with these proceedings.
Wang, Qi; Koh, Jessie Bee Kim; Song, Qingfang; Hou, Yubo
2015-01-01
This study investigated explicit knowledge of autobiographical memory functions using a newly developed questionnaire. European and Asian American adults (N = 57) and school-aged children (N = 68) indicated their agreement with 13 statements about why people think about and share memories pertaining to four broad functions: self, social, directive, and emotion regulation. Children were interviewed for personal memories concurrently with the memory function knowledge assessment and again 3 months later. It was found that adults agreed to the self, social and directive purposes of memory to a greater extent than did children, whereas European American children agreed to the emotion regulation purposes of memory to a greater extent than did European American adults. Furthermore, European American children endorsed more self and emotion regulation functions than did Asian American children, whereas Asian American adults endorsed more directive functions than did European American adults. Children's endorsement of memory functions, particularly social functions, was associated with more detailed and personally meaningful memories. These findings are informative for the understanding of developmental and cultural influences on memory function knowledge and of the relation of such knowledge to autobiographical memory development.
ERIC Educational Resources Information Center
Turner, Joy; And Others
1995-01-01
Twenty-five members of the Montessori community share their memories of Dr. Nancy McCormick Rambusch, charismatic founder of the American Montessori movement, early childhood professional, and innovative educator, who died of pancreatic cancer on October 27, 1994. Rambusch's work of 40 years now flowers as an institutionalized educational program…