shared memory machine: Topics by Science.gov

Sample records for shared memory machine

Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations

NASA Technical Reports Server (NTRS)

Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)

2002-01-01

This report describes a two level parallelization of a Computational Fluid Dynamic (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine is reported using medium and large scale test problems.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Secchi, Simone; Tumeo, Antonino; Villa, Oreste

Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
The performance of disk arrays in shared-memory database machines

NASA Technical Reports Server (NTRS)

Katz, Randy H.; Hong, Wei

1993-01-01

In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
Protection of Mission-Critical Applications from Untrusted Execution Environment: Resource Efficient Replication and Migration of Virtual Machines

DTIC Science & Technology

2015-09-28

the performance of log-and- replay can degrade significantly for VMs configured with multiple virtual CPUs, since the shared memory communication...whether based on checkpoint replication or log-and- replay , existing HA ap- proaches use in- memory backups. The backup VM sits in the memory of a...efficiently. 15. SUBJECT TERMS High-availability virtual machines, live migration, memory and traffic overheads, application suspension, Java
Shared versus distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Jordan, Harry F.

1991-01-01

The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Vascular system modeling in parallel environment - distributed and shared memory approaches

PubMed Central

Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

2011-01-01

The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Supporting shared data structures on distributed memory architectures

NASA Technical Reports Server (NTRS)

Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

1990-01-01

Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes are described.
Debugging Fortran on a shared memory machine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Allen, T.R.; Padua, D.A.

1987-01-01

Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
Shared Memory Parallelization of an Implicit ADI-type CFD Code

NASA Technical Reports Server (NTRS)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, J.P.; Bangs, A.L.; Butler, P.L.

Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a compiler'' which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. Themore » design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,00 lines of code. 25 refs., 6 figs.« less
Architectures for reasoning in parallel

NASA Technical Reports Server (NTRS)

Hall, Lawrence O.

1989-01-01

The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Runtime support for parallelizing data mining algorithms

NASA Astrophysics Data System (ADS)

Jin, Ruoming; Agrawal, Gagan

2002-03-01

With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

2000-01-01

A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Execution time supports for adaptive scientific algorithms on distributed memory machines

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

1990-01-01

Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Execution time support for scientific programs on distributed memory machines

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

1990-01-01

Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Parallelization of KENO-Va Monte Carlo code

NASA Astrophysics Data System (ADS)

Ramón, Javier; Peña, Jorge

1995-07-01

KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.

The Tera Multithreaded Architecture and Unstructured Meshes

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.; Mavriplis, Dimitri J.

1998-01-01

The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
Interconnect Performance Evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron Cluster, and Dell PowerEdge

NASA Technical Reports Server (NTRS)

Fatoohi, Rod; Saini, Subbash; Ciotti, Robert

2006-01-01

We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different number of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray XI shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our, results show the impact of the network bandwidth and topology on the overall performance of each interconnect.
PANDA: A distributed multiprocessor operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chubb, P.

1989-01-01

PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Hypercluster - Parallel processing for computational mechanics

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1988-01-01

An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Implementing Shared Memory Parallelism in MCBEND

NASA Astrophysics Data System (ADS)

Bird, Adam; Long, David; Dobson, Geoff

2017-09-01

MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
IPCS user's manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGoldrick, P.R.

1980-12-11

The Interprocess Communications System (IPCS) was written to provide a virtual machine upon which the Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) could be built. The hardware upon which the IPCS runs consists of nine minicomputers sharing some common memory.
NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

NASA Technical Reports Server (NTRS)

Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

1991-01-01

Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex

PubMed Central

Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao

2016-01-01

Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247
Static Memory Deduplication for Performance Optimization in Cloud Computing.

PubMed

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-04-27

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing

PubMed Central

Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

2017-01-01

In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
Enhancement of the Shared Graphics Workspace.

DTIC Science & Technology

1987-12-31

participants to share videodisc images and computer graphics displayed in color and text and facsimile information displayed in black on amber. They...could annotate the information in up to five * colors and print the annotated version at both sites, using a standard fax machine. The SGWS also used a fax...system to display a document, whether text or photo, the camera scans the document, digitizes the data, and sends it via direct memory access (DMA) to
Programming distributed memory architectures using Kali

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Vanrosendale, John

1990-01-01

Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system on annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language is described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
Shared virtual memory and generalized speedup

NASA Technical Reports Server (NTRS)

Sun, Xian-He; Zhu, Jianping

1994-01-01

Generalized speedup is defined as parallel speed over sequential speed. The generalized speedup and its relation with other existing performance metrics, such as traditional speedup, efficiency, scalability, etc., are carefully studied. In terms of the introduced asymptotic speed, it was shown that the difference between the generalized speedup and the traditional speedup lies in the definition of the efficiency of uniprocessor processing, which is a very important issue in shared virtual memory machines. A scientific application was implemented on a KSR-1 parallel computer. Experimental and theoretical results show that the generalized speedup is distinct from the traditional speedup and provides a more reasonable measurement. In the study of different speedups, various causes of superlinear speedup are also presented.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry

1999-01-01

As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Dynamic programming on a shared-memory multiprocessor

NASA Technical Reports Server (NTRS)

Edmonds, Phil; Chu, Eleanor; George, Alan

1993-01-01

Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n-cubed/6p) and that the synchronization cost is O(absolute value of log sub C n) if p much less than n, where C = (2p-1)/(2p + 1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
Computational Nanotechnology of Materials, Devices, and Machines: Carbon Nanotubes

NASA Technical Reports Server (NTRS)

Srivastava, Deepak; Kwak, Dolhan (Technical Monitor)

2000-01-01

The mechanics and chemistry of carbon nanotubes have relevance for their numerous electronic applications. Mechanical deformations such as bending and twisting affect the nanotube's conductive properties, and at the same time they possess high strength and elasticity. Two principal techniques were utilized including the analysis of large scale classical molecular dynamics on a shared memory architecture machine and a quantum molecular dynamics methodology. In carbon based electronics, nanotubes are used as molecular wires with topological defects which are mediated through various means. Nanotubes can be connected to form junctions.
Computational Nanotechnology of Materials, Electronics and Machines: Carbon Nanotubes

NASA Technical Reports Server (NTRS)

Srivastava, Deepak

2001-01-01

This report presents the goals and research of the Integrated Product Team (IPT) on Devices and Nanotechnology. NASA's needs for this technology are discussed and then related to the research focus of the team. The two areas of focus for technique development are: 1) large scale classical molecular dynamics on a shared memory architecture machine; and 2) quantum molecular dynamics methodology. The areas of focus for research are: 1) nanomechanics/materials; 2) carbon based electronics; 3) BxCyNz composite nanotubes and junctions; 4) nano mechano-electronics; and 5) nano mechano-chemistry.
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning

DOE PAGES

Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron; ...

2017-11-21

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMCs resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMCs impact grows with scale resulting in up to 3X performance degradation atmore » scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMCs impact, namely hardware offloading, core reservation and network throttling.« less
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning

DOE Office of Scientific and Technical Information (OSTI.GOV)

Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron

Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMCs resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMCs impact grows with scale resulting in up to 3X performance degradation atmore » scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMCs impact, namely hardware offloading, core reservation and network throttling.« less

Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chin, George; Marquez, Andres; Choudhury, Sutanay

2012-09-01

Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
Distributed-Memory Fast Maximal Independent Set

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluatemore » their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.« less
Programming in Vienna Fortran

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

1992-01-01

Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

NASA Astrophysics Data System (ADS)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

NASA Technical Reports Server (NTRS)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Performance prediction: A case study using a multi-ring KSR-1 machine

NASA Technical Reports Server (NTRS)

Sun, Xian-He; Zhu, Jianping

1995-01-01

While computers with tens of thousands of processors have successfully delivered high performance power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, heirarchical architecture can be predicted and the best algorithm-machine combination can be selected for a given application.
Parallelising a molecular dynamics algorithm on a multi-processor workstation

NASA Astrophysics Data System (ADS)

Müller-Plathe, Florian

1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

NASA Technical Reports Server (NTRS)

Fricker, David M.

1997-01-01

The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
Hardware/software codesign for embedded RISC core

NASA Astrophysics Data System (ADS)

Liu, Peng

2001-12-01

This paper describes hardware/software codesign method of the extendible embedded RISC core VIRGO, which based on MIPS-I instruction set architecture. VIRGO is described by Verilog hardware description language that has five-stage pipeline with shared 32-bit cache/memory interface, and it is controlled by distributed control scheme. Every pipeline stage has one small controller, which controls the pipeline stage status and cooperation among the pipeline phase. Since description use high level language and structure is distributed, VIRGO core has highly extension that can meet the requirements of application. We take look at the high-definition television MPEG2 MPHL decoder chip, constructed the hardware/software codesign virtual prototyping machine that can research on VIRGO core instruction set architecture, and system on chip memory size requirements, and system on chip software, etc. We also can evaluate the system on chip design and RISC instruction set based on the virtual prototyping machine platform.
Implementations of BLAST for parallel computers.

PubMed

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
A robot arm simulation with a shared memory multiprocessor machine

NASA Technical Reports Server (NTRS)

Kim, Sung-Soo; Chuang, Li-Ping

1989-01-01

A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Wang, Peng; Plimpton, Steven J

The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less
Parallelization and automatic data distribution for nuclear reactor simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebrock, L.M.

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less
Parallel algorithms for boundary value problems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
Paging memory from random access memory to backing storage in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

2013-05-21

Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

NASA Technical Reports Server (NTRS)

Dubey, A.; Zubair, M.; Grosch, C. E.

1992-01-01

One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
Kanerva's sparse distributed memory: An associative memory algorithm well-suited to the Connection Machine

NASA Technical Reports Server (NTRS)

Rogers, David

1988-01-01

The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
Proceedings: Sisal `93

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J.T.

1993-10-01

This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less

Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1998-01-01

A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Benchmark tests on the digital equipment corporation Alpha AXP 21164-based AlphaServer 8400, including a comparison of optimized vector and superscalar processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wasserman, H.J.

1996-02-01

The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with thatmore » of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes, is that the codes also span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architecture. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.« less
Interaction with Machine Improvisation

NASA Astrophysics Data System (ADS)

Assayag, Gerard; Bloch, George; Cont, Arshia; Dubnov, Shlomo

We describe two multi-agent architectures for an improvisation oriented musician-machine interaction systems that learn in real time from human performers. The improvisation kernel is based on sequence modeling and statistical learning. We present two frameworks of interaction with this kernel. In the first, the stylistic interaction is guided by a human operator in front of an interactive computer environment. In the second framework, the stylistic interaction is delegated to machine intelligence and therefore, knowledge propagation and decision are taken care of by the computer alone. The first framework involves a hybrid architecture using two popular composition/performance environments, Max and OpenMusic, that are put to work and communicate together, each one handling the process at a different time/memory scale. The second framework shares the same representational schemes with the first but uses an Active Learning architecture based on collaborative, competitive and memory-based learning to handle stylistic interactions. Both systems are capable of processing real-time audio/video as well as MIDI. After discussing the general cognitive background of improvisation practices, the statistical modelling tools and the concurrent agent architecture are presented. Then, an Active Learning scheme is described and considered in terms of using different improvisation regimes for improvisation planning. Finally, we provide more details about the different system implementations and describe several performances with the system.
Concurrent Image Processing Executive (CIPE). Volume 1: Design overview

NASA Technical Reports Server (NTRS)

Lee, Meemong; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.

1990-01-01

The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are described. The target machine for this software is a JPL/Caltech Mark 3fp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: user interface, host-resident executive, hypercube-resident executive, and application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube, a data management method which distributes, redistributes, and tracks data set information was implemented. The data management also allows data sharing among application programs. The CIPE software architecture provides a flexible environment for scientific analysis of complex remote sensing image data, such as planetary data and imaging spectrometry, utilizing state-of-the-art concurrent computation capabilities.
Parallel-vector out-of-core equation solver for computational mechanics

NASA Technical Reports Server (NTRS)

Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.

1993-01-01

A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/ output (I/O) time is reduced by using the a synchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
A study of an arbiter function in the structures of a shared bus

NASA Astrophysics Data System (ADS)

Seck, J.-P.

The results of a comparative study of synchronous and asynchronous arbiters for managing user access to a shared bus is presented. The best available method is determined to be modular arbiter structures attached only to the decision module. Linear and circular arbitration strategies are examined for suitability for automatic decision-making. A multiple strategies arbiter scheme is devised, involving the superposition of various strategies of one sequential machine into another. It is then possible to modify the strategy on-line if the current strategy is ineffective. The utilization of a multiple structure of cascading arbiter devices is noted to be effective if response time is not a critical matter. Finally, attention is given to automatic circuit testing and fault detection. An example is furnished in terms of a management system for a shared memory in a multimicroprocessor structure.
Tuning collective communication for Partitioned Global Address Space programming models

DOE PAGES

Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...

2011-06-12

Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues thatmore » are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.« less
The Global File System

NASA Technical Reports Server (NTRS)

Soltis, Steven R.; Ruwart, Thomas M.; OKeefe, Matthew T.

1996-01-01

The global file system (GFS) is a prototype design for a distributed file system in which cluster nodes physically share storage devices connected via a network-like fiber channel. Networks and network-attached storage devices have advanced to a level of performance and extensibility so that the previous disadvantages of shared disk architectures are no longer valid. This shared storage architecture attempts to exploit the sophistication of storage device technologies whereas a server architecture diminishes a device's role to that of a simple component. GFS distributes the file system responsibilities across processing nodes, storage across the devices, and file system resources across the entire storage pool. GFS caches data on the storage devices instead of the main memories of the machines. Consistency is established by using a locking mechanism maintained by the storage devices to facilitate atomic read-modify-write operations. The locking mechanism is being prototyped in the Silicon Graphics IRIX operating system and is accessed using standard Unix commands and modules.
Low Latency Messages on Distributed Memory Multiprocessors

DOE PAGES

Rosing, Matt; Saltz, Joel

1995-01-01

This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Performance Analysis of Ivshmem for High-Performance Computing in Virtual Machines

NASA Astrophysics Data System (ADS)

Ivanovic, Pavle; Richter, Harald

2018-01-01

High-Performance computing (HPC) is rarely accomplished via virtual machines (VMs). In this paper, we present a remake of ivshmem which can change this. Ivshmem was a shared memory (SHM) between virtual machines on the same server, with SHM-access synchronization included, until about 5 years ago when newer versions of Linux and its virtualization library libvirt evolved. We restored that SHM-access synchronization feature because it is indispensable for HPC and made ivshmem runnable with contemporary versions of Linux, libvirt, KVM, QEMU and especially MPICH, which is an implementation of MPI - the standard HPC communication library. Additionally, MPICH was transparently modified by us to get ivshmem included, resulting in a three to ten times performance improvement compared to TCP/IP. Furthermore, we have transparently replaced MPI_PUT, a single-side MPICH communication mechanism, by an own MPI_PUT wrapper. As a result, our ivshmem even surpasses non-virtualized SHM data transfers for block lengths greater than 512 KBytes, showing the benefits of virtualization. All improvements were possible without using SR-IOV.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Implementing Molecular Dynamics on Hybrid High Performance Computers - Three-Body Potentials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, W Michael; Yamada, Masako

The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power re- quirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to efficiently use accelerators to improve the performance of molecular dynamics (MD) employing pairwise potential energy models, little is reported in the literature for models that includemore » many-body effects. 3-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger-Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algo- rithms usable on hybrid high performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed memory MD codes and show results for the simulation of water with Stillinger-Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.« less
Proceedings of the second SISAL users` conference

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J T; Frerking, C; Miller, P J

1992-12-01

This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
Sequential behavior and its inherent tolerance to memory faults.

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1972-01-01

Representation of a memory fault of a sequential machine M by a function mu on the states of M and the result of the fault by an appropriately determined machine M(mu). Given some sequential behavior B, its inherent tolerance to memory faults can then be measured in terms of the minimum memory redundancy required to realize B with a state-assigned machine having fault tolerance type tau and fault tolerance level t. A behavior having maximum inherent tolerance is exhibited, and it is shown that behaviors of the same size can have different inherent tolerance.
The Effects of Different Electrode Types for Obtaining Surface Machining Shape on Shape Memory Alloy Using Electrochemical Machining

NASA Astrophysics Data System (ADS)

Choi, S. G.; Kim, S. H.; Choi, W. K.; Moon, G. C.; Lee, E. S.

2017-06-01

Shape memory alloy (SMA) is important material used for the medicine and aerospace industry due to its characteristics called the shape memory effect, which involves the recovery of deformed alloy to its original state through the application of temperature or stress. Consumers in modern society demand stability in parts. Electrochemical machining is one of the methods for obtained these stabilities in parts requirements. These parts of shape memory alloy require fine patterns in some applications. In order to machine a fine pattern, the electrochemical machining method is suitable. For precision electrochemical machining using different shape electrodes, the current density should be controlled precisely. And electrode shape is required for precise electrochemical machining. It is possible to obtain precise square holes on the SMA if the insulation layer controlled the unnecessary current between electrode and workpiece. If it is adjusting the unnecessary current to obtain the desired shape, it will be a great contribution to the medical industry and the aerospace industry. It is possible to process a desired shape to the shape memory alloy by micro controlling the unnecessary current. In case of the square electrode without insulation layer, it derives inexact square holes due to the unnecessary current. The results using the insulated electrode in only side show precise square holes. The removal rate improved in case of insulated electrode than others because insulation layer concentrate the applied current to the machining zone.
Performing an allreduce operation using shared memory

DOEpatents

Archer, Charles J [Rochester, MN; Dozsa, Gabor [Ardsley, NY; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Performing an allreduce operation using shared memory

DOEpatents

Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E

2014-06-10

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Computer-Aided Parallelizer and Optimizer

NASA Technical Reports Server (NTRS)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

DTIC Science & Technology

2017-10-04

Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
The effect of the order in which episodic autobiographical memories versus autobiographical knowledge are shared on feelings of closeness.

PubMed

Brandon, Nicole R; Beike, Denise R; Cole, Holly E

2017-07-01

Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
Memory Benchmarks for SMP-Based High Performance Parallel Computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yoo, A B; de Supinski, B; Mueller, F

2001-11-20

As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominates the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even moremore » complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.« less
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ceriani, Marco; Palermo, Gianluca; Secchi, Simone

We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft-core cores, local interconnection and memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating an hardware scheduler within the custom load/store buffers to take advantagemore » from the availability of multiple executions threads, increasing the efficiency in a transparent way to the application. We evaluated a dual node system irregular kernels showing scalability in the number of cores and threads.« less
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.

ERIC Educational Resources Information Center

Schwartz, David G.

1998-01-01

Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
Multiprogramming performance degradation - Case study on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Dimpsey, R. T.; Iyer, R. K.

1989-01-01

The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent
Shared direct memory access on the Explorer 2-LX

NASA Technical Reports Server (NTRS)

Musgrave, Jeffrey L.

1990-01-01

Advances in Expert System technology and Artificial Intelligence have provided a framework for applying automated Intelligence to the solution of problems which were generally perceived as intractable using more classical approaches. As a result, hybrid architectures and parallel processing capability have become more common in computing environments. The Texas Instruments Explorer II-LX is an example of a machine which combines a symbolic processing environment, and a computationally oriented environment in a single chassis for integrated problem solutions. This user's manual is an attempt to make these capabilities more accessible to a wider range of engineers and programmers with problems well suited to solution in such an environment.
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Tumeo, Antonino; Secchi, Simone

Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
An implementation of the SNR high speed network communication protocol (Receiver part)

NASA Astrophysics Data System (ADS)

Wan, Wen-Jyh

1995-03-01

This thesis work is to implement the receiver pan of the SNR high speed network transport protocol. The approach was to use the Systems of Communicating Machines (SCM) as the formal definition of the protocol. Programs were developed on top of the Unix system using C programming language. The Unix system features that were adopted for this implementation were multitasking, signals, shared memory, semaphores, sockets, timers and process control. The problems encountered, and solved, were signal loss, shared memory conflicts, process synchronization, scheduling, data alignment and errors in the SCM specification itself. The result was a correctly functioning program which implemented the SNR protocol. The system was tested using different connection modes, lost packets, duplicate packets and large data transfers. The contributions of this thesis are: (1) implementation of the receiver part of the SNR high speed transport protocol; (2) testing and integration with the transmitter part of the SNR transport protocol on an FDDI data link layered network; (3) demonstration of the functions of the SNR transport protocol such as connection management, sequenced delivery, flow control and error recovery using selective repeat methods of retransmission; and (4) modifications to the SNR transport protocol specification such as corrections for incorrect predicate conditions, defining of additional packet types formats, solutions for signal lost and processes contention problems etc.
A time-shared machine repair problem with mixed spares under N-policy

NASA Astrophysics Data System (ADS)

Jain, Madhu; Shekhar, Chandra; Shukla, Shalini

2016-06-01

The present investigation deals with a machine repair problem consisting of cold and warm standby machines. The machines are subject to breakdown and are repaired by the permanent repairman operating under N-policy. There is provision of one additional removable repairman who is called upon when the work load of failed machines crosses a certain threshold level and is removed as soon as the work load again ceases to that level. Both repairmen recover the failed machines by following the time sharing concept which means that the repairmen share their repair job simultaneously among all the failed machines that have joined the system for repair. Markovian model has been developed by considering the queue dependent rates and solved analytically using the recursive technique. Various performance indices are derived which are further used to obtain the cost function. By taking illustration, numerical simulation and sensitivity analysis have been provided.
Technical support for digital systems technology development. Task order 1: ISP contention analysis and control

NASA Technical Reports Server (NTRS)

Stehle, Roy H.; Ogier, Richard G.

1993-01-01

Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
A self-defining hierarchical data system

NASA Technical Reports Server (NTRS)

Bailey, J.

1992-01-01

The Self-Defining Data System (SDS) is a system which allows the creation of self-defining hierarchical data structures in a form which allows the data to be moved between different machine architectures. Because the structures are self-defining they can be used for communication between independent modules in a distributed system. Unlike disk-based hierarchical data systems such as Starlink's HDS, SDS works entirely in memory and is very fast. Data structures are created and manipulated as internal dynamic structures in memory managed by SDS itself. A structure may then be exported into a caller supplied memory buffer in a defined external format. This structure can be written as a file or sent as a message to another machine. It remains static in structure until it is reimported into SDS. SDS is written in portable C and has been run on a number of different machine architectures. Structures are portable between machines with SDS looking after conversion of byte order, floating point format, and alignment. A Fortran callable version is also available for some machines.
Supervisory control and diagnostics system for the mirror fusion test facility: overview and status 1980

DOE Office of Scientific and Technical Information (OSTI.GOV)

McGoldrick, P.R.

1981-01-01

The Mirror Fusion Test Facility (MFTF) is a complex facility requiring a highly-computerized Supervisory Control and Diagnostics System (SCDS) to monitor and provide control over ten subsystems; three of which require true process control. SCDS will provide physicists with a method of studying machine and plasma behavior by acquiring and processing up to four megabytes of plasma diagnostic information every five minutes. A high degree of availability and throughput is provided by a distributed computer system (nine 32-bit minicomputers on shared memory). Data, distributed across SCDS, is managed by a high-bandwidth Distributed Database Management System. The MFTF operators' control roommore » consoles use color television monitors with touch sensitive screens; this is a totally new approach. The method of handling deviations to normal machine operation and how the operator should be notified and assisted in the resolution of problems has been studied and a system designed.« less
Simultaneous Scheduling of Jobs, AGVs and Tools Considering Tool Transfer Times in Multi Machine FMS By SOS Algorithm

NASA Astrophysics Data System (ADS)

Sivarami Reddy, N.; Ramamurthy, D. V., Dr.; Prahlada Rao, K., Dr.

2017-08-01

This article addresses simultaneous scheduling of machines, AGVs and tools where machines are allowed to share the tools considering transfer times of jobs and tools between machines, to generate best optimal sequences that minimize makespan in a multi-machine Flexible Manufacturing System (FMS). Performance of FMS is expected to improve by effective utilization of its resources, by proper integration and synchronization of their scheduling. Symbiotic Organisms Search (SOS) algorithm is a potent tool which is a better alternative for solving optimization problems like scheduling and proven itself. The proposed SOS algorithm is tested on 22 job sets with makespan as objective for scheduling of machines and tools where machines are allowed to share tools without considering transfer times of jobs and tools and the results are compared with the results of existing methods. The results show that the SOS has outperformed. The same SOS algorithm is used for simultaneous scheduling of machines, AGVs and tools where machines are allowed to share tools considering transfer times of jobs and tools to determine the best optimal sequences that minimize makespan.
Automatic Data Traffic Control on DSM Architecture

NASA Technical Reports Server (NTRS)

Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)

2000-01-01

We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods which user can employ to improve data traffic in his code. We report on implementation of a tool which detects the code fragments causing data congestions and advises user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application ARC3D.
Accelerate quasi Monte Carlo method for solving systems of linear algebraic equations through shared memory

NASA Astrophysics Data System (ADS)

Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola

2017-04-01

In this paper we study on Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demostrated that GPU can effectively speed up the computations of this issue. Our purpose is to optimize Monte Carlo method simulation on GPUmemoryachritecture specifically. Random numbers are organized to storein shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays(CTA)scheme. The results of experiments show that the shared memory based strategy can speed up the computaions over than 3X at most.
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

PubMed Central

Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui

2015-01-01

Ultra high thread-level parallelism in modern GPUs usually introduces numerous memory requests simultaneously. So there are always plenty of memory requests waiting at each bank of the shared LLC (L2 in this paper) and global memory. For global memory, various schedulers have already been developed to adjust the request sequence. But we find few work has ever focused on the service sequence on the shared LLC. We measured that a big number of GPU applications always queue at LLC bank for services, which provide opportunity to optimize the service order on LLC. Through adjusting the GPU memory request service order, we can improve the schedulability of SM. So we proposed a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. The priority representative of memory request is critical for CaLRS. We use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme can boost the SM schedulability effectively by promoting the scheduling priority of the memory requests with high criticality and improves the performance of GPU indirectly. PMID:25729772
High-throughput state-machine replication using software transactional memory.

PubMed

Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2016-11-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload.
High-throughput state-machine replication using software transactional memory

PubMed Central

Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin

2017-01-01

State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload. PMID:29075049

Superior memory efficiency of quantum devices for the simulation of continuous-time stochastic processes

NASA Astrophysics Data System (ADS)

Elliott, Thomas J.; Gu, Mile

2018-03-01

Continuous-time stochastic processes pervade everyday experience, and the simulation of models of these processes is of great utility. Classical models of systems operating in continuous-time must typically track an unbounded amount of information about past behaviour, even for relatively simple models, enforcing limits on precision due to the finite memory of the machine. However, quantum machines can require less information about the past than even their optimal classical counterparts to simulate the future of discrete-time processes, and we demonstrate that this advantage extends to the continuous-time regime. Moreover, we show that this reduction in the memory requirement can be unboundedly large, allowing for arbitrary precision even with a finite quantum memory. We provide a systematic method for finding superior quantum constructions, and a protocol for analogue simulation of continuous-time renewal processes with a quantum machine.
Design and implementation of a medium speed communications interface and protocol for a low cost, refreshed display computer

NASA Technical Reports Server (NTRS)

Phyne, J. R.; Nelson, M. D.

1975-01-01

The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
Time Constraints and Resource Sharing in Adults' Working Memory Spans

ERIC Educational Resources Information Center

Barrouillet, Pierre; Bernardin, Sophie; Camos, Valerie

2004-01-01

This article presents a new model that accounts for working memory spans in adults, the time-based resource-sharing model. The model assumes that both components (i.e., processing and maintenance) of the main working memory tasks require attention and that memory traces decay as soon as attention is switched away. Because memory retrievals are…
Externalising the autobiographical self: sharing personal memories online facilitated memory retention.

PubMed

Wang, Qi; Lee, Dasom; Hou, Yubo

2017-07-01

Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
Computational Models of Human Performance: Validation of Memory and Procedural Representation in Advanced Air/Ground Simulation

NASA Technical Reports Server (NTRS)

Corker, Kevin M.; Labacqz, J. Victor (Technical Monitor)

1997-01-01

The Man-Machine Interaction Design and Analysis System (MIDAS) under joint U.S. Army and NASA cooperative is intended to assist designers of complex human/automation systems in successfully incorporating human performance capabilities and limitations into decision and action support systems. MIDAS is a computational representation of multiple human operators, selected perceptual, cognitive, and physical functions of those operators, and the physical/functional representation of the equipment with which they operate. MIDAS has been used as an integrated predictive framework for the investigation of human/machine systems, particularly in situations with high demands on the operators. We have extended the human performance models to include representation of both human operators and intelligent aiding systems in flight management, and air traffic service. The focus of this development is to predict human performance in response to aiding system developed to identify aircraft conflict and to assist in the shared authority for resolution. The demands of this application requires representation of many intelligent agents sharing world-models, coordinating action/intention, and cooperative scheduling of goals and action in an somewhat unpredictable world of operations. In recent applications to airborne systems development, MIDAS has demonstrated an ability to predict flight crew decision-making and procedural behavior when interacting with automated flight management systems and Air Traffic Control. In this paper, we describe two enhancements to MIDAS. The first involves the addition of working memory in the form of an articulatory buffer for verbal communication protocols and a visuo-spatial buffer for communications via digital datalink. The second enhancement is a representation of multiple operators working as a team. This enhanced model was used to predict the performance of human flight crews and their level of compliance with commercial aviation communication procedures. We show how the data produced by MIDAS compares with flight crew performance data from full mission simulations. Finally, we discuss the use of these features to study communication issues connected with aircraft-based separation assurance.
Fault-Tolerant, Real-Time, Multi-Core Computer System

NASA Technical Reports Server (NTRS)

Gostelow, Kim P.

2012-01-01

A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
Visual and Spatial Working Memory Are Not that Dissociated after All: A Time-Based Resource-Sharing Account

ERIC Educational Resources Information Center

Vergauwe, Evie; Barrouillet, Pierre; Camos, Valerie

2009-01-01

Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and…
Rapid recovery from transient faults in the fault-tolerant processor with fault-tolerant shared memory

NASA Technical Reports Server (NTRS)

Harper, Richard E.; Butler, Bryan P.

1990-01-01

The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.
Relations of maternal style and child self-concept to autobiographical memories in chinese, chinese immigrant, and European american 3-year-olds.

PubMed

Wang, Qi

2006-01-01

The relations of maternal reminiscing style and child self-concept to children's shared and independent autobiographical memories were examined in a sample of 189 three-year-olds and their mothers from Chinese families in China, first-generation Chinese immigrant families in the United States, and European American families. Mothers shared memories with their children and completed questionnaires; children recounted autobiographical events and described themselves with a researcher. Independent of culture, gender, child age, and language skills, maternal elaborations and evaluations were associated with children's shared memory reports, and maternal evaluations and child agentic self-focus were associated with children's independent memory reports. Maternal style and child self-concept further mediated cultural influences on children's memory. The findings provide insight into the social-cultural construction of autobiographical memory.
VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nelson, Andrew F.; Wetzstein, M.; Naab, T.

2009-10-01

We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the codemore » necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE. Finally, we find that although parallel performance on small problems may reach a plateau beyond which more processors bring no additional speedup, performance never decreases, a factor important for running large simulations on many processors with individual time steps, where only a small fraction of the total particles require updates at any given moment.« less
A variable-mode stator consequent pole memory machine

NASA Astrophysics Data System (ADS)

Yang, Hui; Lyu, Shukang; Lin, Heyun; Zhu, Z. Q.

2018-05-01

In this paper, a variable-mode concept is proposed for the speed range extension of a stator-consequent-pole memory machine (SCPMM). An integrated permanent magnet (PM) and electrically excited control scheme is utilized to simplify the flux-weakening control instead of relatively complicated continuous PM magnetization control. Due to the nature of memory machine, the magnetization state of low coercive force (LCF) magnets can be easily changed by applying either a positive or negative current pulse. Therefore, the number of PM poles may be changed to satisfy the specific performance requirement under different speed ranges, i.e. the machine with all PM poles can offer high torque output while that with half PM poles provides wide constant power range. In addition, the SCPMM with non-magnetized PMs can be considered as a dual-three phase electrically excited reluctance machine, which can be fed by an open-winding based dual inverters that provide direct current (DC) bias excitation to further extend the speed range. The effectiveness of the proposed variable-mode operation for extending its operating region and improving the system reliability is verified by both finite element analysis (FEA) and experiments.
Cognitive Machine-Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis From Restrictive Cardiomyopathy.

PubMed

Sengupta, Partho P; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T

2016-06-01

Associating a patient's profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography data sets derived from patients with known constrictive pericarditis and restrictive cardiomyopathy. Clinical and echocardiographic data of 50 patients with constrictive pericarditis and 44 with restrictive cardiomyopathy were used for developing an associative memory classifier-based machine-learning algorithm. The speckle tracking echocardiography data were normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve of the associative memory classifier was evaluated for differentiating constrictive pericarditis from restrictive cardiomyopathy. Using only speckle tracking echocardiography variables, associative memory classifier achieved a diagnostic area under the curve of 89.2%, which improved to 96.2% with addition of 4 echocardiographic variables. In comparison, the area under the curve of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63.7%, respectively. Furthermore, the associative memory classifier demonstrated greater accuracy and shorter learning curves than other machine-learning approaches, with accuracy asymptotically approaching 90% after a training fraction of 0.3 and remaining flat at higher training fractions. This study demonstrates feasibility of a cognitive machine-learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine-learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. © 2016 American Heart Association, Inc.
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
SAHAYOG: A Testbed for Load Sharing under Failure,

DTIC Science & Technology

1987-07-01

messages, shared memory and semaphores . To communicate using messages, processes create message queues using system-provided prim- itives. The message...The size of the memory that is to be shared is decided by the process when it makes a request for memory allocation. The semaphore option of IPC can be...used to prevent two or more concurrent processes from executing their critical sections at the same time. Semaphores must be used when the processes
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buntinas, D.; Mercier, G.; Gropp, W.

2007-09-01

This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as some of the optimization techniques we employed. We conducted a performance evaluation over shared memory using microbenchmarks. The evaluation shows that MPICH2 Nemesis has very low communication overhead, making it suitable for smaller-grained applications.
First steps in using machine learning on fMRI data to predict intrusive memories of traumatic film footage

PubMed Central

Clark, Ian A.; Niehaus, Katherine E.; Duff, Eugene P.; Di Simplicio, Martina C.; Clifford, Gari D.; Smith, Stephen M.; Mackay, Clare E.; Woolrich, Mark W.; Holmes, Emily A.

2014-01-01

After psychological trauma, why do some only some parts of the traumatic event return as intrusive memories while others do not? Intrusive memories are key to cognitive behavioural treatment for post-traumatic stress disorder, and an aetiological understanding is warranted. We present here analyses using multivariate pattern analysis (MVPA) and a machine learning classifier to investigate whether peri-traumatic brain activation was able to predict later intrusive memories (i.e. before they had happened). To provide a methodological basis for understanding the context of the current results, we first show how functional magnetic resonance imaging (fMRI) during an experimental analogue of trauma (a trauma film) via a prospective event-related design was able to capture an individual's later intrusive memories. Results showed widespread increases in brain activation at encoding when viewing a scene in the scanner that would later return as an intrusive memory in the real world. These fMRI results were replicated in a second study. While traditional mass univariate regression analysis highlighted an association between brain processing and symptomatology, this is not the same as prediction. Using MVPA and a machine learning classifier, it was possible to predict later intrusive memories across participants with 68% accuracy, and within a participant with 97% accuracy; i.e. the classifier could identify out of multiple scenes those that would later return as an intrusive memory. We also report here brain networks key in intrusive memory prediction. MVPA opens the possibility of decoding brain activity to reconstruct idiosyncratic cognitive events with relevance to understanding and predicting mental health symptoms. PMID:25151915
Comparison of two paradigms for distributed shared memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.

1990-08-01

The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficientlymore » for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.« less
The Linux operating system: An introduction

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.

1995-01-01

Linux is a Unix-like operating system for Intel 386/486/Pentium based IBM-PCs and compatibles. The kernel of this operating system was written from scratch by Linus Torvalds and, although copyrighted by the author, may be freely distributed. A world-wide group has collaborated in developing Linux on the Internet. Linux can run the powerful set of compilers and programming tools of the Free Software Foundation, and XFree86, a port of the X Window System from MIT. Most capabilities associated with high performance workstations, such as networking, shared file systems, electronic mail, TeX, LaTeX, etc. are freely available for Linux. It can thus transform cheap IBM-PC compatible machines into Unix workstations with considerable capabilities. The author explains how Linux may be obtained, installed and networked. He also describes some interesting applications for Linux that are freely available. The enormous consumer market for IBM-PC compatible machines continually drives down prices of CPU chips, memory, hard disks, CDROMs, etc. Linux can convert such machines into powerful workstations that can be used for teaching, research and software development. For professionals who use Unix based workstations at work, Linux permits virtually identical working environments on their personal home machines. For cost conscious educational institutions Linux can create world-class computing environments from cheap, easily maintained, PC clones. Finally, for university students, it provides an essentially cost-free path away from DOS into the world of Unix and X Windows.
Low latency messages on distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Rosing, Matthew; Saltz, Joel

1993-01-01

Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedure is very flexible for minimizing message latency, based on the type of communication being performed. The test performed and the proposed interface are described.
Assisted navigation based on shared-control, using discrete and sparse human-machine interfaces.

PubMed

Lopes, Ana C; Nunes, Urbano; Vaz, Luis; Vaz, Luís

2010-01-01

This paper presents a shared-control approach for Assistive Mobile Robots (AMR), which depends on the user's ability to navigate a semi-autonomous powered wheelchair, using a sparse and discrete human-machine interface (HMI). This system is primarily intended to help users with severe motor disabilities that prevent them to use standard human-machine interfaces. Scanning interfaces and Brain Computer Interfaces (BCI), characterized to provide a small set of commands issued sparsely, are possible HMIs. This shared-control approach is intended to be applied in an Assisted Navigation Training Framework (ANTF) that is used to train users' ability in steering a powered wheelchair in an appropriate manner, given the restrictions imposed by their limited motor capabilities. A shared-controller based on user characterization, is proposed. This controller is able to share the information provided by the local motion planning level with the commands issued sparsely by the user. Simulation results of the proposed shared-control method, are presented.

Working memory resources are shared across sensory modalities.

PubMed

Salmela, V R; Moisala, M; Alho, K

2014-10-01

A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones-both containing two varying features-were presented simultaneously. In Experiment 2, two gratings and two tones-each containing only one varying feature-were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to those of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry

1998-01-01

This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Distributed simulation using a real-time shared memory network

NASA Technical Reports Server (NTRS)

Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.

1993-01-01

The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation was measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop for communication between the processors and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
Hardware support for software controlled fast multiplexing of performance counters

DOEpatents

Salapura, Valentina; Wisniewski, Robert W

2013-10-01

Performance counters may be operable to collect one or more counts of one or more selected activities, and registers may be operable to store a set of performance counter configurations. A state machine may be operable to automatically select a register from the registers for reconfiguring the one or more performance counters in response to receiving a first signal. The state machine may be further operable to reconfigure the one or more performance counters based on a configuration specified in the selected register. The state machine yet further may be operable to copy data in selected one or more of the performance counters to a memory location, or to copy data from the memory location to the counters, in response to receiving a second signal. The state machine may be operable to store or restore the counter values and state machine configuration in response to a context switch event.
Hardware support for software controlled fast multiplexing of performance counters

DOEpatents

Salapura, Valentina; Wisniewski, Robert W.

2013-01-01

Performance counters may be operable to collect one or more counts of one or more selected activities, and registers may be operable to store a set of performance counter configurations. A state machine may be operable to automatically select a register from the registers for reconfiguring the one or more performance counters in response to receiving a first signal. The state machine may be further operable to reconfigure the one or more performance counters based on a configuration specified in the selected register. The state machine yet further may be operable to copy data in selected one or more of the performance counters to a memory location, or to copy data from the memory location to the counters, in response to receiving a second signal. The state machine may be operable to store or restore the counter values and state machine configuration in response to a context switch event.
Cross-Modal Decoding of Neural Patterns Associated with Working Memory: Evidence for Attention-Based Accounts of Working Memory

PubMed Central

Majerus, Steve; Cowan, Nelson; Péters, Frédéric; Van Calster, Laurens; Phillips, Christophe; Schrouff, Jessica

2016-01-01

Recent studies suggest common neural substrates involved in verbal and visual working memory (WM), interpreted as reflecting shared attention-based, short-term retention mechanisms. We used a machine-learning approach to determine more directly the extent to which common neural patterns characterize retention in verbal WM and visual WM. Verbal WM was assessed via a standard delayed probe recognition task for letter sequences of variable length. Visual WM was assessed via a visual array WM task involving the maintenance of variable amounts of visual information in the focus of attention. We trained a classifier to distinguish neural activation patterns associated with high- and low-visual WM load and tested the ability of this classifier to predict verbal WM load (high–low) from their associated neural activation patterns, and vice versa. We observed significant between-task prediction of load effects during WM maintenance, in posterior parietal and superior frontal regions of the dorsal attention network; in contrast, between-task prediction in sensory processing cortices was restricted to the encoding stage. Furthermore, between-task prediction of load effects was strongest in those participants presenting the highest capacity for the visual WM task. This study provides novel evidence for common, attention-based neural patterns supporting verbal and visual WM. PMID:25146374
ShareSync: A Solution for Deterministic Data Sharing over Ethernet

NASA Technical Reports Server (NTRS)

Dunn, Daniel J., II; Koons, William A.; Kennedy, Richard D.; Davis, Philip A.

2007-01-01

As part of upgrading the Contact Dynamics Simulation Laboratory (CDSL) at the NASA Marshall Space Flight Center (MSFC), a simple, cost effective method was needed to communicate data among the networked simulation machines and I/O controllers used to run the facility. To fill this need and similar applicable situations, a generic protocol was developed, called ShareSync. ShareSync is a lightweight, real-time, publish-subscribe Ethernet protocol for simple and deterministic data sharing across diverse machines and operating systems. ShareSync provides a simple Application Programming Interface (API) for simulation programmers to incorporate into their code. The protocol is compatible with virtually all Ethernet-capable machines, is flexible enough to support a variety of applications, is fast enough to provide soft real-time determinism, and is a low-cost resource for distributed simulation development, deployment, and maintenance. The first design cycle iteration of ShareSync has been completed, and the protocol has undergone several testing procedures including endurance and benchmarking tests and approaches the 2001ts data synchronization design goal for the CDSL.
Effect of cutting parameters on strain hardening of nickel–titanium shape memory alloy

NASA Astrophysics Data System (ADS)

Wang, Guijie; Liu, Zhanqiang; Ai, Xing; Huang, Weimin; Niu, Jintao

2018-07-01

Nickel–titanium shape memory alloy (SMA) has been widely used as implant materials due to its good biocompatibility, shape memory property and super-elasticity. However, the severe strain hardening is a main challenge due to cutting force and temperature caused by machining. An orthogonal experiment of nickel–titanium SMA with different milling parameters conditions was conducted in this paper. On the one hand, the effect of cutting parameters on work hardening is obtained. It is found that the cutting speed has the most important effect on work hardening. The depth of machining induced layer and the degree of hardening become smaller with the increase of cutting speed when the cutting speed is less than 200 m min‑1 and then get larger with further increase of cutting speed. The relative intensity of diffraction peak increases as the cutting speed increase. In addition, all of the depth of machining induced layer, the degree of hardening and the relative intensity of diffraction peak increase when the feed rate increases. On the other hand, it is found that the depth of machining induced layer is closely related with the degree of hardening and phase transition. The higher the content of austenite in the machined surface is, the higher the degree of hardening will be. The depth of the machining induced layer increases with the degree of hardening increasing.
Multiprocessor shared-memory information exchange

DOE Office of Scientific and Technical Information (OSTI.GOV)

Santoline, L.L.; Bowers, M.D.; Crew, A.W.

1989-02-01

In distributed microprocessor-based instrumentation and control systems, the inter-and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, ismore » designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.« less
Computerized scoring algorithms for the Autobiographical Memory Test.

PubMed

Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip

2018-02-01

Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive responding format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performances in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Balance in machine architecture: Bandwidth on board and offboard, integer/control speed and flops versus memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fischler, M.

1992-04-01

The issues to be addressed here are those of balance'' in machine architecture. By this, we mean how much emphasis must be placed on various aspects of the system to maximize its usefulness for physics. There are three components that contribute to the utility of a system: How the machine can be used, how big a problem can be attacked, and what the effective capabilities (power) of the hardware are like. The effective power issue is a matter of evaluating the impact of design decisions trading off architectural features such as memory bandwidth and interprocessor communication capabilities. What is studiedmore » is the effect these machine parameters have on how quickly the system can solve desired problems. There is a reasonable method for studying this: One selects a few representative algorithms and computes the impact of changing memory bandwidths, and so forth. The only room for controversy here is in the selection of representative problems. The issue of how big a problem can be attacked boils down to a balance of memory size versus power. Although this is a balance issue it is very different than the effective power situation, because no firm answer can be given at this time. The power to memory ratio is highly problem dependent, and optimizing it requires several pieces of physics input, including: how big a lattice is needed for interesting results; what sort of algorithms are best to use; and how many sweeps are needed to get valid results. We seem to be at the threshold of learning things about these issues, but for now, the memory size issue will necessarily be addressed in terms of best guesses, rules of thumb, and researchers' opinions.« less
Balance in machine architecture: Bandwidth on board and offboard, integer/control speed and flops versus memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fischler, M.

1992-04-01

The issues to be addressed here are those of ``balance`` in machine architecture. By this, we mean how much emphasis must be placed on various aspects of the system to maximize its usefulness for physics. There are three components that contribute to the utility of a system: How the machine can be used, how big a problem can be attacked, and what the effective capabilities (power) of the hardware are like. The effective power issue is a matter of evaluating the impact of design decisions trading off architectural features such as memory bandwidth and interprocessor communication capabilities. What is studiedmore » is the effect these machine parameters have on how quickly the system can solve desired problems. There is a reasonable method for studying this: One selects a few representative algorithms and computes the impact of changing memory bandwidths, and so forth. The only room for controversy here is in the selection of representative problems. The issue of how big a problem can be attacked boils down to a balance of memory size versus power. Although this is a balance issue it is very different than the effective power situation, because no firm answer can be given at this time. The power to memory ratio is highly problem dependent, and optimizing it requires several pieces of physics input, including: how big a lattice is needed for interesting results; what sort of algorithms are best to use; and how many sweeps are needed to get valid results. We seem to be at the threshold of learning things about these issues, but for now, the memory size issue will necessarily be addressed in terms of best guesses, rules of thumb, and researchers` opinions.« less
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Multi-Response Optimization of WEDM Process Parameters Using Taguchi Based Desirability Function Analysis

NASA Astrophysics Data System (ADS)

Majumder, Himadri; Maity, Kalipada

2018-03-01

Shape memory alloy has a unique capability to return to its original shape after physical deformation by applying heat or thermo-mechanical or magnetic load. In this experimental investigation, desirability function analysis (DFA), a multi-attribute decision making was utilized to find out the optimum input parameter setting during wire electrical discharge machining (WEDM) of Ni-Ti shape memory alloy. Four critical machining parameters, namely pulse on time (TON), pulse off time (TOFF), wire feed (WF) and wire tension (WT) were taken as machining inputs for the experiments to optimize three interconnected responses like cutting speed, kerf width, and surface roughness. Input parameter combination TON = 120 μs., TOFF = 55 μs., WF = 3 m/min. and WT = 8 kg-F were found to produce the optimum results. The optimum process parameters for each desired response were also attained using Taguchi’s signal-to-noise ratio. Confirmation test has been done to validate the optimum machining parameter combination which affirmed DFA was a competent approach to select optimum input parameters for the ideal response quality for WEDM of Ni-Ti shape memory alloy.
Multi-variants synthesis of Petri nets for FPGA devices

NASA Astrophysics Data System (ADS)

Bukowiec, Arkadiusz; Doligalski, Michał

2015-09-01

There is presented new method of synthesis of application specific logic controllers for FPGA devices. The specification of control algorithm is made with use of control interpreted Petri net (PT type). It allows specifying parallel processes in easy way. The Petri net is decomposed into state-machine type subnets. In this case, each subnet represents one parallel process. For this purpose there are applied algorithms of coloring of Petri nets. There are presented two approaches of such decomposition: with doublers of macroplaces or with one global wait place. Next, subnets are implemented into two-level logic circuit of the controller. The levels of logic circuit are obtained as a result of its architectural decomposition. The first level combinational circuit is responsible for generation of next places and second level decoder is responsible for generation output symbols. There are worked out two variants of such circuits: with one shared operational memory or with many flexible distributed memories as a decoder. Variants of Petri net decomposition and structures of logic circuits can be combined together without any restrictions. It leads to existence of four variants of multi-variants synthesis.
A discrete Fourier transform for virtual memory machines

NASA Technical Reports Server (NTRS)

Galant, David C.

1992-01-01

An algebraic theory of the Discrete Fourier Transform is developed in great detail. Examination of the details of the theory leads to a computationally efficient fast Fourier transform for the use on computers with virtual memory. Such an algorithm is of great use on modern desktop machines. A FORTRAN coded version of the algorithm is given for the case when the sequence of numbers to be transformed is a power of two.
The Effect of Using a Written Preanesthetic Machine Checklist on Detection of Anesthesia Machine Faults.

DTIC Science & Technology

1986-08-01

Craik and Lockhart describes the various levels of information processing involved in memory (8). The preliminary level ...A prevalent malfunction. Anesthesia and Analg~esia, 1985, 64: 745-747. 8. Craik , F.I.M., Lockhart , R.S. Levels of processing : A framework for memory...research. Journal of Verbal Learning and Behavior, 1972, 11: 671-684. 9. Craik , F.I.M., Lockhart , R.S. Levels of processing : A * framework for
Working Memory Span Development: A Time-Based Resource-Sharing Model Account

ERIC Educational Resources Information Center

Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie

2009-01-01

The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Influence of magnet eddy current on magnetization characteristics of variable flux memory machine

NASA Astrophysics Data System (ADS)

Yang, Hui; Lin, Heyun; Zhu, Z. Q.; Lyu, Shukang

2018-05-01

In this paper, the magnet eddy current characteristics of a newly developed variable flux memory machine (VFMM) is investigated. Firstly, the machine structure, non-linear hysteresis characteristics and eddy current modeling of low coercive force magnet are described, respectively. Besides, the PM eddy current behaviors when applying the demagnetizing current pulses are unveiled and investigated. The mismatch of the required demagnetization currents between the cases with or without considering the magnet eddy current is identified. In addition, the influences of the magnet eddy current on the demagnetization effect of VFMM are analyzed. Finally, a prototype is manufactured and tested to verify the theoretical analyses.

Direct access inter-process shared memory

DOEpatents

Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

2013-10-22

A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
Memory Network For Distributed Data Processors

NASA Technical Reports Server (NTRS)

Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George

1992-01-01

Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.
Grouping and binding in visual short-term memory.

PubMed

Quinlan, Philip T; Cohen, Dale J

2012-09-01

Findings of 2 experiments are reported that challenge the current understanding of visual short-term memory (VSTM). In both experiments, a single study display, containing 6 colored shapes, was presented briefly and then probed with a single colored shape. At stake is how VSTM retains a record of different objects that share common features: In the 1st experiment, 2 study items sometimes shared a common feature (either a shape or a color). The data revealed a color sharing effect, in which memory was much better for items that shared a common color than for items that did not. The 2nd experiment showed that the size of the color sharing effect depended on whether a single pair of items shared a common color or whether 2 pairs of items were so defined-memory for all items improved when 2 color groups were presented. In explaining performance, an account is advanced in which items compete for a fixed number of slots, but then memory recall for any given stored item is prone to error. A critical assumption is that items that share a common color are stored together in a slot as a chunk. The evidence provides further support for the idea that principles of perceptual organization may determine the manner in which items are stored in VSTM. PsycINFO Database Record (c) 2012 APA, all rights reserved.
Managing virtual machines with Vac and Vcycle

NASA Astrophysics Data System (ADS)

McNab, A.; Love, P.; MacMahon, E.

2015-12-01

We compare the Vac and Vcycle virtual machine lifecycle managers and our experiences in providing production job execution services for ATLAS, CMS, LHCb, and the GridPP VO at sites in the UK, France and at CERN. In both the Vac and Vcycle systems, the virtual machines are created outside of the experiment's job submission and pilot framework. In the case of Vac, a daemon runs on each physical host which manages a pool of virtual machines on that host, and a peer-to-peer UDP protocol is used to achieve the desired target shares between experiments across the site. In the case of Vcycle, a daemon manages a pool of virtual machines on an Infrastructure-as-a-Service cloud system such as OpenStack, and has within itself enough information to create the types of virtual machines to achieve the desired target shares. Both systems allow unused shares for one experiment to temporarily taken up by other experiements with work to be done. The virtual machine lifecycle is managed with a minimum of information, gathered from the virtual machine creation mechanism (such as libvirt or OpenStack) and using the proposed Machine/Job Features API from WLCG. We demonstrate that the same virtual machine designs can be used to run production jobs on Vac and Vcycle/OpenStack sites for ATLAS, CMS, LHCb, and GridPP, and that these technologies allow sites to be operated in a reliable and robust way.
Machine parts recognition using a trinary associative memory

NASA Technical Reports Server (NTRS)

Awwal, Abdul Ahad S.; Karim, Mohammad A.; Liu, Hua-Kuang

1989-01-01

The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
Machine Parts Recognition Using A Trinary Associative Memory

NASA Astrophysics Data System (ADS)

Awwal, Abdul A. S.; Karim, Mohammad A.; Liu, Hua-Kuang

1989-05-01

The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
Shared memories reveal shared structure in neural activity across individuals

PubMed Central

Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.

2016-01-01

Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

2002-07-29

Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less
Parallel computing for probabilistic fatigue analysis

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

1993-01-01

This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit

PubMed Central

Xu, Liangliang; Xu, Nengxiong

2017-01-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available. PMID:28989754
Accelerating adaptive inverse distance weighting interpolation algorithm on a graphics processing unit.

PubMed

Mei, Gang; Xu, Liangliang; Xu, Nengxiong

2017-09-01

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points' spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
Save Now [Y/N]? Machine Memory at War in Iain Banks' "Look to Windward"

ERIC Educational Resources Information Center

Blackmore, Tim

2010-01-01

Creating memory during and after wartime trauma is vexed by state attempts to control public and private discourse. Science fiction author Iain Banks' novel "Look to Windward" proposes different ways of preserving memory and culture, from posthuman memory devices, to artwork, to architecture, to personal, local ways of remembering.…
Mnemonic transmission, social contagion, and emergence of collective memory: Influence of emotional valence, group structure, and information distribution.

PubMed

Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

2017-09-01

Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors-emotional salience of information, group structure, and information distribution-on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

DOE Office of Scientific and Technical Information (OSTI.GOV)

Montry, G.R.; Benner, R.E.

1985-12-01

The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies are proposed which streamline code development and improve code performance when multitasking in a cache/shared memory or distributed memory environment.
Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

NASA Technical Reports Server (NTRS)

Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

1992-01-01

An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Transmission of hepatitis C virus between hemodialysis patients sharing the same machine.

PubMed

Sartor, Catherine; Brunet, Philippe; Simon, Sophie; Tamalet, Catherine; Berland, Yvon; Drancourt, Michel

2004-07-01

After a patient acquired hepatitis C virus (HCV) infection in our unit, we performed epidemiologic and virologic investigations, including genotyping and phylogenetic analyses. The results provided evidence for HCV transmission between two patients sharing the same machine and suggested possible transmission via accidental contamination of the venous pressure monitoring system.
Sleep Benefits Memory for Semantic Category Structure While Preserving Exemplar-Specific Information.

PubMed

Schapiro, Anna C; McDevitt, Elizabeth A; Chen, Lang; Norman, Kenneth A; Mednick, Sara C; Rogers, Timothy T

2017-11-01

Semantic memory encompasses knowledge about both the properties that typify concepts (e.g. robins, like all birds, have wings) as well as the properties that individuate conceptually related items (e.g. robins, in particular, have red breasts). We investigate the impact of sleep on new semantic learning using a property inference task in which both kinds of information are initially acquired equally well. Participants learned about three categories of novel objects possessing some properties that were shared among category exemplars and others that were unique to an exemplar, with exposure frequency varying across categories. In Experiment 1, memory for shared properties improved and memory for unique properties was preserved across a night of sleep, while memory for both feature types declined over a day awake. In Experiment 2, memory for shared properties improved across a nap, but only for the lower-frequency category, suggesting a prioritization of weakly learned information early in a sleep period. The increase was significantly correlated with amount of REM, but was also observed in participants who did not enter REM, suggesting involvement of both REM and NREM sleep. The results provide the first evidence that sleep improves memory for the shared structure of object categories, while simultaneously preserving object-unique information.
Analysis towards VMEM File of a Suspended Virtual Machine

NASA Astrophysics Data System (ADS)

Song, Zheng; Jin, Bo; Sun, Yongqing

With the popularity of virtual machines, forensic investigators are challenged with more complicated situations, among which discovering the evidences in virtualized environment is of significant importance. This paper mainly analyzes the file suffixed with .vmem in VMware Workstation, which stores all pseudo-physical memory into an image. The internal file structure of .vmem file is studied and disclosed. Key information about processes and threads of a suspended virtual machine is revealed. Further investigation into the Windows XP SP3 heap contents is conducted and a proof-of-concept tool is provided. Different methods to obtain forensic memory images are introduced, with both advantages and limits analyzed. We conclude with an outlook.
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Computational Model-Based Prediction of Human Episodic Memory Performance Based on Eye Movements

NASA Astrophysics Data System (ADS)

Sato, Naoyuki; Yamaguchi, Yoko

Subjects' episodic memory performance is not simply reflected by eye movements. We use a ‘theta phase coding’ model of the hippocampus to predict subjects' memory performance from their eye movements. Results demonstrate the ability of the model to predict subjects' memory performance. These studies provide a novel approach to computational modeling in the human-machine interface.

Machine Learning Analysis Identifies Drosophila Grunge/Atrophin as an Important Learning and Memory Gene Required for Memory Retention and Social Learning.

PubMed

Kacsoh, Balint Z; Greene, Casey S; Bosco, Giovanni

2017-11-06

High-throughput experiments are becoming increasingly common, and scientists must balance hypothesis-driven experiments with genome-wide data acquisition. We sought to predict novel genes involved in Drosophila learning and long-term memory from existing public high-throughput data. We performed an analysis using PILGRM, which analyzes public gene expression compendia using machine learning. We evaluated the top prediction alongside genes involved in learning and memory in IMP, an interface for functional relationship networks. We identified Grunge/Atrophin ( Gug/Atro ), a transcriptional repressor, histone deacetylase, as our top candidate. We find, through multiple, distinct assays, that Gug has an active role as a modulator of memory retention in the fly and its function is required in the adult mushroom body. Depletion of Gug specifically in neurons of the adult mushroom body, after cell division and neuronal development is complete, suggests that Gug function is important for memory retention through regulation of neuronal activity, and not by altering neurodevelopment. Our study provides a previously uncharacterized role for Gug as a possible regulator of neuronal plasticity at the interface of memory retention and memory extinction. Copyright © 2017 Kacsoh et al.
Structurally Integrated Versus Structurally Segregated Memory Representations: Implications for the Design of Instructional Materials.

ERIC Educational Resources Information Center

Hayes-Roth, Barbara

Two kinds of memory organization are distinguished: segregrated versus integrated. In segregated memory organizations, related learned propositions have separate memory representations. In integrated memory organizations, memory representations of related propositions share common subrepresentations. Segregated memory organizations facilitate…
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

NASA Technical Reports Server (NTRS)

Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

1994-01-01

The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
Getting connected: Both associative and semantic links structure semantic memory for newly learned persons.

PubMed

Wiese, Holger; Schweinberger, Stefan R

2015-01-01

The present study examined whether semantic memory for newly learned people is structured by visual co-occurrence, shared semantics, or both. Participants were trained with pairs of simultaneously presented (i.e., co-occurring) preexperimentally unfamiliar faces, which either did or did not share additionally provided semantic information (occupation, place of living, etc.). Semantic information could also be shared between faces that did not co-occur. A subsequent priming experiment revealed faster responses for both co-occurrence/no shared semantics and no co-occurrence/shared semantics conditions, than for an unrelated condition. Strikingly, priming was strongest in the co-occurrence/shared semantics condition, suggesting additive effects of these factors. Additional analysis of event-related brain potentials yielded priming in the N400 component only for combined effects of visual co-occurrence and shared semantics, with more positive amplitudes in this than in the unrelated condition. Overall, these findings suggest that both semantic relatedness and visual co-occurrence are important when novel information is integrated into person-related semantic memory.
Dynamically programmable cache

NASA Astrophysics Data System (ADS)

Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas

1998-10-01

Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
Memory T and memory B cells share a transcriptional program of self-renewal with long-term hematopoietic stem cells

PubMed Central

Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane

2006-01-01

The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoeitic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
Categorical and associative relations increase false memory relative to purely associative relations.

PubMed

Coane, Jennifer H; McBride, Dawn M; Termonen, Miia-Liisa; Cutting, J Cooper

2016-01-01

The goal of the present study was to examine the contributions of associative strength and similarity in terms of shared features to the production of false memories in the Deese/Roediger-McDermott list-learning paradigm. Whereas the activation/monitoring account suggests that false memories are driven by automatic associative activation from list items to nonpresented lures, combined with errors in source monitoring, other accounts (e.g., fuzzy trace theory, global-matching models) emphasize the importance of semantic-level similarity, and thus predict that shared features between list and lure items will increase false memory. Participants studied lists of nine items related to a nonpresented lure. Half of the lists consisted of items that were associated but did not share features with the lure, and the other half included items that were equally associated but also shared features with the lure (in many cases, these were taxonomically related items). The two types of lists were carefully matched in terms of a variety of lexical and semantic factors, and the same lures were used across list types. In two experiments, false recognition of the critical lures was greater following the study of lists that shared features with the critical lure, suggesting that similarity at a categorical or taxonomic level contributes to false memory above and beyond associative strength. We refer to this phenomenon as a "feature boost" that reflects additive effects of shared meaning and association strength and is generally consistent with accounts of false memory that have emphasized thematic or feature-level similarity among studied and nonstudied representations.
Content addressable memory project

NASA Technical Reports Server (NTRS)

Hall, Josh; Levy, Saul; Smith, D.; Wei, S.; Miyake, K.; Murdocca, M.

1991-01-01

The progress on the Rutgers CAM (Content Addressable Memory) Project is described. The overall design of the system is completed at the architectural level and described. The machine is composed of two kinds of cells: (1) the CAM cells which include both memory and processor, and support local processing within each cell; and (2) the tree cells, which have smaller instruction set, and provide global processing over the CAM cells. A parameterized design of the basic CAM cell is completed. Progress was made on the final specification of the CPS. The machine architecture was driven by the design of algorithms whose requirements are reflected in the resulted instruction set(s). A few of these algorithms are described.
Real Time Linux - The RTOS for Astronomy?

NASA Astrophysics Data System (ADS)

Daly, P. N.

The BoF was attended by about 30 participants and a free CD of real time Linux-based upon RedHat 5.2-was available. There was a detailed presentation on the nature of real time Linux and the variants for hard real time: New Mexico Tech's RTL and DIAPM's RTAI. Comparison tables between standard Linux and real time Linux responses to time interval generation and interrupt response latency were presented (see elsewhere in these proceedings). The present recommendations are to use RTL for UP machines running the 2.0.x kernels and RTAI for SMP machines running the 2.2.x kernel. Support, both academically and commercially, is available. Some known limitations were presented and the solutions reported e.g., debugging and hardware support. The features of RTAI (scheduler, fifos, shared memory, semaphores, message queues and RPCs) were described. Typical performance statistics were presented: Pentium-based oneshot tasks running > 30kHz, 486-based oneshot tasks running at ~ 10 kHz, periodic timer tasks running in excess of 90 kHz with average zero jitter peaking to ~ 13 mus (UP) and ~ 30 mus (SMP). Some detail on kernel module programming, including coding examples, were presented showing a typical data acquisition system generating simulated (random) data writing to a shared memory buffer and a fifo buffer to communicate between real time Linux and user space. All coding examples were complete and tested under RTAI v0.6 and the 2.2.12 kernel. Finally, arguments were raised in support of real time Linux: it's open source, free under GPL, enables rapid prototyping, has good support and the ability to have a fully functioning workstation capable of co-existing hard real time performance. The counter weight-the negatives-of lack of platforms (x86 and PowerPC only at present), lack of board support, promiscuous root access and the danger of ignorance of real time programming issues were also discussed. See ftp://orion.tuc.noao.edu/pub/pnd/rtlbof.tgz for the StarOffice overheads for this presentation.
System and method for programmable bank selection for banked memory subsystems

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

2010-09-07

A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under Cray Research adaptive Fortran (CRAFT) model four programming methods (data parallel, work sharing, message-passing using PVM, and explicit shared memory model) are available to the users. However, at this time data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that the performance of neither standard PVM nor CRI s PVM exploits the hardware capabilities of the T3D. The reasons for the bad performance of PVM as a native message-passing library are presented. This is illustrated by the performance of NAS Parallel Benchmarks (NPB) programmed in explicit shared memory model on Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than obtained by using explicit shared memory model. This degradation in performance is also seen on CM-5 where the performance of applications using native message-passing library CMMD on CM-5 is also about 4 to 5 times less than using data parallel methods. The issues involved (such as barriers, synchronization, invalidating data cache, aligning data cache etc.) while programming in explicit shared memory model are discussed. Comparative performance of NPB using explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM-SP1, etc. is presented.
A proposal to demonstrate production of salad crops in the Space Station Mockup Facility with particular attention to space, energy, and labor constraints

NASA Technical Reports Server (NTRS)

Brooks, Carolyn A.

1992-01-01

The Salad Machine Research has continued to be a two path effort with the research at Marshall Space Flight Center (MSFC) focusing on the design, construction, and operation of a semiautomated system (Salad Machine) for the production of salad vegetables within a standard rack. Boeing Corporation in cooperation with NASA MSFC constructed a four drawer Salad Machine which was occasionally placed within the Space Station Freedom Mockup facility for view by selected visitors. Final outfitting of the Salad Machine is awaiting the arrival of parts for the nutrient delivery system. Research at the Alabama A&M facilities focused on compatibility of radish and lettuce plants when grown on the same nutrient solution. Lettuce fresh weight shoot yield was significantly enhanced when lettuce plants were grown on nutrient solution which was shared with radish. Radish tuber production was not significantly affected although there was a trend for radish from shared solutions to be heavier than those grown on separate nutrient solutions. The effect of sharing nutrient solutions on carbohydrate partitioning reflected the effect of sharing solution on fresh weight yield. Lettuce shoot dry weight was significantly greater for plants from shared solutions than from separate. There was no significant effect on sharing nutrient solution on radish tuber dry weight. Partitioning of nitrogen, calcium, magnesium, and potassium was not affected by sharing, there was, however, a disproportionate amount of potassium in the tissues, suggesting luxury consumption of potassium in all plants and tissues. It is concluded that lettuce plants benefit from sharing nutrient solution with radish and that radish is not harmed.
Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds

PubMed Central

Howe, Piers D. L.

2017-01-01

To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources. PMID:28410383
Shared processing in multiple object tracking and visual working memory in the absence of response order and task order confounds.

PubMed

Lapierre, Mark D; Cropper, Simon J; Howe, Piers D L

2017-01-01

To understand how the visual system represents multiple moving objects and how those representations contribute to tracking, it is essential that we understand how the processes of attention and working memory interact. In the work described here we present an investigation of that interaction via a series of tracking and working memory dual-task experiments. Previously, it has been argued that tracking is resistant to disruption by a concurrent working memory task and that any apparent disruption is in fact due to observers making a response to the working memory task, rather than due to competition for shared resources. Contrary to this, in our experiments we find that when task order and response order confounds are avoided, all participants show a similar decrease in both tracking and working memory performance. However, if task and response order confounds are not adequately controlled for we find substantial individual differences, which could explain the previous conflicting reports on this topic. Our results provide clear evidence that tracking and working memory tasks share processing resources.
Visual and spatial working memory are not that dissociated after all: a time-based resource-sharing account.

PubMed

Vergauwe, Evie; Barrouillet, Pierre; Camos, Valérie

2009-07-01

Examinations of interference between visual and spatial materials in working memory have suggested domain- and process-based fractionations of visuo-spatial working memory. The present study examined the role of central time-based resource sharing in visuo-spatial working memory and assessed its role in obtained interference patterns. Visual and spatial storage were combined with both visual and spatial on-line processing components in computer-paced working memory span tasks (Experiment 1) and in a selective interference paradigm (Experiment 2). The cognitive load of the processing components was manipulated to investigate its impact on concurrent maintenance for both within-domain and between-domain combinations of processing and storage components. In contrast to both domain- and process-based fractionations of visuo-spatial working memory, the results revealed that recall performance was determined by the cognitive load induced by the processing of items, rather than by the domain to which those items pertained. These findings are interpreted as evidence for a time-based resource-sharing mechanism in visuo-spatial working memory.
Memory-Augmented Cellular Automata for Image Analysis.

DTIC Science & Technology

1978-11-01

case in which each cell has memory size proportional to the logarithm of the input size, showing the increased capabilities of these machines for executing a variety of basic image analysis and recognition tasks. (Author)
Why are you telling me that? A conceptual model of the social function of autobiographical memory.

PubMed

Alea, Nicole; Bluck, Susan

2003-03-01

In an effort to stimulate and guide empirical work within a functional framework, this paper provides a conceptual model of the social functions of autobiographical memory (AM) across the lifespan. The model delineates the processes and variables involved when AMs are shared to serve social functions. Components of the model include: lifespan contextual influences, the qualitative characteristics of memory (emotionality and level of detail recalled), the speaker's characteristics (age, gender, and personality), the familiarity and similarity of the listener to the speaker, the level of responsiveness during the memory-sharing process, and the nature of the social relationship in which the memory sharing occurs (valence and length of the relationship). These components are shown to influence the type of social function served and/or, the extent to which social functions are served. Directions for future empirical work to substantiate the model and hypotheses derived from the model are provided.
Ontological modelling of knowledge management for human-machine integrated design of ultra-precision grinding machine

NASA Astrophysics Data System (ADS)

Hong, Haibo; Yin, Yuehong; Chen, Xing

2016-11-01

Despite the rapid development of computer science and information technology, an efficient human-machine integrated enterprise information system for designing complex mechatronic products is still not fully accomplished, partly because of the inharmonious communication among collaborators. Therefore, one challenge in human-machine integration is how to establish an appropriate knowledge management (KM) model to support integration and sharing of heterogeneous product knowledge. Aiming at the diversity of design knowledge, this article proposes an ontology-based model to reach an unambiguous and normative representation of knowledge. First, an ontology-based human-machine integrated design framework is described, then corresponding ontologies and sub-ontologies are established according to different purposes and scopes. Second, a similarity calculation-based ontology integration method composed of ontology mapping and ontology merging is introduced. The ontology searching-based knowledge sharing method is then developed. Finally, a case of human-machine integrated design of a large ultra-precision grinding machine is used to demonstrate the effectiveness of the method.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2

PubMed Central

Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.

2014-01-01

SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
NAS Applications and Advanced Algorithms

NASA Technical Reports Server (NTRS)

Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)

1997-01-01

This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.

Cross-Modal Decoding of Neural Patterns Associated with Working Memory: Evidence for Attention-Based Accounts of Working Memory.

PubMed

Majerus, Steve; Cowan, Nelson; Péters, Frédéric; Van Calster, Laurens; Phillips, Christophe; Schrouff, Jessica

2016-01-01

Recent studies suggest common neural substrates involved in verbal and visual working memory (WM), interpreted as reflecting shared attention-based, short-term retention mechanisms. We used a machine-learning approach to determine more directly the extent to which common neural patterns characterize retention in verbal WM and visual WM. Verbal WM was assessed via a standard delayed probe recognition task for letter sequences of variable length. Visual WM was assessed via a visual array WM task involving the maintenance of variable amounts of visual information in the focus of attention. We trained a classifier to distinguish neural activation patterns associated with high- and low-visual WM load and tested the ability of this classifier to predict verbal WM load (high-low) from their associated neural activation patterns, and vice versa. We observed significant between-task prediction of load effects during WM maintenance, in posterior parietal and superior frontal regions of the dorsal attention network; in contrast, between-task prediction in sensory processing cortices was restricted to the encoding stage. Furthermore, between-task prediction of load effects was strongest in those participants presenting the highest capacity for the visual WM task. This study provides novel evidence for common, attention-based neural patterns supporting verbal and visual WM. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Destination memory impairment in older people.

PubMed

Gopie, Nigel; Craik, Fergus I M; Hasher, Lynn

2010-12-01

Older adults are assumed to have poor destination memory-knowing to whom they tell particular information-and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults' destination memory by having participants tell facts (e.g., "A dime has 118 ridges around its edge") to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. (c) 2010 APA, all rights reserved).
Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies.

PubMed

Prins, Pjotr; Goto, Naohisa; Yates, Andrew; Gautier, Laurent; Willis, Scooter; Fields, Christopher; Katayama, Toshiaki

2012-01-01

Open-source software (OSS) encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, OSS comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the two principal approaches for sharing software between different programming languages: either by remote procedure call (RPC) or by sharing a local call stack. RPC provides a language-independent protocol over a network interface; examples are RSOAP and Rserve. The local call stack provides a between-language mapping not over the network interface, but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java Virtual Machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation, and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations, and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite. In general, call stack approaches outperform native Bio* implementations and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable BioNode image with all examples, tools, and libraries included. The BioNode image can be run on VirtualBox-supported operating systems, including Windows, OSX, and Linux.
RTNN: The New Parallel Machine in Zaragoza

NASA Astrophysics Data System (ADS)

Sijs, A. J. V. D.

I report on the development of RTNN, a parallel computer designed as a 4^4 hypercube of 256 T9000 transputer nodes, each with 8 MB memory. The peak performance of the machine is expected to be 2.5 Gflops.
Job Management Requirements for NAS Parallel Systems and Clusters

NASA Technical Reports Server (NTRS)

Saphir, William; Tanner, Leigh Ann; Traversat, Bernard

1995-01-01

A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Families of Graph Algorithms: SSSP Case Study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kanewala Appuhamilage, Thejaka Amila Jay; Zalewski, Marcin J.; Lumsdaine, Andrew

2017-08-28

Single-Source Shortest Paths (SSSP) is a well-studied graph problem. Examples of SSSP algorithms include the original Dijkstra’s algorithm and the parallel Δ-stepping and KLA-SSSP algorithms. In this paper, we use a novel Abstract Graph Machine (AGM) model to show that all these algorithms share a common logic and differ from one another by the order in which they perform work. We use the AGM model to thoroughly analyze the family of algorithms that arises from the common logic. We start with the basic algorithm without any ordering (Chaotic), and then we derive the existing and new algorithms by methodically exploringmore » semantic and spatial ordering of work. Our experimental results show that new derived algorithms show better performance than the existing distributed memory parallel algorithms, especially at higher scales.« less
An analysis of options available for developing a common laser ray tracing package for Ares and Kull code frameworks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weeratunga, S K

Ares and Kull are mature code frameworks that support ALE hydrodynamics for a variety of HEDP applications at LLNL, using two widely different meshing approaches. While Ares is based on a 2-D/3-D block-structured mesh data base, Kull is designed to support unstructured, arbitrary polygonal/polyhedral meshes. In addition, both frameworks are capable of running applications on large, distributed-memory parallel machines. Currently, both these frameworks separately support assorted collections of physics packages related to HEDP, including one for the energy deposition by laser/ion-beam ray tracing. This study analyzes the options available for developing a common laser/ion-beam ray tracing package that can bemore » easily shared between these two code frameworks and concludes with a set of recommendations for its development.« less
Nonlinear machine learning and design of reconfigurable digital colloids.

PubMed

Long, Andrew W; Phillips, Carolyn L; Jankowksi, Eric; Ferguson, Andrew L

2016-09-14

Digital colloids, a cluster of freely rotating "halo" particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N = 4) and 30-state octahedral (N = 6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility.
Interference due to shared features between action plans is influenced by working memory span.

PubMed

Fournier, Lisa R; Behmer, Lawrence P; Stubblefield, Alexandra M

2014-12-01

In this study, we examined the interactions between the action plans that we hold in memory and the actions that we carry out, asking whether the interference due to shared features between action plans is due to selection demands imposed on working memory. Individuals with low and high working memory spans learned arbitrary motor actions in response to two different visual events (A and B), presented in a serial order. They planned a response to the first event (A) and while maintaining this action plan in memory they then executed a speeded response to the second event (B). Afterward, they executed the action plan for the first event (A) maintained in memory. Speeded responses to the second event (B) were delayed when it shared an action feature (feature overlap) with the first event (A), relative to when it did not (no feature overlap). The size of the feature-overlap delay was greater for low-span than for high-span participants. This indicates that interference due to overlapping action plans is greater when fewer working memory resources are available, suggesting that this interference is due to selection demands imposed on working memory. Thus, working memory plays an important role in managing current and upcoming action plans, at least for newly learned tasks. Also, managing multiple action plans is compromised in individuals who have low versus high working memory spans.
Destination Memory Impairment in Older People

PubMed Central

Gopie, Nigel; Craik, Fergus I. M.; Hasher, Lynn

2012-01-01

Older adults are assumed to have poor destination memory— knowing to whom they tell particular information—and anecdotes about them repeating stories to the same people are cited as informal evidence for this claim. Experiment 1 assessed young and older adults’ destination memory by having participants tell facts (e.g., “A dime has 118 ridges around its edge”) to pictures of famous people (e.g., Oprah Winfrey). Surprise recognition memory tests, which also assessed confidence, revealed that older adults, compared to young adults, were disproportionately impaired on destination memory relative to spared memory for the individual components (i.e., facts, faces) of the episode. Older adults also were more confident that they had not told a fact to a particular person when they actually had (i.e., a miss); this presumably causes them to repeat information more often than young adults. When the direction of information transfer was reversed in Experiment 2, such that the famous people shared information with the participants (i.e., a source memory experiment), age-related memory differences disappeared. In contrast to the destination memory experiment, older adults in the source memory experiment were more confident than young adults that someone had shared a fact with them when a different person actually had shared the fact (i.e., a false alarm). Overall, accuracy and confidence jointly influence age-related changes to destination memory, a fundamental component of successful communication. PMID:20718537
Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization Case Study

NASA Astrophysics Data System (ADS)

Hassan, A. H.; Fluke, C. J.; Barnes, D. G.

2012-09-01

Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualizing tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a “software as a service” manner will reduce the total cost of ownership, provide an easy to use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
Thread mapping using system-level model for shared memory multicores

NASA Astrophysics Data System (ADS)

Mitra, Reshmi

Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread counts. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications. This is to ensure that the users can explore the design space within reasonable machine hours, without thorough understanding on how the code interacts with the platform. Response time is related to the cycles per instructions retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on Markov Chain Model (MCM) and Model Tree (MT) for system-level steady state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. The thread status is represented by the MCM states. The program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the smaller AS performance. This aspect of the framework, along with, the use of mathematical expressions for the actual AS performance information, results in a tremendous reduction in the CPI prediction time. The framework is validated using an electromagnetics application. The average performance prediction error for steady state CPI results with 12 different MSs is less than 1%. The total run time of model is of the order of minutes, whereas the actual application execution time is in terms of days.
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Location-Unbound Color-Shape Binding Representations in Visual Working Memory.

PubMed

Saiki, Jun

2016-02-01

The mechanism by which nonspatial features, such as color and shape, are bound in visual working memory, and the role of those features' location in their binding, remains unknown. In the current study, I modified a redundancy-gain paradigm to investigate these issues. A set of features was presented in a two-object memory display, followed by a single object probe. Participants judged whether the probe contained any features of the memory display, regardless of its location. Response time distributions revealed feature coactivation only when both features of a single object in the memory display appeared together in the probe, regardless of the response time benefit from the probe and memory objects sharing the same location. This finding suggests that a shared location is necessary in the formation of bound representations but unnecessary in their maintenance. Electroencephalography data showed that amplitude modulations reflecting location-unbound feature coactivation were different from those reflecting the location-sharing benefit, consistent with the behavioral finding that feature-location binding is unnecessary in the maintenance of color-shape binding. © The Author(s) 2015.
Shared reality in intergroup communication: Increasing the epistemic authority of an out-group audience.

PubMed

Echterhoff, Gerald; Kopietz, René; Higgins, E Tory

2017-06-01

Communicators typically tune messages to their audience's attitude. Such audience tuning biases communicators' memory for the topic toward the audience's attitude to the extent that they create a shared reality with the audience. To investigate shared reality in intergroup communication, we first established that a reduced memory bias after tuning messages to an out-group (vs. in-group) audience is a subtle index of communicators' denial of shared reality to that out-group audience (Experiments 1a and 1b). We then examined whether the audience-tuning memory bias might emerge when the out-group audience's epistemic authority is enhanced, either by increasing epistemic expertise concerning the communication topic or by creating epistemic consensus among members of a multiperson out-group audience. In Experiment 2, when Germans communicated to a Turkish audience with an attitude about a Turkish (vs. German) target, the audience-tuning memory bias appeared. In Experiment 3, when the audience of German communicators consisted of 3 Turks who all held the same attitude toward the target, the memory bias again appeared. The association between message valence and memory valence was consistently higher when the audience's epistemic authority was high (vs. low). An integrative analysis across all studies also suggested that the memory bias increases with increasing strength of epistemic inputs (epistemic expertise, epistemic consensus, and audience-tuned message production). The findings suggest novel ways of overcoming intergroup biases in intergroup relations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
The Developmental Influence of Primary Memory Capacity on Working Memory and Academic Achievement

PubMed Central

2015-01-01

In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. PMID:26075630
The developmental influence of primary memory capacity on working memory and academic achievement.

PubMed

Hall, Debbora; Jarrold, Christopher; Towse, John N; Zarandi, Amy L

2015-08-01

In this study, we investigate the development of primary memory capacity among children. Children between the ages of 5 and 8 completed 3 novel tasks (split span, interleaved lists, and a modified free-recall task) that measured primary memory by estimating the number of items in the focus of attention that could be spontaneously recalled in serial order. These tasks were calibrated against traditional measures of simple and complex span. Clear age-related changes in these primary memory estimates were observed. There were marked individual differences in primary memory capacity, but each novel measure was predictive of simple span performance. Among older children, each measure shared variance with reading and mathematics performance, whereas for younger children, the interleaved lists task was the strongest single predictor of academic ability. We argue that these novel tasks have considerable potential for the measurement of primary memory capacity and provide new, complementary ways of measuring the transient memory processes that predict academic performance. The interleaved lists task also shared features with interference control tasks, and our findings suggest that young children have a particular difficulty in resisting distraction and that variance in the ability to resist distraction is also shared with measures of educational attainment. (c) 2015 APA, all rights reserved).
Execution models for mapping programs onto distributed memory parallel computers

NASA Technical Reports Server (NTRS)

Sussman, Alan

1992-01-01

The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Conditional load and store in a shared memory

DOEpatents

Blumrich, Matthias A; Ohmacht, Martin

2015-02-03

A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta J.; Gimenez, J.; Caubet, J.

2003-01-01

Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
Measuring Transactiving Memory Systems Using Network Analysis

ERIC Educational Resources Information Center

King, Kylie Goodell

2017-01-01

Transactive memory systems (TMSs) describe the structures and processes that teams use to share information, work together, and accomplish shared goals. First introduced over three decades ago, TMSs have been measured in a variety of ways. This dissertation proposes the use of network analysis in measuring TMS. This is accomplished by describing…
Operator Influence of Unexploded Ordnance Sensor Technologies

DTIC Science & Technology

2007-03-01

chart display ActiveX control Mscomct2.dll – date/time display ActiveX control Pnpscr.dll – Systran SCRAMNet replicated shared memory device...response value database rgm_p2.dll – Phase 2 shared memory API and implementation Commercial components StripM.ocx – strip chart display ActiveX
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

NASA Technical Reports Server (NTRS)

Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

2005-01-01

Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
Concurrent working memory load can facilitate selective attention: evidence for specialized load.

PubMed

Park, Soojin; Kim, Min-Shik; Chun, Marvin M

2007-10-01

Load theory predicts that concurrent working memory load impairs selective attention and increases distractor interference (N. Lavie, A. Hirst, J. W. de Fockert, & E. Viding). Here, the authors present new evidence that the type of concurrent working memory load determines whether load impairs selective attention or not. Working memory load was paired with a same/different matching task that required focusing on targets while ignoring distractors. When working memory items shared the same limited-capacity processing mechanisms with targets in the matching task, distractor interference increased. However, when working memory items shared processing with distractors in the matching task, distractor interference decreased, facilitating target selection. A specialized load account is proposed to describe the dissociable effects of working memory load on selective processing depending on whether the load overlaps with targets or with distractors. (c) 2007 APA
Human-machine interactions

DOEpatents

Forsythe, J Chris [Sandia Park, NM; Xavier, Patrick G [Albuquerque, NM; Abbott, Robert G [Albuquerque, NM; Brannon, Nathan G [Albuquerque, NM; Bernard, Michael L [Tijeras, NM; Speed, Ann E [Albuquerque, NM

2009-04-28

Digital technology utilizing a cognitive model based on human naturalistic decision-making processes, including pattern recognition and episodic memory, can reduce the dependency of human-machine interactions on the abilities of a human user and can enable a machine to more closely emulate human-like responses. Such a cognitive model can enable digital technology to use cognitive capacities fundamental to human-like communication and cooperation to interact with humans.
Transactive memory systems scale for couples: development and validation

PubMed Central

Hewitt, Lauren Y.; Roberts, Lynne D.

2015-01-01

People in romantic relationships can develop shared memory systems by pooling their cognitive resources, allowing each person access to more information but with less cognitive effort. Research examining such memory systems in romantic couples largely focuses on remembering word lists or performing lab-based tasks, but these types of activities do not capture the processes underlying couples’ transactive memory systems, and may not be representative of the ways in which romantic couples use their shared memory systems in everyday life. We adapted an existing measure of transactive memory systems for use with romantic couples (TMSS-C), and conducted an initial validation study. In total, 397 participants who each identified as being a member of a romantic relationship of at least 3 months duration completed the study. The data provided a good fit to the anticipated three-factor structure of the components of couples’ transactive memory systems (specialization, credibility and coordination), and there was reasonable evidence of both convergent and divergent validity, as well as strong evidence of test–retest reliability across a 2-week period. The TMSS-C provides a valuable tool that can quickly and easily capture the underlying components of romantic couples’ transactive memory systems. It has potential to help us better understand this intriguing feature of romantic relationships, and how shared memory systems might be associated with other important features of romantic relationships. PMID:25999873
OPMILL - MICRO COMPUTER PROGRAMMING ENVIRONMENT FOR CNC MILLING MACHINES THREE AXIS EQUATION PLOTTING CAPABILITIES

NASA Technical Reports Server (NTRS)

Ray, R. B.

1994-01-01

OPMILL is a computer operating system for a Kearney and Trecker milling machine that provides a fast and easy way to program machine part manufacture with an IBM compatible PC. The program gives the machinist an "equation plotter" feature which plots any set of equations that define axis moves (up to three axes simultaneously) and converts those equations to a machine milling program that will move a cutter along a defined path. Other supported functions include: drill with peck, bolt circle, tap, mill arc, quarter circle, circle, circle 2 pass, frame, frame 2 pass, rotary frame, pocket, loop and repeat, and copy blocks. The system includes a tool manager that can handle up to 25 tools and automatically adjusts tool length for each tool. It will display all tool information and stop the milling machine at the appropriate time. Information for the program is entered via a series of menus and compiled to the Kearney and Trecker format. The program can then be loaded into the milling machine, the tool path graphically displayed, and tool change information or the program in Kearney and Trecker format viewed. The program has a complete file handling utility that allows the user to load the program into memory from the hard disk, save the program to the disk with comments, view directories, merge a program on the disk with one in memory, save a portion of a program in memory, and change directories. OPMILL was developed on an IBM PS/2 running DOS 3.3 with 1 MB of RAM. OPMILL was written for an IBM PC or compatible 8088 or 80286 machine connected via an RS-232 port to a Kearney and Trecker Data Mill 700/C Control milling machine. It requires a "D:" drive (fixed-disk or virtual), a browse or text display utility, and an EGA or better display. Users wishing to modify and recompile the source code will also need Turbo BASIC, Turbo C, and Crescent Software's QuickPak for Turbo BASIC. IBM PC and IBM PS/2 are registered trademarks of International Business Machines. Turbo BASIC and Turbo C are trademarks of Borland International.
Operating System For Numerically Controlled Milling Machine

NASA Technical Reports Server (NTRS)

Ray, R. B.

1992-01-01

OPMILL program is operating system for Kearney and Trecker milling machine providing fast easy way to program manufacture of machine parts with IBM-compatible personal computer. Gives machinist "equation plotter" feature, which plots equations that define movements and converts equations to milling-machine-controlling program moving cutter along defined path. System includes tool-manager software handling up to 25 tools and automatically adjusts to account for each tool. Developed on IBM PS/2 computer running DOS 3.3 with 1 MB of random-access memory.
Single bus star connected reluctance drive and method

DOEpatents

Fahimi, Babak; Shamsi, Pourya

2016-05-10

A system and methods for operating a switched reluctance machine includes a controller, an inverter connected to the controller and to the switched reluctance machine, a hysteresis control connected to the controller and to the inverter, a set of sensors connected to the switched reluctance machine and to the controller, the switched reluctance machine further including a set of phases the controller further comprising a processor and a memory connected to the processor, wherein the processor programmed to execute a control process and a generation process.
Three-Dimensional Cellular Structures Enhanced By Shape Memory Alloys

NASA Technical Reports Server (NTRS)

Nathal, Michael V.; Krause, David L.; Wilmoth, Nathan G.; Bednarcyk, Brett A.; Baker, Eric H.

2014-01-01

This research effort explored lightweight structural concepts married with advanced smart materials to achieve a wide variety of benefits in airframe and engine components. Lattice block structures were cast from an aerospace structural titanium alloy Ti-6Al-4V and a NiTi shape memory alloy (SMA), and preliminary properties have been measured. A finite element-based modeling approach that can rapidly and accurately capture the deformation response of lattice architectures was developed. The Ti-6-4 and SMA material behavior was calibrated via experimental tests of ligaments machined from the lattice. Benchmark testing of complete lattice structures verified the main aspects of the model as well as demonstrated the advantages of the lattice structure. Shape memory behavior of a sample machined from a lattice block was also demonstrated.
DMA shared byte counters in a parallel computer

DOEpatents

Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos

2010-04-06

A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
Division of attention as a function of the number of steps, visual shifts, and memory load

NASA Technical Reports Server (NTRS)

Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.

1986-01-01

The effects on divided attention of visual shifts and long-term memory retrieval during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. With the requirement of memory retrieval, however, there was a large decrease in accuracy for all of the time-shared activities. It was concluded that the attentional demand of longterm memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval results in a sizable reduction in the capability of subjects to divide their attention. A selected bibliography on the divided attention literature is provided.
Welcoming nora: a family event.

PubMed

Walsh, Allison J; Walsh, Paul R; Walsh, Jane M; Walsh, Gavin T

2011-01-01

In this column, Allison and Paul Walsh share the story of the birth of Nora, their third baby and their second child to be born at home. Allison and Paul share their individual memories of labor and birth. But their story is only part of the story of Nora's birth. Nora's birth was a family event, with Allison and Paul's other children very much part of the experience. Jane and Gavin share their own memories of their baby sister's birth.
Implementing Journaling in a Linux Shared Disk File System

NASA Technical Reports Server (NTRS)

Preslan, Kenneth W.; Barry, Andrew; Brassow, Jonathan; Cattelan, Russell; Manthei, Adam; Nygaard, Erling; VanOort, Seth; Teigland, David; Tilstra, Mike; O'Keefe, Matthew;

2000-01-01

In computer systems today, speed and responsiveness is often determined by network and storage subsystem performance. Faster, more scalable networking interfaces like Fibre Channel and Gigabit Ethernet provide the scaffolding from which higher performance computer systems implementations may be constructed, but new thinking is required about how machines interact with network-enabled storage devices. In this paper we describe how we implemented journaling in the Global File System (GFS), a shared-disk, cluster file system for Linux. Our previous three papers on GFS at the Mass Storage Symposium discussed our first three GFS implementations, their performance, and the lessons learned. Our fourth paper describes, appropriately enough, the evolution of GFS version 3 to version 4, which supports journaling and recovery from client failures. In addition, GFS scalability tests extending to 8 machines accessing 8 4-disk enclosures were conducted: these tests showed good scaling. We describe the GFS cluster infrastructure, which is necessary for proper recovery from machine and disk failures in a collection of machines sharing disks using GFS. Finally, we discuss the suitability of Linux for handling the big data requirements of supercomputing centers.

Colouring in the Blanks: Memory Drawings of the 1990 Kuwait Invasion

ERIC Educational Resources Information Center

Pepin-Wakefield, Yvonne

2009-01-01

This study used drawing tasks to examine the similarities and differences between females and males who shared a collective traumatic event in early childhood. Could these childhood memories be recorded, measured, and compared for gender differences in drawings by young adults who had shared a similar experience as children? Exploration of this…
Functions of Memory Sharing and Mother-Child Reminiscing Behaviors: Individual and Cultural Variations

ERIC Educational Resources Information Center

Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim

2009-01-01

This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
Stillbirth and stigma: the spoiling and repair of multiple social identities.

PubMed

Brierley-Jones, Lyn; Crawley, Rosalind; Lomax, Samantha; Ayers, Susan

This study investigated mothers' experiences surrounding stillbirth in the United Kingdom, their memory making and sharing opportunities, and the effect these opportunities had on them. Qualitative data were generated from free text responses to open-ended questions. Thematic content analysis revealed that "stigma" was experienced by most women and Goffman's (1963) work on stigma was subsequently used as an analytical framework. Results suggest that stillbirth can spoil the identities of "patient," "mother," and "full citizen." Stigma was reported as arising from interactions with professionals, family, friends, work colleagues, and even casual acquaintances. Stillbirth produces common learning experiences often requiring "identity work" (Murphy, 2012). Memory making and sharing may be important in this work and further research is needed. Stigma can reduce the memory sharing opportunities for women after stillbirth and this may explain some of the differential mental health effects of memory making after stillbirth that is documented in the literature.
Distributed computing for membrane-based modeling of action potential propagation.

PubMed

Porras, D; Rogers, J M; Smith, W M; Pollard, A E

2000-08-01

Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
Basic Requirements for Systems Software Research and Development

NASA Technical Reports Server (NTRS)

Kuszmaul, Chris; Nitzberg, Bill

1996-01-01

Our success over the past ten years evaluating and developing advanced computing technologies has been due to a simple research and development (R/D) model. Our model has three phases: (a) evaluating the state-of-the-art, (b) identifying problems and creating innovations, and (c) developing solutions, improving the state- of-the-art. This cycle has four basic requirements: a large production testbed with real users, a diverse collection of state-of-the-art hardware, facilities for evalua- tion of emerging technologies and development of innovations, and control over system management on these testbeds. Future research will be irrelevant and future products will not work if any of these requirements is eliminated. In order to retain our effectiveness, the numerical aerospace simulator (NAS) must replace out-of-date production testbeds in as timely a fashion as possible, and cannot afford to ignore innovative designs such as new distributed shared memory machines, clustered commodity-based computers, and multi-threaded architectures.

Gregarious Data Re-structuring in a Many Core Architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

this paper, we have developed a new methodology that takes in consideration the access patterns from a single parallel actor (e.g. a thread), as well as, the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contribution of this paper includes (a) collaborative data restructuring for group reuse and (b) low overhead transformation technique to improve access pattern and bring closelymore » connected data elements together. Preliminary results in a many core architecture, Tilera TileGX, shows promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine grained runtimes (up to 16%) for selected kernels« less
Computers and the orthopaedic office.

PubMed

Berumen, Edmundo; Barllow, Fidel Dobarganes; Fong, Fransisco Javier; Lopez, Jorge Arturo

2002-01-01

The advance of today's medicine could be linked very closely to the history of computers through the last twenty years. In the beginning the first attempt to build a computer was trying to help us with mathematical calculations. This has changed recently and computers are now linked to x-ray machines, CT scanners, and MRIs. Being able to share information is one of the goals of the future. Today's computer technology has helped a great deal to allow orthopaedic surgeons from around the world to consult on a difficult case or to become a part of a large database. Obtaining the results from a method of treatment using a multicentric information study can be done on a regular basis. In the future, computers will help us to retrieve information from patients' clinical history directly from a hospital database or by portable memory cards that will carry every radiograph or video from previous surgeries.
Urgent Virtual Machine Eviction with Enlightened Post-Copy

DTIC Science & Technology

2015-12-01

memory is in use, almost all of which is by Memcached. MySQL : The VMs run MySQL 5.6, and the clients execute OLTPBenchmark [3] using the Twitter...workload with scale factor of 960. The VMs are each allocated 16 cores and 30 GB of memory, and MySQL is configured with a 16 GB buffer pool in memory. The...operation mix for 5 minutes as a warm-up. At the time of migration, MySQL uses approximately 17 GB of memory, and almost all of the 30 GB memory is
A simple modern correctness condition for a space-based high-performance multiprocessor

NASA Technical Reports Server (NTRS)

Probst, David K.; Li, Hon F.

1992-01-01

A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Brain Information Sharing During Visual Short-Term Memory Binding Yields a Memory Biomarker for Familial Alzheimer's Disease.

PubMed

Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin

2017-01-01

Alzheimer's disease (AD) as a disconnection syndrome which disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. To unveil the electrophysiological correlates of integrative memory impairments in AD towards new memory biomarkers for its prodromal stages. Patients with 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to EEG data. Patients showed significant deficits during the VSTMBT. A reduction of brain connectivity was observed during resting as well as during correct VSTM binding, particularly over frontal and posterior regions. An increase of connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases in more advanced stages of FAD, increased brain connectivity appeared in cases in earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding in the prodromal stages of FAD are associated to altered patterns of brain connectivity thus confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD increased brain connectivity characterizes its earlier stages. These findings are discussed in the light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Checkpoint repair for high-performance out-of-order execution machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hwu, W.M.W.; Patt, Y.N.

Out-or-order execution and branch prediction are two mechanisms that can be used profitably in the design of supercomputers to increase performance. Proper exception handling and branch prediction miss handling in an out-of-order execution machine to require some kind of repair mechanism which can restore the machine to a known previous state. In this paper the authors present a class of repair mechanisms using the concept of checkpointing. The authors derive several properties of checkpoint repair mechanisms. In addition, they provide algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware, which also require nomore » additional complexity or time for use with write-back cache memory systems than they do with write-through cache memory systems, contrary to statements made by previous researchers.« less
Binary classification of items of interest in a repeatable process

DOEpatents

Abell, Jeffrey A.; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo

2014-06-24

A system includes host and learning machines in electrical communication with sensors positioned with respect to an item of interest, e.g., a weld, and memory. The host executes instructions from memory to predict a binary quality status of the item. The learning machine receives signals from the sensor(s), identifies candidate features, and extracts features from the candidates that are more predictive of the binary quality status relative to other candidate features. The learning machine maps the extracted features to a dimensional space that includes most of the items from a passing binary class and excludes all or most of the items from a failing binary class. The host also compares the received signals for a subsequent item of interest to the dimensional space to thereby predict, in real time, the binary quality status of the subsequent item of interest.
Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: What’s memory got to do with it?

PubMed Central

Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.

2018-01-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541
Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: what's memory got to do with it?

PubMed

Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L

2017-07-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (comprised of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
Trinary Associative Memory Would Recognize Machine Parts

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang; Awwal, Abdul Ahad S.; Karim, Mohammad A.

1991-01-01

Trinary associative memory combines merits and overcomes major deficiencies of unipolar and bipolar logics by combining them in three-valued logic that reverts to unipolar or bipolar binary selectively, as needed to perform specific tasks. Advantage of associative memory: one obtains access to all parts of it simultaneously on basis of content, rather than address, of data. Consequently, used to exploit fully parallelism and speed of optical computing.
Method for prefetching non-contiguous data structures

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-05-05

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
On Why It Is Impossible to Prove that the BDX90 Dispatcher Implements a Time-sharing System

NASA Technical Reports Server (NTRS)

Boyer, R. S.; Moore, J. S.

1983-01-01

The Software Implemented Fault Tolerance SIFT system, is written in PASCAL except for about a page of machine code. The SIFT system implements a small time sharing system in which PASCAL programs for separate application tasks are executed according to a schedule with real time constraints. The PASCAL language has no provision for handling the notion of an interrupt such as the B930 clock interrupt. The PASCAL language also lacks the notion of running a PASCAL subroutine for a given amount of time, suspending it, saving away the suspension, and later activating the suspension. Machine code was used to overcome these inadequacies of PASCAL. Code which handles clock interrupts and suspends processes is called a dispatcher. The time sharing/virtual machine idea is completely destroyed by the reconfiguration task. After termination of the reconfiguration task, the tasks run by the dispatcher have no relation to those run before reconfiguration. It is impossible to view the dispatcher as a time-sharing system implementing virtual BDX930s running concurrently when one process can wipe out the others.
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

NASA Technical Reports Server (NTRS)

DiNucci, David C.; Balley, David H. (Technical Monitor)

1997-01-01

Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication, into message passing with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accomodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only 2 communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. still, the API is small enough to consider integrating into standard os interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed. It is intended to add support for the equivalent of communicators, reduction and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will be free to carefully also use CDS1. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high- performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation technique, it even now outperforms highly-optimized MPI implementation on intra-node communication due to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.
A shared resource between declarative memory and motor memory.

PubMed

Keisler, Aysha; Shadmehr, Reza

2010-11-03

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A shared resource between declarative memory and motor memory

PubMed Central

Keisler, Aysha; Shadmehr, Reza

2010-01-01

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140
Discrete-Slots Models of Visual Working-Memory Response Times

PubMed Central

Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.

2014-01-01

Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
Properties of a memory network in psychology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wedemann, Roseli S.; Donangelo, Raul; Carvalho, Luis A. V. de

We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.
Properties of a memory network in psychology

NASA Astrophysics Data System (ADS)

Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.

2007-12-01

We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.
Sparse distributed memory

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1989-01-01

Sparse distributed memory was proposed be Pentti Kanerva as a realizable architecture that could store large patterns and retrieve them based on partial matches with patterns representing current sensory inputs. This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines - e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, continuation of a sequence of events when given a cue from the middle, knowing that one doesn't know, or getting stuck with an answer on the tip of one's tongue. These behaviors are now within reach of machines that can be incorporated into the computing systems of robots capable of seeing, talking, and manipulating. Kanerva's theory is a break with the Western rationalistic tradition, allowing a new interpretation of learning and cognition that respects biology and the mysteries of individual human beings.
Shared Representations in Language Processing and Verbal Short-Term Memory: The Case of Grammatical Gender

ERIC Educational Resources Information Center

Schweppe, Judith; Rummer, Ralf

2007-01-01

The general idea of language-based accounts of short-term memory is that retention of linguistic materials is based on representations within the language processing system. In the present sentence recall study, we address the question whether the assumption of shared representations holds for morphosyntactic information (here: grammatical gender…

The Precategorical Nature of Visual Short-Term Memory

ERIC Educational Resources Information Center

Quinlan, Philip T.; Cohen, Dale J.

2016-01-01

We conducted a series of recognition experiments that assessed whether visual short-term memory (VSTM) is sensitive to shared category membership of to-be-remembered (tbr) images of common objects. In Experiment 1 some of the tbr items shared the same basic level category (e.g., hand axe): Such items were no better retained than others. In the…
Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach

NASA Technical Reports Server (NTRS)

Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.

1994-01-01

The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.
Comparison of Origin 2000 and Origin 3000 Using NAS Parallel Benchmarks

NASA Technical Reports Server (NTRS)

Turney, Raymond D.

2001-01-01

This report describes results of benchmark tests on the Origin 3000 system currently being installed at the NASA Ames National Advanced Supercomputing facility. This machine will ultimately contain 1024 R14K processors. The first part of the system, installed in November, 2000 and named mendel, is an Origin 3000 with 128 R12K processors. For comparison purposes, the tests were also run on lomax, an Origin 2000 with R12K processors. The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel benchmark FT were chosen to determine system performance and measure the impact of changes on the machine as it evolves. Having been written to measure performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory, compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI versions used were the latest official release of the NAS Parallel Benchmarks, version 2.3. The OpenMP versiqns used were PBN3b2, a beta version that is in the process of being released. NPB 2.3 and PBN 3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN results.
Complementary Machine Intelligence and Human Intelligence in Virtual Teaching Assistant for Tutoring Program Tracing

ERIC Educational Resources Information Center

Chou, Chih-Yueh; Huang, Bau-Hung; Lin, Chi-Jen

2011-01-01

This study proposes a virtual teaching assistant (VTA) to share teacher tutoring tasks in helping students practice program tracing and proposes two mechanisms of complementing machine intelligence and human intelligence to develop the VTA. The first mechanism applies machine intelligence to extend human intelligence (teacher answers) to evaluate…
Large-Scale Image Analytics Using Deep Learning

NASA Astrophysics Data System (ADS)

Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

2014-12-01

High resolution land cover classification maps are needed to increase the accuracy of current Land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) land cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agricultural Imaging Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with ~60 million pixels) and has a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. In order to perform image analytics in such a granular system, it is mandatory to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, which is a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the distbelief computing model) to solve a grand challenge of scaling this process across quarter million NAIP tiles that cover the entire Continental United States. The AWS core components that we use to solve this problem are DynamoDB along with S3 for database query and storage, ElastiCache shared memory architecture for image segmentation, Elastic Map Reduce (EMR) for image feature extraction, and the memory optimized Elastic Cloud Compute (EC2) for the learning algorithm.
Optical memories in digital computing

NASA Technical Reports Server (NTRS)

Alford, C. O.; Gaylord, T. K.

1979-01-01

High capacity optical memories with relatively-high data-transfer rate and multiport simultaneous access capability may serve as basis for new computer architectures. Several computer structures that might profitably use memories are: a) simultaneous record-access system, b) simultaneously-shared memory computer system, and c) parallel digital processing structure.
Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

PubMed Central

Zhu, Hao; Sun, Yan; Rajagopal, Gunaretnam; Mondry, Adrian; Dhar, Pawan

2004-01-01

Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differentiation equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods described. PMID:15339335
Reader set encoding for directory of shared cache memory in multiprocessor system

DOEpatents

Ahn, Dnaiel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang

2014-06-10

In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
Insights on consciousness from taste memory research.

PubMed

Gallo, Milagros

2016-01-01

Taste research in rodents supports the relevance of memory in order to determine the content of consciousness by modifying both taste perception and later action. Associated with this issue is the fact that taste and visual modalities share anatomical circuits traditionally related to conscious memory. This challenges the view of taste memory as a type of non-declarative unconscious memory.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)

2002-01-01

The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
Ringo: Interactive Graph Analytics on Big-Memory Machines

PubMed Central

Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure

2016-01-01

We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads. PMID:27081215
Ringo: Interactive Graph Analytics on Big-Memory Machines.

PubMed

Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure

2015-01-01

We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads.
Confessions of a robot lobotomist

NASA Technical Reports Server (NTRS)

Gottshall, R. Marc

1994-01-01

Since its inception, numerically controlled (NC) machining methods have been used throughout the aerospace industry to mill, drill, and turn complex shapes by sequentially stepping through motion programs. However, the recent demand for more precision, faster feeds, exotic sensors, and branching execution have existing computer numerical control (CNC) and distributed numerical control (DNC) systems running at maximum controller capacity. Typical disadvantages of current CNC's include fixed memory capacities, limited communication ports, and the use of multiple control languages. The need to tailor CNC's to meet specific applications, whether it be expanded memory, additional communications, or integrated vision, often requires replacing the original controller supplied with the commercial machine tool with a more powerful and capable system. This paper briefly describes the process and equipment requirements for new controllers and their evolutionary implementation in an aerospace environment. The process of controller retrofit with currently available machines is examined, along with several case studies and their computational and architectural implications.
The time machine in our mind.

PubMed

Stocker, Kurt

2012-04-01

This article provides the first comprehensive conceptual account for the imagistic mental machinery that allows us to travel through time--for the time machine in our mind. It is argued that language reveals this imagistic machine and how we use it. Findings from a range of cognitive fields are theoretically unified and a recent proposal about spatialized mental time travel is elaborated on. The following novel distinctions are offered: external versus internal viewing of time; ''watching" time versus projective ''travel" through time; optional versus obligatory mental time travel; mental time travel into anteriority or posteriority versus mental time travel into the past or future; single mental time travel versus nested dual mental time travel; mental time travel in episodic memory versus mental time travel in semantic memory; and ''seeing" versus ''sensing" mental imagery. Theoretical, empirical, and applied implications are discussed. Copyright © 2012 Cognitive Science Society, Inc.
Investigating Ground Swarm Robotics Using Agent Based Simulation

DTIC Science & Technology

2006-12-01

Incorporation of virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the...virtual pheromones as a shared memory map is modeled as an additional capability that is found to enhance the robustness and reliability of the swarm... PHEROMONES .......................................... 42 1. Repel Friends under Inorganic SA.................................................. 45 2. Max
Machine Shop (Bldg. 163) north bay, east end interior looking ...

Library of Congress Historic Buildings Survey, Historic Engineering Record, Historic Landscapes Survey

Machine Shop (Bldg. 163) north bay, east end interior looking northeast. Note the 250-ton Shaw bridge crane on the upper rails and two smaller P&H bridge cranes on the lower rails. Three cranes shared the lower rails - Atchison, Topeka, Santa Fe Railroad, Albuquerque Shops, Machine Shop, 908 Second Street, Southwest, Albuquerque, Bernalillo County, NM
Exascale Hardware Architectures Working Group

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hemmert, S; Ang, J; Chiang, P

2011-03-15

The ASC Exascale Hardware Architecture working group is challenged to provide input on the following areas impacting the future use and usability of potential exascale computer systems: processor, memory, and interconnect architectures, as well as the power and resilience of these systems. Going forward, there are many challenging issues that will need to be addressed. First, power constraints in processor technologies will lead to steady increases in parallelism within a socket. Additionally, all cores may not be fully independent nor fully general purpose. Second, there is a clear trend toward less balanced machines, in terms of compute capability compared tomore » memory and interconnect performance. In order to mitigate the memory issues, memory technologies will introduce 3D stacking, eventually moving on-socket and likely on-die, providing greatly increased bandwidth but unfortunately also likely providing smaller memory capacity per core. Off-socket memory, possibly in the form of non-volatile memory, will create a complex memory hierarchy. Third, communication energy will dominate the energy required to compute, such that interconnect power and bandwidth will have a significant impact. All of the above changes are driven by the need for greatly increased energy efficiency, as current technology will prove unsuitable for exascale, due to unsustainable power requirements of such a system. These changes will have the most significant impact on programming models and algorithms, but they will be felt across all layers of the machine. There is clear need to engage all ASC working groups in planning for how to deal with technological changes of this magnitude. The primary function of the Hardware Architecture Working Group is to facilitate codesign with hardware vendors to ensure future exascale platforms are capable of efficiently supporting the ASC applications, which in turn need to meet the mission needs of the NNSA Stockpile Stewardship Program. This issue is relatively immediate, as there is only a small window of opportunity to influence hardware design for 2018 machines. Given the short timeline a firm co-design methodology with vendors is of prime importance.« less
Balancing Contention and Synchronization on the Intel Paragon

NASA Technical Reports Server (NTRS)

Bokhari, Shahid H.; Nicol, David M.

1996-01-01

The Intel Paragon is a mesh-connected distributed memory parallel computer. It uses an oblivious and deterministic message routing algorithm: this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately global synchronization has substantial overhead on the Paragon. At the same time the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmission from the same node to some extent. We develop a generalization of Scott's algorithm that executes complete exchange with a prescribed contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to tradeoff contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take interconnection structure into account. The Bounded contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.
[Research on infrared safety protection system for machine tool].

PubMed

Zhang, Shuan-Ji; Zhang, Zhi-Ling; Yan, Hui-Ying; Wang, Song-De

2008-04-01

In order to ensure personal safety and prevent injury accident in machine tool operation, an infrared machine tool safety system was designed with infrared transmitting-receiving module, memory self-locked relay and voice recording-playing module. When the operator does not enter the danger area, the system has no response. Once the operator's whole or part of body enters the danger area and shades the infrared beam, the system will alarm and output an control signal to the machine tool executive element, and at the same time, the system makes the machine tool emergency stop to prevent equipment damaged and person injured. The system has a module framework, and has many advantages including safety, reliability, common use, circuit simplicity, maintenance convenience, low power consumption, low costs, working stability, easy debugging, vibration resistance and interference resistance. It is suitable for being installed and used in different machine tools such as punch machine, pour plastic machine, digital control machine, armor plate cutting machine, pipe bending machine, oil pressure machine etc.
Centrally managed unified shared virtual address space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilkes, John

Systems, apparatuses, and methods for managing a unified shared virtual address space. A host may execute system software and manage a plurality of nodes coupled to the host. The host may send work tasks to the nodes, and for each node, the host may externally manage the node's view of the system's virtual address space. Each node may have a central processing unit (CPU) style memory management unit (MMU) with an internal translation lookaside buffer (TLB). In one embodiment, the host may be coupled to a given node via an input/output memory management unit (IOMMU) interface, where the IOMMU frontendmore » interface shares the TLB with the given node's MMU. In another embodiment, the host may control the given node's view of virtual address space via memory-mapped control registers.« less

Attention and Visuospatial Working Memory Share the Same Processing Resources

PubMed Central

Feng, Jing; Pratt, Jay; Spence, Ian

2012-01-01

Attention and visuospatial working memory (VWM) share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and VWM share the same processing resources using a novel dual-task costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources. PMID:22529826
3D hierarchical spatial representation and memory of multimodal sensory data

NASA Astrophysics Data System (ADS)

Khosla, Deepak; Dow, Paul A.; Huber, David J.

2009-04-01

This paper describes an efficient method and system for representing, processing and understanding multi-modal sensory data. More specifically, it describes a computational method and system for how to process and remember multiple locations in multimodal sensory space (e.g., visual, auditory, somatosensory, etc.). The multimodal representation and memory is based on a biologically-inspired hierarchy of spatial representations implemented with novel analogues of real representations used in the human brain. The novelty of the work is in the computationally efficient and robust spatial representation of 3D locations in multimodal sensory space as well as an associated working memory for storage and recall of these representations at the desired level for goal-oriented action. We describe (1) A simple and efficient method for human-like hierarchical spatial representations of sensory data and how to associate, integrate and convert between these representations (head-centered coordinate system, body-centered coordinate, etc.); (2) a robust method for training and learning a mapping of points in multimodal sensory space (e.g., camera-visible object positions, location of auditory sources, etc.) to the above hierarchical spatial representations; and (3) a specification and implementation of a hierarchical spatial working memory based on the above for storage and recall at the desired level for goal-oriented action(s). This work is most useful for any machine or human-machine application that requires processing of multimodal sensory inputs, making sense of it from a spatial perspective (e.g., where is the sensory information coming from with respect to the machine and its parts) and then taking some goal-oriented action based on this spatial understanding. A multi-level spatial representation hierarchy means that heterogeneous sensory inputs (e.g., visual, auditory, somatosensory, etc.) can map onto the hierarchy at different levels. When controlling various machine/robot degrees of freedom, the desired movements and action can be computed from these different levels in the hierarchy. The most basic embodiment of this machine could be a pan-tilt camera system, an array of microphones, a machine with arm/hand like structure or/and a robot with some or all of the above capabilities. We describe the approach, system and present preliminary results on a real-robotic platform.
Second Report of the Multirate Processor (MRP) for Digital Voice Communications.

DTIC Science & Technology

1982-09-30

machine are: * two arithmetic logic units (ALUs)-one for data processing, and the other for address generation, * two memorys -6144 words (70 bits per word...of program memory , and 6094 words (16 bits per word) of data memory , q * input/output through modem and teletype, -15 .9 S-;. KANG AND FRANSEN Table...provides a measure of intelligibility and allows one to evaluate the discriminability of six distinctive features: voicing, nasality, sustention
System and method for memory allocation in a multiclass memory system

DOEpatents

Loh, Gabriel; Meswani, Mitesh; Ignatowski, Michael; Nutter, Mark

2016-06-28

A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.
A Formal Model of Capacity Limits in Working Memory

ERIC Educational Resources Information Center

Oberauer, Klaus; Kliegl, Reinhold

2006-01-01

A mathematical model of working-memory capacity limits is proposed on the key assumption of mutual interference between items in working memory. Interference is assumed to arise from overwriting of features shared by these items. The model was fit to time-accuracy data of memory-updating tasks from four experiments using nonlinear mixed effect…
Ordering of guarded and unguarded stores for no-sync I/O

DOEpatents

Gara, Alan; Ohmacht, Martin

2013-06-25

A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.
Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

NASA Technical Reports Server (NTRS)

Gilbertsen, Noreen D.; Belytschko, Ted

1990-01-01

The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kurzak, Jakub; Luszczek, Pitior; Faverge, Mathieu

2012-03-01

LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.
Neural Mechanisms of Interference Control Underlie the Relationship between Fluid Intelligence and Working Memory Span

ERIC Educational Resources Information Center

Burgess, Gregory C.; Gray, Jeremy R.; Conway, Andrew R. A.; Braver, Todd S.

2011-01-01

Fluid intelligence (gF) and working memory (WM) span predict success in demanding cognitive situations. Recent studies show that much of the variance in gF and WM span is shared, suggesting common neural mechanisms. This study provides a direct investigation of the degree to which shared variance in gF and WM span can be explained by neural…
Analytical design of intelligent machines

NASA Technical Reports Server (NTRS)

Saridis, George N.; Valavanis, Kimon P.

1987-01-01

The problem of designing 'intelligent machines' to operate in uncertain environments with minimum supervision or interaction with a human operator is examined. The structure of an 'intelligent machine' is defined to be the structure of a Hierarchically Intelligent Control System, composed of three levels hierarchically ordered according to the principle of 'increasing precision with decreasing intelligence', namely: the organizational level, performing general information processing tasks in association with a long-term memory; the coordination level, dealing with specific information processing tasks with a short-term memory; and the control level, which performs the execution of various tasks through hardware using feedback control methods. The behavior of such a machine may be managed by controls with special considerations and its 'intelligence' is directly related to the derivation of a compatible measure that associates the intelligence of the higher levels with the concept of entropy, which is a sufficient analytic measure that unifies the treatment of all the levels of an 'intelligent machine' as the mathematical problem of finding the right sequence of internal decisions and controls for a system structured in the order of intelligence and inverse order of precision such that it minimizes its total entropy. A case study on the automatic maintenance of a nuclear plant illustrates the proposed approach.
What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

2004-01-01

With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
A translator and simulator for the Burroughs D machine

NASA Technical Reports Server (NTRS)

Roberts, J.

1972-01-01

The D Machine is described as a small user microprogrammable computer designed to be a versatile building block for such diverse functions as: disk file controllers, I/O controllers, and emulators. TRANSLANG is an ALGOL-like language, which allows D Machine users to write microprograms in an English-like format as opposed to creating binary bit pattern maps. The TRANSLANG translator parses TRANSLANG programs into D Machine microinstruction bit patterns which can be executed on the D Machine simulator. In addition to simulation and translation, the two programs also offer several debugging tools, such as: a full set of diagnostic error messages, register dumps, simulated memory dumps, traces on instructions and groups of instructions, and breakpoints.
The 2008 Battle of Sadr City: Reimagining Urban Combat

DTIC Science & Technology

2013-01-01

so intense that some of the Strykers ran out of ammunition for the .50 caliber machine guns in their remote weapons system. Adding to the confusion...review. Out of all the day’s unusual violence, the one thing that stood out in their memories was the lone gunman with his RPK machine gun . Collings’s...platoon suppressed the enemy effectively using the .50 caliber machine guns in the remote weapons systems. They also called in AH-64 Apache attack
Audience-tuning effects on memory: the role of shared reality.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Groll, Stephan

2005-09-01

After tuning to an audience, communicators' own memories for the topic often reflect the biased view expressed in their messages. Three studies examined explanations for this bias. Memories for a target person were biased when feedback signaled the audience's successful identification of the target but not after failed identification (Experiment 1). Whereas communicators tuning to an in-group audience exhibited the bias, communicators tuning to an out-group audience did not (Experiment 2). These differences did not depend on communicators' mood but were mediated by communicators' trust in their audience's judgment about other people (Experiments 2 and 3). Message and memory were more closely associated for high than for low trusters. Apparently, audience-tuning effects depend on the communicators' experience of a shared reality.
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Examining age-related shared variance between face cognition, vision, and self-reported physical health: a test of the common cause hypothesis for social cognition

PubMed Central

Olderbak, Sally; Hildebrandt, Andrea; Wilhelm, Oliver

2015-01-01

The shared decline in cognitive abilities, sensory functions (e.g., vision and hearing), and physical health with increasing age is well documented with some research attributing this shared age-related decline to a single common cause (e.g., aging brain). We evaluate the extent to which the common cause hypothesis predicts associations between vision and physical health with social cognition abilities specifically face perception and face memory. Based on a sample of 443 adults (17–88 years old), we test a series of structural equation models, including Multiple Indicator Multiple Cause (MIMIC) models, and estimate the extent to which vision and self-reported physical health are related to face perception and face memory through a common factor, before and after controlling for their fluid cognitive component and the linear effects of age. Results suggest significant shared variance amongst these constructs, with a common factor explaining some, but not all, of the shared age-related variance. Also, we found that the relations of face perception, but not face memory, with vision and physical health could be completely explained by fluid cognition. Overall, results suggest that a single common cause explains most, but not all age-related shared variance with domain specific aging mechanisms evident. PMID:26321998
Solutions and debugging for data consistency in multiprocessors with noncoherent caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

1995-02-01

We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on softwaremore » methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.« less
Experimental study on Response Parameters of Ni-rich NiTi Shape Memory Alloy during Wire Electric Discharge Machining

NASA Astrophysics Data System (ADS)

Bisaria, Himanshu; Shandilya, Pragya

2018-03-01

Nowadays NiTi SMAs are gaining more prominence due to their unique properties such as superelasticity, shape memory effect, high fatigue strength and many other enriched physical and mechanical properties. The current studies explore the effect of machining parameters namely, peak current (Ip), pulse off time (TOFF), and pulse on time (TON) on wire wear ratio (WWR), and dimensional deviation (DD) in WEDM. It was found that high discharge energy was mainly ascribed to high WWR and DD. The WWR and DD increased with the increase in pulse on time and peak current whereas high pulse off time was favourable for low WWR and DD.
A sparse matrix algorithm on the Boolean vector machine

NASA Technical Reports Server (NTRS)

Wagner, Robert A.; Patrick, Merrell L.

1988-01-01

VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
Achieving Small Structures in Thin NiTi Sheets for Medical Applications with Water Jet and Micro Machining: A Comparison

NASA Astrophysics Data System (ADS)

Frotscher, M.; Kahleyss, F.; Simon, T.; Biermann, D.; Eggeler, G.

2011-07-01

NiTi shape memory alloys (SMA) are used for a variety of applications including medical implants and tools as well as actuators, making use of their unique properties. However, due to the hardness and strength, in combination with the high elasticity of the material, the machining of components can be challenging. The most common machining techniques used today are laser cutting and electrical discharge machining (EDM). In this study, we report on the machining of small structures into binary NiTi sheets, applying alternative processing methods being well-established for other metallic materials. Our results indicate that water jet machining and micro milling can be used to machine delicate structures, even in very thin NiTi sheets. Further work is required to optimize the cut quality and the machining speed in order to increase the cost-effectiveness and to make both methods more competitive.

Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

NASA Technical Reports Server (NTRS)

Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

2000-01-01

The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool on the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance that exceeds some of the commercial tools.
Generalized memory associativity in a network model for the neuroses

NASA Astrophysics Data System (ADS)

Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.

2009-03-01

We review concepts introduced in earlier work, where a neural network mechanism describes some mental processes in neurotic pathology and psychoanalytic working-through, as associative memory functioning, according to the findings of Freud. We developed a complex network model, where modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's idea that consciousness is related to symbolic and linguistic memory activity in the brain. We have introduced a generalization of the Boltzmann machine to model memory associativity. Model behavior is illustrated with simulations and some of its properties are analyzed with methods from statistical mechanics.
Packaging of a large capacity magnetic bubble domain spacecraft recorder

NASA Technical Reports Server (NTRS)

Becker, F. J.; Stermer, R. L.

1977-01-01

A Solid State Spacecraft Data Recorder (SSDR), based on bubble domain technology, having a storage capacity of 10 to the 8th power bits, was designed and is being tested. The recorder consists of two memory modules each having 32 cells, each cell containing sixteen 100 kilobit serial bubble memory chips. The memory modules are interconnected to a Drive and Control Unit (DCU) module containing four microprocessors, 500 integrated circuits, a RAM core memory and two PROM's. The two memory modules and DCU are housed in individual machined aluminum frames, are stacked in brick fashion and through bolted to a base plate assembly which also houses the power supply.
Air Bearings Machined On Ultra Precision, Hydrostatic CNC-Lathe

NASA Astrophysics Data System (ADS)

Knol, Pierre H.; Szepesi, Denis; Deurwaarder, Jan M.

1987-01-01

Micromachining of precision elements requires an adequate machine concept to meet the high demand of surface finish, dimensional and shape accuracy. The Hembrug ultra precision lathes have been exclusively designed with hydrostatic principles for main spindle and guideways. This concept is to be explained with some major advantages of hydrostatics compared with aerostatics at universal micromachining applications. Hembrug has originally developed the conventional Mikroturn ultra precision facing lathes, for diamond turning of computer memory discs. This first generation of machines was followed by the advanced computer numerically controlled types for machining of complex precision workpieces. One of these parts, an aerostatic bearing component has been succesfully machined on the Super-Mikroturn CNC. A case study of airbearing machining confirms the statement that a good result of the micromachining does not depend on machine performance alone, but also on the technology applied.
Efficient checkpointing schemes for depletion perturbation solutions on memory-limited architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stripling, H. F.; Adams, M. L.; Hawkins, W. D.

2013-07-01

We describe a methodology for decreasing the memory footprint and machine I/O load associated with the need to access a forward solution during an adjoint solve. Specifically, we are interested in the depletion perturbation equations, where terms in the adjoint Bateman and transport equations depend on the forward flux solution. Checkpointing is the procedure of storing snapshots of the forward solution to disk and using these snapshots to recompute the parts of the forward solution that are necessary for the adjoint solve. For large problems, however, the storage cost of just a few copies of an angular flux vector canmore » exceed the available RAM on the host machine. We propose a methodology that does not checkpoint the angular flux vector; instead, we write and store converged source moments, which are typically of a much lower dimension than the angular flux solution. This reduces the memory footprint and I/O load of the problem, but requires that we perform single sweeps to reconstruct flux vectors on demand. We argue that this trade-off is exactly the kind of algorithm that will scale on advanced, memory-limited architectures. We analyze the cost, in terms of FLOPS and memory footprint, of five checkpointing schemes. We also provide computational results that support the analysis and show that the memory-for-work trade off does improve time to solution. (authors)« less
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree ’)iIC Q(JALfryT INSPECTED 5 DOCTOR OF PHILOSOPHY I Accesion For Supervised by NTIS CRAM... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to 0...decomoosition and scheduling algorithms. I. SUIUECT TERMS IS. NUMBER OF PAGES shared-memory multiprocessors; architecture trends; loop 110 scheduling
Advanced Development of Certified OS Kernels

DTIC Science & Technology

2015-06-01

It provides an infrastructure to map a physical page into multiple processes’ page maps in different address spaces. Their ownership mechanism ensures...of their shared memory infrastructure . Trap module The trap module specifies the behaviors of exception handlers and mCertiKOS system calls. In...layers), 1 pm for the shared memory infrastructure (3 layers), 3.5 pm for the thread management (10 layers), 1 pm for the process management (4 layers
6 DOF Nonlinear AUV Simulation Toolbox

DTIC Science & Technology

1997-01-01

is to supply a flexible 3D -simulation platform for motion visualization, in-lab debugging and testing of mission-specific strategies as well as those...Explorer are modular designed [Smith] in order to cut time and cost for vehicle recontlguration. A flexible 3D -simulation platform is desired to... 3D models. Current implemented modules include a nonlinear dynamic model for the OEX, shared memory and semaphore manager tools, shared memory monitor
A cache-aided multiprocessor rollback recovery scheme

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent

1989-01-01

This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Color line scan camera technology and machine vision: requirements to consider

NASA Astrophysics Data System (ADS)

Paernaenen, Pekka H. T.

1997-08-01

Color machine vision has shown a dynamic uptrend in use within the past few years as the introduction of new cameras and scanner technologies itself underscores. In the future, the movement from monochrome imaging to color will hasten, as machine vision system users demand more knowledge about their product stream. As color has come to the machine vision, certain requirements for the equipment used to digitize color images are needed. Color machine vision needs not only a good color separation but also a high dynamic range and a good linear response from the camera used. Good dynamic range and linear response is necessary for color machine vision. The importance of these features becomes even more important when the image is converted to another color space. There is always lost some information when converting integer data to another form. Traditionally the color image processing has been much slower technique than the gray level image processing due to the three times greater data amount per image. The same has applied for the three times more memory needed. The advancements in computers, memory and processing units has made it possible to handle even large color images today cost efficiently. In some cases he image analysis in color images can in fact even be easier and faster than with a similar gray level image because of more information per pixel. Color machine vision sets new requirements for lighting, too. High intensity and white color light is required in order to acquire good images for further image processing or analysis. New development in lighting technology is bringing eventually solutions for color imaging.
Targeted Memory Reactivation during Sleep Adaptively Promotes the Strengthening or Weakening of Overlapping Memories.

PubMed

Oyarzún, Javiera P; Morís, Joaquín; Luque, David; de Diego-Balaguer, Ruth; Fuentemilla, Lluís

2017-08-09

System memory consolidation is conceptualized as an active process whereby newly encoded memory representations are strengthened through selective memory reactivation during sleep. However, our learning experience is highly overlapping in content (i.e., shares common elements), and memories of these events are organized in an intricate network of overlapping associated events. It remains to be explored whether and how selective memory reactivation during sleep has an impact on these overlapping memories acquired during awake time. Here, we test in a group of adult women and men the prediction that selective memory reactivation during sleep entails the reactivation of associated events and that this may lead the brain to adaptively regulate whether these associated memories are strengthened or pruned from memory networks on the basis of their relative associative strength with the shared element. Our findings demonstrate the existence of efficient regulatory neural mechanisms governing how complex memory networks are shaped during sleep as a function of their associative memory strength. SIGNIFICANCE STATEMENT Numerous studies have demonstrated that system memory consolidation is an active, selective, and sleep-dependent process in which only subsets of new memories become stabilized through their reactivation. However, the learning experience is highly overlapping in content and thus events are encoded in an intricate network of related memories. It remains to be explored whether and how memory reactivation has an impact on overlapping memories acquired during awake time. Here, we show that sleep memory reactivation promotes strengthening and weakening of overlapping memories based on their associative memory strength. These results suggest the existence of an efficient regulatory neural mechanism that avoids the formation of cluttered memory representation of multiple events and promotes stabilization of complex memory networks. Copyright © 2017 the authors 0270-6474/17/377748-11$15.00/0.
Methods, systems and apparatus for controlling operation of two alternating current (AC) machines

DOEpatents

Gallegos-Lopez, Gabriel [Torrance, CA; Nagashima, James M [Cerritos, CA; Perisic, Milun [Torrance, CA; Hiti, Silva [Redondo Beach, CA

2012-06-05

A system is provided for controlling two alternating current (AC) machines via a five-phase PWM inverter module. The system comprises a first control loop, a second control loop, and a current command adjustment module. The current command adjustment module operates in conjunction with the first control loop and the second control loop to continuously adjust current command signals that control the first AC machine and the second AC machine such that they share the input voltage available to them without compromising the target mechanical output power of either machine. This way, even when the phase voltage available to either one of the machines decreases, that machine outputs its target mechanical output power.
Library Information System Time-Sharing (LISTS) Project. Final Report.

ERIC Educational Resources Information Center

Black, Donald V.

The Library Information System Time-Sharing (LISTS) experiment was based on three innovations in data processing technology: (1) the advent of computer time-sharing on third-generation machines, (2) the development of general-purpose file-management software and (3) the introduction of large, library-oriented data bases. The main body of the…
Performance Analysis of the NAS Y-MP Workload

NASA Technical Reports Server (NTRS)

Bergeron, Robert J.; Kutler, Paul (Technical Monitor)

1997-01-01

This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring a strong utilization of vector functional units, and an efficient memory organization. Introduction of the larger memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to service I/O requests, efficient scheduling prevented excessive idle due to I/O wait. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were- quite sufficient for measuring the performance of both the machine and operating, system.
Autobiographical memory functions of nostalgia in comparison to rumination and counterfactual thinking: similarity and uniqueness.

PubMed

Cheung, Wing-Yee; Wildschut, Tim; Sedikides, Constantine

2018-02-01

We compared and contrasted nostalgia with rumination and counterfactual thinking in terms of their autobiographical memory functions. Specifically, we assessed individual differences in nostalgia, rumination, and counterfactual thinking, which we then linked to self-reported functions or uses of autobiographical memory (Self-Regard, Boredom Reduction, Death Preparation, Intimacy Maintenance, Conversation, Teach/Inform, and Bitterness Revival). We tested which memory functions are shared and which are uniquely linked to nostalgia. The commonality among nostalgia, rumination, and counterfactual thinking resides in their shared positive associations with all memory functions: individuals who evinced a stronger propensity towards past-oriented thought (as manifested in nostalgia, rumination, and counterfactual thinking) reported greater overall recruitment of memories in the service of present functioning. The uniqueness of nostalgia resides in its comparatively strong positive associations with Intimacy Maintenance, Teach/Inform, and Self-Regard and weak association with Bitterness Revival. In all, nostalgia possesses a more positive functional signature than do rumination and counterfactual thinking.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level.

PubMed

Coman, Alin; Momennejad, Ida; Drach, Rae D; Geana, Andra

2016-07-19

The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members' memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals.
Information, knowledge and the future of machines.

PubMed

MacFarlane, Alistair G J

2003-08-15

This wide-ranging survey considers the future of machines in terms of information, complexity and the growth of knowledge shared amongst agents. Mechanical and human agents are compared and contrasted, and it is argued that, for the foreseeable future, their roles will be complementary. The future development of machines is examined in terms of unions of human and machine agency evolving as part of economic activity. Limits to, and threats posed by, the continuing evolution of such a society of agency are considered.
Using memories to understand others: the role of episodic memory in theory of mind impairment in Alzheimer disease.

PubMed

Moreau, Noémie; Viallet, François; Champagne-Lavau, Maud

2013-09-01

Theory of mind (TOM) refers to the ability to infer one's own and other's mental states. Growing evidence highlighted the presence of impairment on the most complex TOM tasks in Alzheimer disease (AD). However, how TOM deficit is related to other cognitive dysfunctions and more specifically to episodic memory impairment - the prominent feature of this disease - is still under debate. Recent neuroanatomical findings have shown that remembering past events and inferring others' states of mind share the same cerebral network suggesting the two abilities share a common process .This paper proposes to review emergent evidence of TOM impairment in AD patients and to discuss the evidence of a relationship between TOM and episodic memory. We will discuss about AD patients' deficit in TOM being possibly related to their difficulties in recollecting memories of past social interactions. Copyright © 2013 Elsevier B.V. All rights reserved.
Mental time travel and the shaping of the human mind

PubMed Central

Suddendorf, Thomas; Addis, Donna Rose; Corballis, Michael C.

2009-01-01

Episodic memory, enabling conscious recollection of past episodes, can be distinguished from semantic memory, which stores enduring facts about the world. Episodic memory shares a core neural network with the simulation of future episodes, enabling mental time travel into both the past and the future. The notion that there might be something distinctly human about mental time travel has provoked ingenious attempts to demonstrate episodic memory or future simulation in non-human animals, but we argue that they have not yet established a capacity comparable to the human faculty. The evolution of the capacity to simulate possible future events, based on episodic memory, enhanced fitness by enabling action in preparation of different possible scenarios that increased present or future survival and reproduction chances. Human language may have evolved in the first instance for the sharing of past and planned future events, and, indeed, fictional ones, further enhancing fitness in social settings. PMID:19528013
Ownership of Machine-Readable Bibliographic Data. Canadian Network Papers Number 5 = Propriete des Donnees Bibliographique Lisibles par Machine. Documents sur les Resaux Canadiens Numero 5.

ERIC Educational Resources Information Center

Duchesne, R. M.; And Others

Because of data ownership questions raised by the interchange and sharing of machine readable bibliographic data, this paper was prepared for the Bibliographic and Communications Network Committee of the National Library Advisory Board. Background information and definitions are followed by a review of the legal aspects relating to property and…

Implementing Machine Learning in the PCWG Tool

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clifton, Andrew; Ding, Yu; Stuart, Peter

The Power Curve Working Group (www.pcwg.org) is an ad-hoc industry-led group to investigate the performance of wind turbines in real-world conditions. As part of ongoing experience-sharing exercises, machine learning has been proposed as a possible way to predict turbine performance. This presentation provides some background information about machine learning and how it might be implemented in the PCWG exercises.
A manual for PARTI runtime primitives

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel

1990-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
Shared prefetching to reduce execution skew in multi-threaded systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E; Gunnels, John A

Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated basedmore » on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.« less
A shared neural ensemble links distinct contextual memories encoded close in time

NASA Astrophysics Data System (ADS)

Cai, Denise J.; Aharoni, Daniel; Shuman, Tristan; Shobe, Justin; Biane, Jeremy; Song, Weilin; Wei, Brandon; Veshkini, Michael; La-Vu, Mimi; Lou, Jerry; Flores, Sergio E.; Kim, Isaac; Sano, Yoshitake; Zhou, Miou; Baumgaertel, Karsten; Lavi, Ayal; Kamata, Masakazu; Tuszynski, Mark; Mayford, Mark; Golshani, Peyman; Silva, Alcino J.

2016-06-01

Recent studies suggest that a shared neural ensemble may link distinct memories encoded close in time. According to the memory allocation hypothesis, learning triggers a temporary increase in neuronal excitability that biases the representation of a subsequent memory to the neuronal ensemble encoding the first memory, such that recall of one memory increases the likelihood of recalling the other memory. Here we show in mice that the overlap between the hippocampal CA1 ensembles activated by two distinct contexts acquired within a day is higher than when they are separated by a week. Several findings indicate that this overlap of neuronal ensembles links two contextual memories. First, fear paired with one context is transferred to a neutral context when the two contexts are acquired within a day but not across a week. Second, the first memory strengthens the second memory within a day but not across a week. Older mice, known to have lower CA1 excitability, do not show the overlap between ensembles, the transfer of fear between contexts, or the strengthening of the second memory. Finally, in aged mice, increasing cellular excitability and activating a common ensemble of CA1 neurons during two distinct context exposures rescued the deficit in linking memories. Taken together, these findings demonstrate that contextual memories encoded close in time are linked by directing storage into overlapping ensembles. Alteration of these processes by ageing could affect the temporal structure of memories, thus impairing efficient recall of related information.
An Investigation into the Economics of Retrospective Conversion Using a CD-ROM System.

ERIC Educational Resources Information Center

Co, Francisca K.

This study compares the cost effectiveness of using a CD-ROM (compact disk read-only memory) system known as Bibliofile and the currently used OCLC (Online Computer Library Center)-based method to convert a university library's shelflist into a machine-readable database in the MARC (Machine-Readable Cataloging) format. The cost of each method of…
Machine vision for real time orbital operations

NASA Technical Reports Server (NTRS)

Vinz, Frank L.

1988-01-01

Machine vision for automation and robotic operation of Space Station era systems has the potential for increasing the efficiency of orbital servicing, repair, assembly and docking tasks. A machine vision research project is described in which a TV camera is used for inputing visual data to a computer so that image processing may be achieved for real time control of these orbital operations. A technique has resulted from this research which reduces computer memory requirements and greatly increases typical computational speed such that it has the potential for development into a real time orbital machine vision system. This technique is called AI BOSS (Analysis of Images by Box Scan and Syntax).
Factor structure of overall autobiographical memory usage: the directive, self and social functions revisited.

PubMed

Rasmussen, Anne S; Habermas, Tilmann

2011-08-01

According to theory, autobiographical memory serves three broad functions of overall usage: directive, self, and social. However, there is evidence to suggest that the tripartite model may be better conceptualised in terms of a four-factor model with two social functions. In the present study we examined the two models in Danish and German samples, using the Thinking About Life Experiences Questionnaire (TALE; Bluck, Alea, Habermas, & Rubin, 2005), which measures the overall usage of the three functions generalised across concrete memories. Confirmatory factor analysis supported the four-factor model and rejected the theoretical three-factor model in both samples. The results are discussed in relation to cultural differences in overall autobiographical memory usage as well as sharing versus non-sharing aspects of social remembering.
Hardware enabled performance counters with support for operating system context switching

DOEpatents

Salapura, Valentina; Wisniewski, Robert W.

2015-06-30

A device for supporting hardware enabled performance counters with support for context switching include a plurality of performance counters operable to collect information associated with one or more computer system related activities, a first register operable to store a memory address, a second register operable to store a mode indication, and a state machine operable to read the second register and cause the plurality of performance counters to copy the information to memory area indicated by the memory address based on the mode indication.
Automated quantitative muscle biopsy analysis system

NASA Technical Reports Server (NTRS)

Castleman, Kenneth R. (Inventor)

1980-01-01

An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P.sub.n each receiving data from a shared memory M.sub.n-1 and outputing processed data to a separate shared memory M.sub.n+1 under control of a program stored in dedicated memory M.sub.n.
Experimental evaluation of multiprocessor cache-based error recovery

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. K.

1991-01-01

Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol are evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A; Faraj, Daniel A

2013-06-04

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Performing a local reduction operation on a parallel computer

DOEpatents

Blocksome, Michael A.; Faraj, Daniel A.

2012-12-11

A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.

PubMed

Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

2012-01-01

A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
Memory loss

MedlinePlus

... this page: //medlineplus.gov/ency/article/003257.htm Memory loss To use the sharing features on this ... Bethesda, MD 20894 U.S. Department of Health and Human Services National Institutes of Health Page last updated: ...
New laser machining processes for shape memory alloys

NASA Astrophysics Data System (ADS)

Haferkamp, Heinz; Paschko, Stefan; Goede, Martin

2001-04-01

Due to special material properties, shape memory alloys (SMA) are finding increasing attention in micro system technology. However, only a few processes are available for the machining of miniaturized SMA-components. In this connection, laser material processing offers completely new possibilities. This paper describes the actual status of two projects that are being carried out to qualify new methods to machine SMA components by means of laser radiation. Within one project, the laser material ablation process of miniaturized SMA- components using ultra-short laser pulses (pulse duration: approx. 200 fs) in comparison to conventional laser material ablation is being investigated. Especially for SMA micro- sensors and actuators, it is important to minimize the heat affected zone (HAZ) to maintain the special mechanical properties. Light-microscopic investigations of the grain texture of SMA devices processed with ultra-short laser pulses show that the HAZ can be neglected. Presently, the main goal of the project is to qualify this new processing technique for the micro-structuring of complex SMA micro devices with high precision. Within a second project, investigations are being carried out to realize the induction of the two-way memory effect (TWME) into SMA components using laser radiation. By precisely heating SMA components with laser radiation, local tensions remain near the component surface. In connection with the shape memory effect, these tensions can be used to make the components execute complicated movements. Compared to conventional training methods to induce the TWME, this procedure is faster and easier. Furthermore, higher numbers of thermal cycling are expected because of the low dislocation density in the main part of the component.
Limits in decision making arise from limits in memory retrieval.

PubMed

Giguère, Gyslain; Love, Bradley C

2013-05-07

Some decisions, such as predicting the winner of a baseball game, are challenging in part because outcomes are probabilistic. When making such decisions, one view is that humans stochastically and selectively retrieve a small set of relevant memories that provides evidence for competing options. We show that optimal performance at test is impossible when retrieving information in this fashion, no matter how extensive training is, because limited retrieval introduces noise into the decision process that cannot be overcome. One implication is that people should be more accurate in predicting future events when trained on idealized rather than on the actual distributions of items. In other words, we predict the best way to convey information to people is to present it in a distorted, idealized form. Idealization of training distributions is predicted to reduce the harmful noise induced by immutable bottlenecks in people's memory retrieval processes. In contrast, machine learning systems that selectively weight (i.e., retrieve) all training examples at test should not benefit from idealization. These conjectures are strongly supported by several studies and supporting analyses. Unlike machine systems, people's test performance on a target distribution is higher when they are trained on an idealized version of the distribution rather than on the actual target distribution. Optimal machine classifiers modified to selectively and stochastically sample from memory match the pattern of human performance. These results suggest firm limits on human rationality and have broad implications for how to train humans tasked with important classification decisions, such as radiologists, baggage screeners, intelligence analysts, and gamblers.
Limits in decision making arise from limits in memory retrieval

PubMed Central

Giguère, Gyslain; Love, Bradley C.

2013-01-01

Some decisions, such as predicting the winner of a baseball game, are challenging in part because outcomes are probabilistic. When making such decisions, one view is that humans stochastically and selectively retrieve a small set of relevant memories that provides evidence for competing options. We show that optimal performance at test is impossible when retrieving information in this fashion, no matter how extensive training is, because limited retrieval introduces noise into the decision process that cannot be overcome. One implication is that people should be more accurate in predicting future events when trained on idealized rather than on the actual distributions of items. In other words, we predict the best way to convey information to people is to present it in a distorted, idealized form. Idealization of training distributions is predicted to reduce the harmful noise induced by immutable bottlenecks in people’s memory retrieval processes. In contrast, machine learning systems that selectively weight (i.e., retrieve) all training examples at test should not benefit from idealization. These conjectures are strongly supported by several studies and supporting analyses. Unlike machine systems, people’s test performance on a target distribution is higher when they are trained on an idealized version of the distribution rather than on the actual target distribution. Optimal machine classifiers modified to selectively and stochastically sample from memory match the pattern of human performance. These results suggest firm limits on human rationality and have broad implications for how to train humans tasked with important classification decisions, such as radiologists, baggage screeners, intelligence analysts, and gamblers. PMID:23610402
Methods For Self-Organizing Software

DOEpatents

Bouchard, Ann M.; Osbourn, Gordon C.

2005-10-18

A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered functions calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

We Remember, We Forget: Collaborative Remembering in Older Couples

ERIC Educational Resources Information Center

Harris, Celia B.; Keil, Paul G.; Sutton, John; Barnier, Amanda J.; McIlwain, Doris J. F.

2011-01-01

Transactive memory theory describes the processes by which benefits for memory can occur when remembering is shared in dyads or groups. In contrast, cognitive psychology experiments demonstrate that social influences on memory disrupt and inhibit individual recall. However, most research in cognitive psychology has focused on groups of strangers…
76 FR 12821 - 150th Anniversary of the Inauguration of Abraham Lincoln

Federal Register 2010, 2011, 2012, 2013, 2014

2011-03-09

... together by shared memories and common hopes. As we observe the 150th anniversary of his Inauguration, we... his memory enabled America to move beyond a young collection of States to become a free and unified... memory and uphold the principles he so nobly advanced. [[Page 12822
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports

DTIC Science & Technology

1991-06-01

Report RC 12936 (#58037). IBM T. J. Wartson Reiearch Center. July 1987. � Alan Jay Smith. Cache memories. Coniputing Sitrry., 1.1(3): I.3-5:30...basic-shared is an instrument for ashared memory design. The components panels are processor- qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
Blanket Gate Would Address Blocks Of Memory

NASA Technical Reports Server (NTRS)

Lambe, John; Moopenn, Alexander; Thakoor, Anilkumar P.

1988-01-01

Circuit-chip area used more efficiently. Proposed gate structure selectively allows and restricts access to blocks of memory in electronic neural-type network. By breaking memory into independent blocks, gate greatly simplifies problem of reading from and writing to memory. Since blocks not used simultaneously, share operational amplifiers that prompt and read information stored in memory cells. Fewer operational amplifiers needed, and chip area occupied reduced correspondingly. Cost per bit drops as result.
The potential of multi-port optical memories in digital computing

NASA Technical Reports Server (NTRS)

Alford, C. O.; Gaylord, T. K.

1975-01-01

A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
A Prolog Emulator

NASA Technical Reports Server (NTRS)

Tick, Evan

1987-01-01

This note describes an efficient software emulator for the Warren Abstract Machine (WAM) Prolog architecture. The version of the WAM implemented is called Lcode. The Lcode emulator, written in C, executes the 'naive reverse' benchmark at 3900 LIPS. The emulator is one of a set of tools used to measure the memory-referencing characteristics and performance of Prolog programs. These tools include a compiler, assembler, and memory simulators. An overview of the Lcode architecture is given here, followed by a description and listing of the emulator code implementing each Lcode instruction. This note will be of special interest to those studying the WAM and its performance characteristics. In general, this note will be of interest to those creating efficient software emulators for abstract machine architectures.
Block-Parallel Data Analysis with DIY2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
A performance comparison of the IBM RS/6000 and the Astronautics ZS-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, W.M.; Abraham, S.G.; Davidson, E.S.

1991-01-01

Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data inmore » the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.« less
High Order Accuracy Methods for Supersonic Reactive Flows

DTIC Science & Technology

2008-06-25

k = 0, · · · , N and N is the polynomial order used. The positive constant M is chosen such that σ(N) becomes machine zero. Typically M ∼ 32. Table...function used in this study is the Exponential filter given by σ(η) = exp(−αηp), (38) where α = − ln() and is the machine zero. The spectral...respectively. All numerical experiments were run 49 on a 667 MHz Compaq Alpha machine with 1GB memory and with an Alpha internal floating point processor. 9.1
Research on intelligent machine self-perception method based on LSTM

NASA Astrophysics Data System (ADS)

Wang, Qiang; Cheng, Tao

2018-05-01

In this paper, we use the advantages of LSTM in feature extraction and processing high-dimensional and complex nonlinear data, and apply it to the autonomous perception of intelligent machines. Compared with the traditional multi-layer neural network, this model has memory, can handle time series information of any length. Since the multi-physical domain signals of processing machines have a certain timing relationship, and there is a contextual relationship between states and states, using this deep learning method to realize the self-perception of intelligent processing machines has strong versatility and adaptability. The experiment results show that the method proposed in this paper can obviously improve the sensing accuracy under various working conditions of the intelligent machine, and also shows that the algorithm can well support the intelligent processing machine to realize self-perception.
A manual for PARTI runtime primitives, revision 1

NASA Technical Reports Server (NTRS)

Das, Raja; Saltz, Joel; Berryman, Harry

1991-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Methodology for fast detection of false sharing in threaded scientific codes

DOEpatents

Chung, I-Hsin; Cong, Guojing; Murata, Hiroki; Negishi, Yasushi; Wen, Hui-Fang

2014-11-25

A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
HEC Applications on Columbia Project

NASA Technical Reports Server (NTRS)

Taft, Jim

2004-01-01

NASA's Columbia system consists of a cluster of twenty 512 processor SGI Altix systems. Each of these systems is 3 TFLOP/s in peak performance - approximately the same as the entire compute capability at NAS just one year ago. Each 512p system is a single system image machine with one Linunx O5, one high performance file system, and one globally shared memory. The NAS Terascale Applications Group (TAG) is chartered to assist in scaling NASA's mission critical codes to at least 512p in order to significantly improve emergency response during flight operations, as well as provide significant improvements in the codes. and rate of scientific discovery across the scientifc disciplines within NASA's Missions. Recent accomplishments are 4x improvements to codes in the ocean modeling community, 10x performance improvements in a number of computational fluid dynamics codes used in aero-vehicle design, and 5x improvements in a number of space science codes dealing in extreme physics. The TAG group will continue its scaling work to 2048p and beyond (10240 cpus) as the Columbia system becomes fully operational and the upgrades to the SGI NUMAlink memory fabric are in place. The NUMlink uprades dramatically improve system scalability for a single application. These upgrades will allow a number of codes to execute faster at higher fidelity than ever before on any other system, thus increasing the rate of scientific discovery even further
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

NASA Astrophysics Data System (ADS)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
Surface Characteristics of Machined NiTi Shape Memory Alloy: The Effects of Cryogenic Cooling and Preheating Conditions

NASA Astrophysics Data System (ADS)

Kaynak, Y.; Huang, B.; Karaca, H. E.; Jawahir, I. S.

2017-07-01

This experimental study focuses on the phase state and phase transformation response of the surface and subsurface of machined NiTi alloys. X-ray diffraction (XRD) analysis and differential scanning calorimeter techniques were utilized to measure the phase state and the transformation response of machined specimens, respectively. Specimens were machined under dry machining at ambient temperature, preheated conditions, and cryogenic cooling conditions at various cutting speeds. The findings from this research demonstrate that cryogenic machining substantially alters austenite finish temperature of martensitic NiTi alloy. Austenite finish ( A f) temperature shows more than 25 percent increase resulting from cryogenic machining compared with austenite finish temperature of as-received NiTi. Dry and preheated conditions do not substantially alter austenite finish temperature. XRD analysis shows that distinctive transformation from martensite to austenite occurs during machining process in all three conditions. Complete transformation from martensite to austenite is observed in dry cutting at all selected cutting speeds.
Hierarchical Traces for Reduced NSM Memory Requirements

NASA Astrophysics Data System (ADS)

Dahl, Torbjørn S.

This paper presents work on using hierarchical long term memory to reduce the memory requirements of nearest sequence memory (NSM) learning, a previously published, instance-based reinforcement learning algorithm. A hierarchical memory representation reduces the memory requirements by allowing traces to share common sub-sequences. We present moderated mechanisms for estimating discounted future rewards and for dealing with hidden state using hierarchical memory. We also present an experimental analysis of how the sub-sequence length affects the memory compression achieved and show that the reduced memory requirements do not effect the speed of learning. Finally, we analyse and discuss the persistence of the sub-sequences independent of specific trace instances.
Celebrating Successful Students

ERIC Educational Resources Information Center

Squires, Dan; Case, Pauline

2008-01-01

The Machine Tool Program at Cowley College in Arkansas City, Kansas, is preparing students to become future leaders in the machining field, and the school recognizes the importance of sharing and celebrating those stories of success with the public to demonstrate the effectiveness of career and technical education (CTE) programs. Cowley College is…
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?

ERIC Educational Resources Information Center

Chuderski, Adam; Necka, Edward

2012-01-01

Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
Feature-Based Memory-Driven Attentional Capture: Visual Working Memory Content Affects Visual Attention

ERIC Educational Resources Information Center

Olivers, Christian N. L.; Meijer, Frank; Theeuwes, Jan

2006-01-01

In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by…
Time-Related Decay or Interference-Based Forgetting in Working Memory?

ERIC Educational Resources Information Center

Portrat, Sophie; Barrouillet, Pierre; Camos, Valerie

2008-01-01

The time-based resource-sharing model of working memory assumes that memory traces suffer from a time-related decay when attention is occupied by concurrent activities. Using complex continuous span tasks in which temporal parameters are carefully controlled, P. Barrouillet, S. Bernardin, S. Portrat, E. Vergauwe, & V. Camos (2007) recently…

Developmental Change in Working Memory Strategies: From Passive Maintenance to Active Refreshing

ERIC Educational Resources Information Center

Camos, Valerie; Barrouillet, Pierre

2011-01-01

Change in strategies is often mentioned as a source of memory development. However, though performance in working memory tasks steadily improves during childhood, theories differ in linking this development to strategy changes. Whereas some theories, such as the time-based resource-sharing model, invoke the age-related increase in use and…
Cache write generate for parallel image processing on shared memory architectures.

PubMed

Wittenbrink, C M; Somani, A K; Chen, C H

1996-01-01

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Mnemonic convergence in social networks: The emergent properties of cognition at a collective level

PubMed Central

Coman, Alin; Momennejad, Ida; Drach, Rae D.; Geana, Andra

2016-01-01

The development of shared memories, beliefs, and norms is a fundamental characteristic of human communities. These emergent outcomes are thought to occur owing to a dynamic system of information sharing and memory updating, which fundamentally depends on communication. Here we report results on the formation of collective memories in laboratory-created communities. We manipulated conversational network structure in a series of real-time, computer-mediated interactions in fourteen 10-member communities. The results show that mnemonic convergence, measured as the degree of overlap among community members’ memories, is influenced by both individual-level information-processing phenomena and by the conversational social network structure created during conversational recall. By studying laboratory-created social networks, we show how large-scale social phenomena (i.e., collective memory) can emerge out of microlevel local dynamics (i.e., mnemonic reinforcement and suppression effects). The social-interactionist approach proposed herein points to optimal strategies for spreading information in social networks and provides a framework for measuring and forging collective memories in communities of individuals. PMID:27357678
Hardware support for collecting performance counters directly to memory

DOEpatents

Gara, Alan; Salapura, Valentina; Wisniewski, Robert W.

2012-09-25

Hardware support for collecting performance counters directly to memory, in one aspect, may include a plurality of performance counters operable to collect one or more counts of one or more selected activities. A first storage element may be operable to store an address of a memory location. A second storage element may be operable to store a value indicating whether the hardware should begin copying. A state machine may be operable to detect the value in the second storage element and trigger hardware copying of data in selected one or more of the plurality of performance counters to the memory location whose address is stored in the first storage element.
Importance of balanced architectures in the design of high-performance imaging systems

NASA Astrophysics Data System (ADS)

Sgro, Joseph A.; Stanton, Paul C.

1999-03-01

Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
The costs of changing an intended action: movement planning, but not execution, interferes with verbal working memory.

PubMed

Spiegel, M A; Koester, D; Weigelt, M; Schack, T

2012-02-16

How much cognitive effort does it take to change a movement plan? In previous studies, it has been shown that humans plan and represent actions in advance, but it remains unclear whether or not action planning and verbal working memory share cognitive resources. Using a novel experimental paradigm, we combined in two experiments a grasp-to-place task with a verbal working memory task. Participants planned a placing movement toward one of two target positions and subsequently encoded and maintained visually presented letters. Both experiments revealed that re-planning the intended action reduced letter recall performance; execution time, however, was not influenced by action modifications. The results of Experiment 2 suggest that the action's interference with verbal working memory arose during the planning rather than the execution phase of the movement. Together, our results strongly suggest that movement planning and verbal working memory share common cognitive resources. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Synapsin Determines Memory Strength after Punishment- and Relief-Learning

PubMed Central

Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo

2015-01-01

Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: “negative” memories for stimuli preceding them and “positive” memories for stimuli experienced at the moment of “relief.” Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training (“forward conditioning” of the odor), whereas after shock-odor training (“backward conditioning” of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. PMID:25972175
Synapsin determines memory strength after punishment- and relief-learning.

PubMed

Niewalda, Thomas; Michels, Birgit; Jungnickel, Roswitha; Diegelmann, Sören; Kleber, Jörg; Kähne, Thilo; Gerber, Bertram

2015-05-13

Adverse life events can induce two kinds of memory with opposite valence, dependent on timing: "negative" memories for stimuli preceding them and "positive" memories for stimuli experienced at the moment of "relief." Such punishment memory and relief memory are found in insects, rats, and man. For example, fruit flies (Drosophila melanogaster) avoid an odor after odor-shock training ("forward conditioning" of the odor), whereas after shock-odor training ("backward conditioning" of the odor) they approach it. Do these timing-dependent associative processes share molecular determinants? We focus on the role of Synapsin, a conserved presynaptic phosphoprotein regulating the balance between the reserve pool and the readily releasable pool of synaptic vesicles. We find that a lack of Synapsin leaves task-relevant sensory and motor faculties unaffected. In contrast, both punishment memory and relief memory scores are reduced. These defects reflect a true lessening of associative memory strength, as distortions in nonassociative processing (e.g., susceptibility to handling, adaptation, habituation, sensitization), discrimination ability, and changes in the time course of coincidence detection can be ruled out as alternative explanations. Reductions in punishment- and relief-memory strength are also observed upon an RNAi-mediated knock-down of Synapsin, and are rescued both by acutely restoring Synapsin and by locally restoring it in the mushroom bodies of mutant flies. Thus, both punishment memory and relief memory require the Synapsin protein and in this sense share genetic and molecular determinants. We note that corresponding molecular commonalities between punishment memory and relief memory in humans would constrain pharmacological attempts to selectively interfere with excessive associative punishment memories, e.g., after traumatic experiences. Copyright © 2015 Niewalda et al.
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale

NASA Astrophysics Data System (ADS)

Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.

2016-12-01

Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
DDGIPS: a general image processing system in robot vision

NASA Astrophysics Data System (ADS)

Tian, Yuan; Ying, Jun; Ye, Xiuqing; Gu, Weikang

2000-10-01

Real-Time Image Processing is the key work in robot vision. With the limitation of the hardware technique, many algorithm-oriented firmware systems were designed in the past. But their architectures were not flexible enough to achieve a multi-algorithm development system. Because of the rapid development of microelectronics technique, many high performance DSP chips and high density FPGA chips have come to life, and this makes it possible to construct a more flexible architecture in real-time image processing system. In this paper, a Double DSP General Image Processing System (DDGIPS) is concerned. We try to construct a two-DSP-based FPGA-computational system with two TMS320C6201s. The TMS320C6x devices are fixed-point processors based on the advanced VLIW CPU, which has eight functional units, including two multipliers and six arithmetic logic units. These features make C6x a good candidate for a general purpose system. In our system, the two TMS320C6201s each has a local memory space, and they also have a shared system memory space which enables them to intercommunicate and exchange data efficiently. At the same time, they can be directly inter-connected in star-shaped architecture. All of these are under the control of a FPGA group. As the core of the system, FPGA plays a very important role: it takes charge of DPS control, DSP communication, memory space access arbitration and the communication between the system and the host machine. And taking advantage of reconfiguring FPGA, all of the interconnection between the two DSP or between DSP and FPGA can be changed. In this way, users can easily rebuild the real-time image processing system according to the data stream and the task of the application and gain great flexibility.
DDGIPS: a general image processing system in robot vision

NASA Astrophysics Data System (ADS)

Tian, Yuan; Ying, Jun; Ye, Xiuqing; Gu, Weikang

2000-10-01

Real-Time Image Processing is the key work in robot vision. With the limitation of the hardware technique, many algorithm-oriented firmware systems were designed in the past. But their architectures were not flexible enough to achieve a multi- algorithm development system. Because of the rapid development of microelectronics technique, many high performance DSP chips and high density FPGA chips have come to life, and this makes it possible to construct a more flexible architecture in real-time image processing system. In this paper, a Double DSP General Image Processing System (DDGIPS) is concerned. We try to construct a two-DSP-based FPGA-computational system with two TMS320C6201s. The TMS320C6x devices are fixed-point processors based on the advanced VLIW CPU, which has eight functional units, including two multipliers and six arithmetic logic units. These features make C6x a good candidate for a general purpose system. In our system, the two TMS320C6210s each has a local memory space, and they also have a shared system memory space which enable them to intercommunicate and exchange data efficiently. At the same time, they can be directly interconnected in star- shaped architecture. All of these are under the control of FPGA group. As the core of the system, FPGA plays a very important role: it takes charge of DPS control, DSP communication, memory space access arbitration and the communication between the system and the host machine. And taking advantage of reconfiguring FPGA, all of the interconnection between the two DSP or between DSP and FPGA can be changed. In this way, users can easily rebuild the real-time image processing system according to the data stream and the task of the application and gain great flexibility.
Audience tuning effects in the context of situated and embodied processes.

PubMed

Semin, Gün R

2018-03-05

This review provides an overview of the research on communication and the 'Saying is Believing' paradigm in the context of different perspectives on communication. The process of 'audience tuning' is shaped by a variety of situated factors in contexts that affect the communicators' confidence in their message. The overwhelming common denominator is that the combination of features that create ambiguity yields the optimal condition for the formation of shared realities. I conclude with an argument that the implied invariance of memory processes in shared reality work needs to be more attentive to the regulatory function of memories driving the expression of shared realities. Copyright © 2018 Elsevier Ltd. All rights reserved.
A 300MHz Embedded Flash Memory with Pipeline Architecture and Offset-Free Sense Amplifiers for Dual-Core Automotive Microcontrollers

NASA Astrophysics Data System (ADS)

Kajiyama, Shinya; Fujito, Masamichi; Kasai, Hideo; Mizuno, Makoto; Yamaguchi, Takanori; Shinagawa, Yutaka

A novel 300MHz embedded flash memory for dual-core microcontrollers with a shared ROM architecture is proposed. One of its features is a three-stage pipeline read operation, which enables reduced access pitch and therefore reduces performance penalty due to conflict of shared ROM accesses. Another feature is a highly sensitive sense amplifier that achieves efficient pipeline operation with two-cycle latency one-cycle pitch as a result of a shortened sense time of 0.63ns. The combination of the pipeline architecture and proposed sense amplifiers significantly reduces access-conflict penalties with shared ROM and enhances performance of 32-bit RISC dual-core microcontrollers by 30%.
Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

1996-01-01

The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
A high performance parallel algorithm for 1-D FFT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, R.C.; Gustavson, F.G.; Zubair, M.

1994-12-31

In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implementedmore » this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.« less
Implementation of a parallel unstructured Euler solver on the CM-5

NASA Technical Reports Server (NTRS)

Morano, Eric; Mavriplis, D. J.

1995-01-01

An efficient unstructured 3D Euler solver is parallelized on a Thinking Machine Corporation Connection Machine 5, distributed memory computer with vectoring capability. In this paper, the single instruction multiple data (SIMD) strategy is employed through the use of the CM Fortran language and the CMSSL scientific library. The performance of the CMSSL mesh partitioner is evaluated and the overall efficiency of the parallel flow solver is discussed.
Efficient Checkpointing of Virtual Machines using Virtual Machine Introspection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aderholdt, Ferrol; Han, Fang; Scott, Stephen L

Cloud Computing environments rely heavily on system-level virtualization. This is due to the inherent benefits of virtualization including fault tolerance through checkpoint/restart (C/R) mechanisms. Because clouds are the abstraction of large data centers and large data centers have a higher potential for failure, it is imperative that a C/R mechanism for such an environment provide minimal latency as well as a small checkpoint file size. Recently, there has been much research into C/R with respect to virtual machines (VM) providing excellent solutions to reduce either checkpoint latency or checkpoint file size. However, these approaches do not provide both. This papermore » presents a method of checkpointing VMs by utilizing virtual machine introspection (VMI). Through the usage of VMI, we are able to determine which pages of memory within the guest are used or free and are better able to reduce the amount of pages written to disk during a checkpoint. We have validated this work by using various benchmarks to measure the latency along with the checkpoint size. With respect to checkpoint file size, our approach results in file sizes within 24% or less of the actual used memory within the guest. Additionally, the checkpoint latency of our approach is up to 52% faster than KVM s default method.« less
A general model for memory interference in a multiprocessor system with memory hierarchy

NASA Technical Reports Server (NTRS)

Taha, Badie A.; Standley, Hilda M.

1989-01-01

The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
Domain-general involvement of the posterior frontolateral cortex in time-based resource-sharing in working memory: An fMRI study.

PubMed

Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel

2015-07-15

Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating evidence that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing an increased involvement as cognitive load increases and do so in a domain general manner. When correlating the fMRI signal with the approximated cognitive load as defined by the TBRS model, it was shown that the main focus of the cognitive load-related activation is located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory and allowed us to generate the novel hypothesis by which the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.
Information and processes underlying semantic and episodic memory across tasks, items, and individuals.

PubMed

Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H

2018-04-01

The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals-reflecting the degree to which tasks rely on shared cognitive processes-and within items-reflecting the degree to which tasks rely on the same information conveyed by the item. Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

Humans and machines in space: The vision, the challenge, the payoff; Proceedings of the 29th Goddard Memorial Symposium, Washington, Mar. 14, 15, 1991

NASA Astrophysics Data System (ADS)

Johnson, Bradley; May, Gayle L.; Korn, Paula

The present conference discusses the currently envisioned goals of human-machine systems in spacecraft environments, prospects for human exploration of the solar system, and plausible methods for meeting human needs in space. Also discussed are the problems of human-machine interaction in long-duration space flights, remote medical systems for space exploration, the use of virtual reality for planetary exploration, the alliance between U.S. Antarctic and space programs, and the economic and educational impacts of the U.S. space program.
What Drives Memory-Driven Attentional Capture? The Effects of Memory Type, Display Type, and Search Type

ERIC Educational Resources Information Center

Olivers, Christian N. L.

2009-01-01

An important question is whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. Some past research has indicated that they do: Singleton distractors interfered more strongly with a visual search task when they…
Discrete Resource Allocation in Visual Working Memory

ERIC Educational Resources Information Center

Barton, Brian; Ester, Edward F.; Awh, Edward

2009-01-01

Are resources in visual working memory allocated in a continuous or a discrete fashion? On one hand, flexible resource models suggest that capacity is determined by a central resource pool that can be flexibly divided such that items of greater complexity receive a larger share of resources. On the other hand, if capacity in working memory is…
Natural Conversations as a Source of False Memories in Children: Implications for the Testimony of Young Witnesses

PubMed Central

Principe, Gabrielle F.; Schindewolf, Erica

2012-01-01

Research on factors that can affect the accuracy of children’s autobiographical remembering has important implications for understanding the abilities of young witnesses to provide legal testimony. In this article, we review our own recent research on one factor that has much potential to induce errors in children’s event recall, namely natural memory sharing conversations with peers and parents. Our studies provide compelling evidence that not only can the content of conversations about the past intrude into later memory but that such exchanges can prompt the generation of entirely false narratives that are more detailed than true accounts of experienced events. Further, our work show that deeper and more creative participation in memory sharing dialogues can boost the damaging effects of conversationally conveyed misinformation. Implications of this collection of findings for children’s testimony are discussed. PMID:23129880
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A; Mamidala, Amith R

2014-02-11

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Chen, Yousu; Wu, Di

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbance, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computational intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Messagemore » Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.« less
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning

PubMed Central

Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila

2012-01-01

A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
Lifelong personal health data and application software via virtual machines in the cloud.

PubMed

Van Gorp, Pieter; Comuzzi, Marco

2014-01-01

Personal Health Records (PHRs) should remain the lifelong property of patients, who should be able to show them conveniently and securely to selected caregivers and institutions. In this paper, we present MyPHRMachines, a cloud-based PHR system taking a radically new architectural solution to health record portability. In MyPHRMachines, health-related data and the application software to view and/or analyze it are separately deployed in the PHR system. After uploading their medical data to MyPHRMachines, patients can access them again from remote virtual machines that contain the right software to visualize and analyze them without any need for conversion. Patients can share their remote virtual machine session with selected caregivers, who will need only a Web browser to access the pre-loaded fragments of their lifelong PHR. We discuss a prototype of MyPHRMachines applied to two use cases, i.e., radiology image sharing and personalized medicine.
Industry 4.0 - How will the nonwoven production of tomorrow look like?

NASA Astrophysics Data System (ADS)

Cloppenburg, F.; Münkel, A.; Gloy, Y.; Gries, T.

2017-10-01

Industry 4.0 stands for the on-going fourth industrial revolution, which uses cyber physical systems. In the textile industry the terms of industry 4.0 are not sufficiently known yet. First developments of industry 4.0 are mainly visible in the weaving industry. The cost structure of the nonwoven industry is unique in the textile industry. High shares of personnel, energy and machine costs are distinctive for nonwoven producers. Therefore the industry 4.0 developments in the nonwoven industry should concentrate on reducing these shares by using the work force efficiently and by increasing the productivity of first-rate quality and therefore decreasing waste production and downtimes. Using the McKinsey digital compass three main working fields are necessary: Self-optimizing nonwoven machines, big data analytics and assistance systems. Concepts for the nonwoven industry are shown, like the “EasyNonwoven” concept, which aims on economically optimizing the machine settings using self-optimization routines.
Information processing via physical soft body

PubMed Central

Nakajima, Kohei; Hauser, Helmut; Li, Tao; Pfeifer, Rolf

2015-01-01

Soft machines have recently gained prominence due to their inherent softness and the resulting safety and resilience in applications. However, these machines also have disadvantages, as they respond with complex body dynamics when stimulated. These dynamics exhibit a variety of properties, including nonlinearity, memory, and potentially infinitely many degrees of freedom, which are often difficult to control. Here, we demonstrate that these seemingly undesirable properties can in fact be assets that can be exploited for real-time computation. Using body dynamics generated from a soft silicone arm, we show that they can be employed to emulate desired nonlinear dynamical systems. First, by using benchmark tasks, we demonstrate that the nonlinearity and memory within the body dynamics can increase the computational performance. Second, we characterize our system’s computational capability by comparing its task performance with a standard machine learning technique and identify its range of validity and limitation. Our results suggest that soft bodies are not only impressive in their deformability and flexibility but can also be potentially used as computational resources on top and for free. PMID:26014748
Concurrent Image Processing Executive (CIPE)

NASA Technical Reports Server (NTRS)

Lee, Meemong; Cooper, Gregory T.; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.

1988-01-01

The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules; (1) user interface, (2) host-resident executive, (3) hypercube-resident executive, and (4) application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube a data management method which distributes, redistributes, and tracks data set information was implemented.
An integrated runtime and compile-time approach for parallelizing structured and block structured applications

NASA Technical Reports Server (NTRS)

Agrawal, Gagan; Sussman, Alan; Saltz, Joel

1993-01-01

Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPK-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Parallel processing for scientific computations

NASA Technical Reports Server (NTRS)

Alkhatib, Hasan S.

1995-01-01

The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Machine tool task force

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sutton, G.P.

1980-10-22

The Machine Tool Task Force (MTTF) is a multidisciplined team of international experts, whose mission was to investigate the state of the art of machine tool technology, to identify promising future directions of that technology for both the US government and private industry, and to disseminate the findings of its research in a conference and through the publication of a final report. MTTF was a two and one-half year effort that involved the participation of 122 experts in the specialized technologies of machine tools and in the management of machine tool operations. The scope of the MTTF was limited tomore » cutting-type or material-removal-type machine tools, because they represent the major share and value of all machine tools now installed or being built. The activities of the MTTF and the technical, commercial and economic signifiance of recommended activities for improving machine tool technology are discussed. (LCL)« less
Virtual collaborative environments: programming and controlling robotic devices remotely

NASA Astrophysics Data System (ADS)

Davies, Brady R.; McDonald, Michael J., Jr.; Harrigan, Raymond W.

1995-12-01

This paper describes a technology for remote sharing of intelligent electro-mechanical devices. An architecture and actual system have been developed and tested, based on the proposed National Information Infrastructure (NII) or Information Highway, to facilitate programming and control of intelligent programmable machines (like robots, machine tools, etc.). Using appropriate geometric models, integrated sensors, video systems, and computing hardware; computer controlled resources owned and operated by different (in a geographic sense as well as legal sense) entities can be individually or simultaneously programmed and controlled from one or more remote locations. Remote programming and control of intelligent machines will create significant opportunities for sharing of expensive capital equipment. Using the technology described in this paper, university researchers, manufacturing entities, automation consultants, design entities, and others can directly access robotic and machining facilities located across the country. Disparate electro-mechanical resources will be shared in a manner similar to the way supercomputers are accessed by multiple users. Using this technology, it will be possible for researchers developing new robot control algorithms to validate models and algorithms right from their university labs without ever owning a robot. Manufacturers will be able to model, simulate, and measure the performance of prospective robots before selecting robot hardware optimally suited for their intended application. Designers will be able to access CNC machining centers across the country to fabricate prototypic parts during product design validation. An existing prototype architecture and system has been developed and proven. Programming and control of a large gantry robot located at Sandia National Laboratories in Albuquerque, New Mexico, was demonstrated from such remote locations as Washington D.C., Washington State, and Southern California.

Meeting the memory challenges of brain-scale network simulation.

PubMed

Kunkel, Susanne; Potjans, Tobias C; Eppler, Jochen M; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus

2011-01-01

The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10(5) neurons with up to 10(9) synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and less expensive. Applying the model to our freely available Neural Simulation Tool (NEST), we identify the software components dominant at different scales, and develop general strategies for reducing the memory consumption, in particular by using data structures that exploit the sparseness of the local representation of the network. We show that these adaptations enable our simulation software to scale up to the order of 10,000 processors and beyond. As memory consumption issues are likely to be relevant for any software dealing with complex connectome data on such architectures, our approach and our findings should be useful for researchers developing novel neuroinformatics solutions to the challenges posed by the connectome project.
Operation of a quantum dot in the finite-state machine mode: Single-electron dynamic memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Klymenko, M. V.; Klein, M.; Levine, R. D.

2016-07-14

A single electron dynamic memory is designed based on the non-equilibrium dynamics of charge states in electrostatically defined metallic quantum dots. Using the orthodox theory for computing the transfer rates and a master equation, we model the dynamical response of devices consisting of a charge sensor coupled to either a single and or a double quantum dot subjected to a pulsed gate voltage. We show that transition rates between charge states in metallic quantum dots are characterized by an asymmetry that can be controlled by the gate voltage. This effect is more pronounced when the switching between charge states correspondsmore » to a Markovian process involving electron transport through a chain of several quantum dots. By simulating the dynamics of electron transport we demonstrate that the quantum box operates as a finite-state machine that can be addressed by choosing suitable shapes and switching rates of the gate pulses. We further show that writing times in the ns range and retention memory times six orders of magnitude longer, in the ms range, can be achieved on the double quantum dot system using experimentally feasible parameters, thereby demonstrating that the device can operate as a dynamic single electron memory.« less
Department of Defense In-House RDT and E Activities: Management Analysis Report for Fiscal Year 1993

DTIC Science & Technology

1994-11-01

A worldwide unique lab because it houses a high - speed modeling and simulation system, a prototype...E Division, San Diego, CA: High Performance Computing Laboratory providing a wide range of advanced computer systems for the scientific investigation...Machines CM-200 and a 256-node Thinking Machines CM-S. The CM-5 is in a very large memory, ( high performance 32 Gbytes, >4 0 OFlop) coafiguration,
Humans and machines in space: The vision, the challenge, the payoff; AAS Goddard Memorial Symposium, 29th, Washington, DC, March 14-15, 1991

NASA Astrophysics Data System (ADS)

Johnson, Bradley; May, Gayle L.; Korn, Paula

A recent symposium produced papers in the areas of solar system exploration, man machine interfaces, cybernetics, virtual reality, telerobotics, life support systems and the scientific and technology spinoff from the NASA space program. A number of papers also addressed the social and economic impacts of the space program. For individual titles, see A95-87468 through A95-87479.
Effects of Aging on True and False Memory Formation: An fMRI Study

ERIC Educational Resources Information Center

Dennis, Nancy A.; Kim, Hongkeun; Cabeza, Roberto

2007-01-01

Compared to young, older adults are more likely to forget events that occurred in the past as well as remember events that never happened. Previous studies examining false memories and aging have shown that these memories are more likely to occur when new items share perceptual or semantic similarities with those presented during encoding. It is…
Ad Hoc Categories and False Memories: Memory Illusions for Categories Created On-The-Spot

ERIC Educational Resources Information Center

Soro, Jerônimo C.; Ferreira, Mário B.; Semin, Gün R.; Mata, André; Carneiro, Paula

2017-01-01

Three experiments were designed to test whether experimentally created ad hoc associative networks evoke false memories. We used the DRM (Deese, Roediger, McDermott) paradigm with lists of ad hoc categories composed of exemplars aggregated toward specific goals (e.g., going for a picnic) that do not share any consistent set of features. Experiment…
Transactive memory in organizational groups: the effects of content, consensus, specialization, and accuracy on group performance.

PubMed

Austin, John R

2003-10-01

Previous research on transactive memory has found a positive relationship between transactive memory system development and group performance in single project laboratory and ad hoc groups. Closely related research on shared mental models and expertise recognition supports these findings. In this study, the author examined the relationship between transactive memory systems and performance in mature, continuing groups. A group's transactive memory system, measured as a combination of knowledge stock, knowledge specialization, transactive memory consensus, and transactive memory accuracy, is positively related to group goal performance, external group evaluations, and internal group evaluations. The positive relationship with group performance was found to hold for both task and external relationship transactive memory systems.
Virtual memory support for distributed computing environments using a shared data object model

NASA Astrophysics Data System (ADS)

Huang, F.; Bacon, J.; Mapp, G.

1995-12-01

Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
Social Transmission of False Memory in Small Groups and Large Networks.

PubMed

Maswood, Raeya; Rajaram, Suparna

2018-05-21

Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.
MULTI: a shared memory approach to cooperative molecular modeling.

PubMed

Darden, T; Johnson, P; Smith, H

1991-03-01

A general purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.
SMT-Aware Instantaneous Footprint Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roy, Probir; Liu, Xu; Song, Shuaiwen

Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
A Massively Parallel Code for Polarization Calculations

NASA Astrophysics Data System (ADS)

Akiyama, Shizuka; Höflich, Peter

2001-03-01

We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)

2002-01-01

In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
Cache-based error recovery for shared memory multiprocessor systems

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1989-01-01

A multiprocessor cache-based checkpointing and recovery scheme for of recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
Assessing Advanced Technology in CENATE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tallent, Nathan R.; Barker, Kevin J.; Gioiosa, Roberto

PNNL's Center for Advanced Technology Evaluation (CENATE) is a new U.S. Department of Energy center whose mission is to assess and facilitate access to emerging computing technology. CENATE is assessing a range of advanced technologies, from evolutionary to disruptive. Technologies of interest include the processor socket (homogeneous and accelerated systems), memories (dynamic, static, memory cubes), motherboards, networks (network interface cards and switches), and input/output and storage devices. CENATE is developing a multi-perspective evaluation process based on integrating advanced system instrumentation, performance measurements, and modeling and simulation. We show evaluations of two emerging network technologies: silicon photonics interconnects and the Datamore » Vortex network. CENATE's evaluation also addresses the question of which machine is best for a given workload under certain constraints. We show a performance-power tradeoff analysis of a well-known machine learning application on two systems.« less
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

NASA Technical Reports Server (NTRS)

Probst, David K.

1993-01-01

A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.

In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphic Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different level of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread levelmore » parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and only relies on the high memory bandwidth of the GPU. The first and second solution only support partial pivoting, the third one easily supports partial and full pivoting, making it attractive to problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).« less
A neuropsychological comparison of obsessive-compulsive disorder and trichotillomania.

PubMed

Chamberlain, Samuel R; Fineberg, Naomi A; Blackwell, Andrew D; Clark, Luke; Robbins, Trevor W; Sahakian, Barbara J

2007-03-02

Obsessive-compulsive disorder (OCD) and trichotillomania (compulsive hair-pulling) share overlapping co-morbidity, familial transmission, and phenomenology. However, the extent to which these disorders share a common cognitive phenotype has yet to be elucidated using patients without confounding co-morbidities. To compare neurocognitive functioning in co-morbidity-free patients with OCD and trichotillomania, focusing on domains of learning and memory, executive function, affective processing, reflection-impulsivity and decision-making. Twenty patients with OCD, 20 patients with trichotillomania, and 20 matched controls undertook neuropsychological assessment after meeting stringent inclusion criteria. Groups were matched for age, education, verbal IQ, and gender. The OCD and trichotillomania groups were impaired on spatial working memory. Only OCD patients showed additional impairments on executive planning and visual pattern recognition memory, and missed more responses to sad target words than other groups on an affective go/no-go task. Furthermore, OCD patients failed to modulate their behaviour between conditions on the reflection-impulsivity test, suggestive of cognitive inflexibility. Both clinical groups showed intact decision-making and probabilistic reversal learning. OCD and trichotillomania shared overlapping spatial working memory problems, but neuropsychological dysfunction in OCD spanned additional domains that were intact in trichotillomania. Findings are discussed in relation to likely fronto-striatal neural substrates and future research directions.
Restoration of fMRI Decodability Does Not Imply Latent Working Memory States

PubMed Central

Schneegans, Sebastian; Bays, Paul M.

2018-01-01

Recent imaging studies have challenged the prevailing view that working memory is mediated by sustained neural activity. Using machine learning methods to reconstruct memory content, these studies found that previously diminished representations can be restored by retrospective cueing or other forms of stimulation. These findings have been interpreted as evidence for an activity-silent working memory state that can be reactivated dependent on task demands. Here, we test the validity of this conclusion by formulating a neural process model of working memory based on sustained activity and using this model to emulate a spatial recall task with retrocueing. The simulation reproduces both behavioral and fMRI results previously taken as evidence for latent states, in particular the restoration of spatial reconstruction quality following an informative cue. Our results demonstrate that recovery of the decodability of an imaging signal does not provide compelling evidence for an activity-silent working memory state. PMID:28820674
A Multi-scale Cognitive Approach to Intrusion Detection and Response

DTIC Science & Technology

2015-12-28

the behavior of the traffic on the network, either by using mathematical formulas or by replaying packet streams. As a result, simulators depend...large scale. Summary of the most important results We obtained a powerful machine, which has 768 cores and 1.25 TB memory . RBG has been...time. Each client is configured with 1GB memory , 10 GB disk space, and one 100M Ethernet interface. The server nodes include web servers

This Is Your Brain: A Decision-Making Machine

DTIC Science & Technology

2015-11-01

brain has vast comput-ing power that performs a plethora of vital tasks. It regu-lates your bodily functions, movements and emotions . It processes and...system beneath the cerebrum and associated with long-term memory and emotions . In our “The brain is a wonderful organ. It starts working when you get...presence of perceived danger. Long-term memories and experiences also are stored here, often along with their emotional connections to pain or
Emerging memories

NASA Astrophysics Data System (ADS)

Baldi, Livio; Bez, Roberto; Sandhu, Gurtej

2014-12-01

Memory is a key component of any data processing system. Following the classical Turing machine approach, memories hold both the data to be processed and the rules for processing them. In the history of microelectronics, the distinction has been rather between working memory, which is exemplified by DRAM, and storage memory, exemplified by NAND. These two types of memory devices now represent 90% of all memory market and 25% of the total semiconductor market, and have been the technology drivers in the last decades. Even if radically different in characteristics, they are however based on the same storage mechanism: charge storage, and this mechanism seems to be near to reaching its physical limits. The search for new alternative memory approaches, based on more scalable mechanisms, has therefore gained new momentum. The status of incumbent memory technologies and their scaling limitations will be discussed. Emerging memory technologies will be analyzed, starting from the ones that are already present for niche applications, and which are getting new attention, thanks to recent technology breakthroughs. Maturity level, physical limitations and potential for scaling will be compared to existing memories. At the end the possible future composition of memory systems will be discussed.
A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PubMed Central

Ho, ThienLuan; Oh, Seung-Rohk

2017-01-01

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPUs warp share data using warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experiment results for real DNA packages revealed that the performance of the proposed algorithm and its implementation archived up to 122.64 and 1.53 times compared to that of sequential algorithm on CPU and previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
Relative time sharing: new findings and an extension of the resource allocation model of temporal processing.

PubMed

Buhusi, Catalin V; Meck, Warren H

2009-07-12

Individuals time as if using a stopwatch that can be stopped or reset on command. Here, we review behavioural and neurobiological data supporting the time-sharing hypothesis that perceived time depends on the attentional and memory resources allocated to the timing process. Neuroimaging studies in humans suggest that timekeeping tasks engage brain circuits typically involved in attention and working memory. Behavioural, pharmacological, lesion and electrophysiological studies in lower animals support this time-sharing hypothesis. When subjects attend to a second task, or when intruder events are presented, estimated durations are shorter, presumably due to resources being taken away from timing. Here, we extend the time-sharing hypothesis by proposing that resource reallocation is proportional to the perceived contrast, both in temporal and non-temporal features, between intruders and the timed events. New findings support this extension by showing that the effect of an intruder event is dependent on the relative duration of the intruder to the intertrial interval. The conclusion is that the brain circuits engaged by timekeeping comprise not only those primarily involved in time accumulation, but also those involved in the maintenance of attentional and memory resources for timing, and in the monitoring and reallocation of those resources among tasks.
Feasibility of Virtual Machine and Cloud Computing Technologies for High Performance Computing

DTIC Science & Technology

2014-05-01

Hat Enterprise Linux SaaS software as a service VM virtual machine vNUMA virtual non-uniform memory access WRF weather research and forecasting...previously mentioned in Chapter I Section B1 of this paper, which is used to run the weather research and forecasting ( WRF ) model in their experiments...against a VMware virtualization solution of WRF . The experiment consisted of running WRF in a standard configuration between the D-VTM and VMware while
Performance study of a data flow architecture

NASA Technical Reports Server (NTRS)

Adams, George

1985-01-01

Teams of scientists studied data flow concepts, static data flow machine architecture, and the VAL language. Each team mapped its application onto the machine and coded it in VAL. The principal findings of the study were: (1) Five of the seven applications used the full power of the target machine. The galactic simulation and multigrid fluid flow teams found that a significantly smaller version of the machine (16 processing elements) would suffice. (2) A number of machine design parameters including processing element (PE) function unit numbers, array memory size and bandwidth, and routing network capability were found to be crucial for optimal machine performance. (3) The study participants readily acquired VAL programming skills. (4) Participants learned that application-based performance evaluation is a sound method of evaluating new computer architectures, even those that are not fully specified. During the course of the study, participants developed models for using computers to solve numerical problems and for evaluating new architectures. These models form the bases for future evaluation studies.
A Cognitive Machine Learning Algorithm for Cardiac Imaging: A Pilot Study for Differentiating Constrictive Pericarditis from Restrictive Cardiomyopathy

PubMed Central

Sengupta, Partho P.; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T

2016-01-01

Background Associating a patient’s profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography (STE) data sets derived from patients with known constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Methods and Results Clinical and echocardiographic data of 50 patients with CP and 44 with RCM were used for developing an associative memory classifier (AMC) based machine learning algorithm. The STE data was normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve (AUC) of the AMC was evaluated for differentiating CP from RCM. Using only STE variables, AMC achieved a diagnostic AUC of 89·2%, which improved to 96·2% with addition of 4 echocardiographic variables. In comparison, the AUC of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63·7%, respectively. Furthermore, AMC demonstrated greater accuracy and shorter learning curves than other machine learning approaches with accuracy asymptotically approaching 90% after a training fraction of 0·3 and remaining flat at higher training fractions. Conclusions This study demonstrates feasibility of a cognitive machine learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. PMID:27266599
Network Information Management Subsystem

NASA Technical Reports Server (NTRS)

Chatburn, C. C.

1985-01-01

The Deep Space Network is implementing a distributed data base management system in which the data are shared among several applications and the host machines are not totally dedicated to a particular application. Since the data and resources are to be shared, the equipment must be operated carefully so that the resources are shared equitably. The current status of the project is discussed and policies, roles, and guidelines are recommended for the organizations involved in the project.
Design and implementation of laser target simulator in hardware-in-the-loop simulation system based on LabWindows/CVI and RTX

NASA Astrophysics Data System (ADS)

Tong, Qiujie; Wang, Qianqian; Li, Xiaoyang; Shan, Bin; Cui, Xuntai; Li, Chenyu; Peng, Zhong

2016-11-01

In order to satisfy the requirements of the real-time and generality, a laser target simulator in semi-physical simulation system based on RTX+LabWindows/CVI platform is proposed in this paper. Compared with the upper-lower computers simulation platform architecture used in the most of the real-time system now, this system has better maintainability and portability. This system runs on the Windows platform, using Windows RTX real-time extension subsystem to ensure the real-time performance of the system combining with the reflective memory network to complete some real-time tasks such as calculating the simulation model, transmitting the simulation data, and keeping real-time communication. The real-time tasks of simulation system run under the RTSS process. At the same time, we use the LabWindows/CVI to compile a graphical interface, and complete some non-real-time tasks in the process of simulation such as man-machine interaction, display and storage of the simulation data, which run under the Win32 process. Through the design of RTX shared memory and task scheduling algorithm, the data interaction between the real-time tasks process of RTSS and non-real-time tasks process of Win32 is completed. The experimental results show that this system has the strongly real-time performance, highly stability, and highly simulation accuracy. At the same time, it also has the good performance of human-computer interaction.
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nagasaka, Y; Matsuoka, S; Azad, A

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware specific optimizations for multi- and many-core processors are lacking and a detailed analysis of their performance under various use cases and matrices is not available. We firstly identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. Wemore » examine their performance together with other publicly available codes. Different from the literature, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search or triangle counting. Our hash-table and heap-based algorithms are showing significant speedups from libraries in the majority of the cases while different algorithms dominate the other scenarios with different matrix size, sparsity, compression factor and operation type. We wrap up in-depth evaluation results and make a recipe to give the best SpGEMM algorithm for target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.« less
Experiences and results multitasking a hydrodynamics code on global and local memory machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mandell, D.

1987-01-01

A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel and Alliant computers and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines is discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained.more » The problems of debugging on the different machines are also described.« less
A performance study of sparse Cholesky factorization on INTEL iPSC/860

NASA Technical Reports Server (NTRS)

Zubair, M.; Ghose, M.

1992-01-01

The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
Syringe vending machines for injection drug users: an experiment in Marseille, France.

PubMed Central

Obadia, Y; Feroni, I; Perrin, V; Vlahov, D; Moatti, J P

1999-01-01

OBJECTIVES: This study evaluated the usefulness of vending machines in providing injection drug users with access to sterile syringes in Marseille, France. METHODS: Self-administered questionnaires were offered to 485 injection drug users obtaining syringes from 32 pharmacies, 4 needle exchange programs, and 3 vending machines. RESULTS: Of the 343 respondents (response rate = 70.7%), 21.3% used the vending machines as their primary source of syringes. Primary users of vending machines were more likely than primary users of other sources to be younger than 30 years, to report no history of drug maintenance treatment, and to report no sharing of needles or injection paraphernalia. CONCLUSIONS: Vending machines may be an appropriate strategy for providing access to syringes for younger injection drug users, who have typically avoided needle exchange programs and pharmacies. PMID:10589315
An experimental distributed microprocessor implementation with a shared memory communications and control medium

NASA Technical Reports Server (NTRS)

Mejzak, R. S.

1980-01-01

The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DET) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis which are independent, 8 bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

NASA Astrophysics Data System (ADS)

Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

2015-12-01

AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Resource Sharing in Montana: A Study of Interlibrary Loan and Alternatives for a Montana Union Catalog.

ERIC Educational Resources Information Center

Matthews, Joseph R.

This study recommends a variety of actions to create and maintain a Montana union catalog (MONCAT) for more effective usage of in-state resources and library funds. Specifically, it advocates (1) merger of existing COM, machine readable bibliographic records, and OCLC tapes into a single microform catalog; (2) acceptance of only machine readable…
The Robes of Academe

ERIC Educational Resources Information Center

Westerfield, Nancy G.

2007-01-01

In this article, the author shares her experiences as a wife of a low-paid faculty member and the role of her Necchi sewing machine in her married life. She relates how she and her husband had been dirt-poor for the first early years of their marriage. She discusses how the Necchi sewing machine she got as a gift from her mother helped her to earn…
Optimized Infrastructure for the Earth System Prediction Capability

DTIC Science & Technology

2013-09-30

for referencing memory between its native coupling datatype (MCT Attribute Vectors) and ESMF Arrays. This will reduce the copies required and will...introduced ability within CESM to share memory between ESMF and MCT datatypes makes using both tools together much easier. Using both is appealing
A Parallel Vector Machine for the PM Programming Language

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2016-04-01

PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
Infectious Cognition: Risk Perception Affects Socially Shared Retrieval-Induced Forgetting of Medical Information.

PubMed

Coman, Alin; Berry, Jessica N

2015-12-01

When speakers selectively retrieve previously learned information, listeners often concurrently, and covertly, retrieve their memories of that information. This concurrent retrieval typically enhances memory for mentioned information (the rehearsal effect) and impairs memory for unmentioned but related information (socially shared retrieval-induced forgetting, SSRIF), relative to memory for unmentioned and unrelated information. Building on research showing that anxiety leads to increased attention to threat-relevant information, we explored whether concurrent retrieval is facilitated in high-anxiety real-world contexts. Participants first learned category-exemplar facts about meningococcal disease. Following a manipulation of perceived risk of infection (low vs. high risk), they listened to a mock radio show in which some of the facts were selectively practiced. Final recall tests showed that the rehearsal effect was equivalent between the two risk conditions, but SSRIF was significantly larger in the high-risk than in the low-risk condition. Thus, the tendency to exaggerate consequences of news events was found to have deleterious consequences. © The Author(s) 2015.

Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segmentmore » of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.« less
Communication Studies of DMP and SMP Machines

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
New computing systems and their impact on structural analysis and design

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1989-01-01

A review is given of the recent advances in computer technology that are likely to impact structural analysis and design. The computational needs for future structures technology are described. The characteristics of new and projected computing systems are summarized. Advances in programming environments, numerical algorithms, and computational strategies for new computing systems are reviewed, and a novel partitioning strategy is outlined for maximizing the degree of parallelism. The strategy is designed for computers with a shared memory and a small number of powerful processors (or a small number of clusters of medium-range processors). It is based on approximating the response of the structure by a combination of symmetric and antisymmetric response vectors, each obtained using a fraction of the degrees of freedom of the original finite element model. The strategy was implemented on the CRAY X-MP/4 and the Alliant FX/8 computers. For nonlinear dynamic problems on the CRAY X-MP with four CPUs, it resulted in an order of magnitude reduction in total analysis time, compared with the direct analysis on a single-CPU CRAY X-MP machine.
Gyrokinetic micro-turbulence simulations on the NERSC 16-way SMP IBM SP computer: experiences and performance results

NASA Astrophysics Data System (ADS)

Ethier, Stephane; Lin, Zhihong

2001-10-01

Earlier this year, the National Energy Research Scientific Computing center (NERSC) took delivery of the second most powerful computer in the world. With its 2,528 processors running at a peak performance of 1.5 GFlops, this IBM SP machine has a theoretical performance of almost 3.8 TFlops. To efficiently harness such computing power in one single code is not an easy task and requires a good knowledge of the computer's architecture. Here we present the steps that we followed to improve our gyrokinetic micro-turbulence code GTC in order to take advantage of the new 16-way shared memory nodes of the NERSC IBM SP. Performance results are shown as well as details about the improved mixed-mode MPI-OpenMP model that we use. The enhancements to the code allowed us to tackle much bigger problem sizes, getting closer to our goal of simulating an ITER-size tokamak with both kinetic ions and electrons.(This work is supported by DOE Contract No. DE-AC02-76CH03073 (PPPL), and in part by the DOE Fusion SciDAC Project.)
Suppressing my memories by listening to yours: The effect of socially triggered context-based prediction error on memory.

PubMed

Vlasceanu, Madalina; Drach, Rae; Coman, Alin

2018-05-03

The mind is a prediction machine. In most situations, it has expectations as to what might happen. But when predictions are invalidated by experience (i.e., prediction errors), the memories that generate these predictions are suppressed. Here, we explore the effect of prediction error on listeners' memories following social interaction. We find that listening to a speaker recounting experiences similar to one's own triggers prediction errors on the part of the listener that lead to the suppression of her memories. This effect, we show, is sensitive to a perspective-taking manipulation, such that individuals who are instructed to take the perspective of the speaker experience memory suppression, whereas individuals who undergo a low-perspective-taking manipulation fail to show a mnemonic suppression effect. We discuss the relevance of these findings for our understanding of the bidirectional influences between cognition and social contexts, as well as for the real-world situations that involve memory-based predictions.
The Work's Not Over- Roll Up Your Sleeves and Make a Difference!

NASA Astrophysics Data System (ADS)

Sarquis, Mickey

1997-01-01

As my 17-year tenure as the first editor of the Secondary School Chemistry Section draws to a close, John Moore has invited me to share some reflections on my experiences. It's hard for me to believe that this many years have passed; in some ways, it seems like only yesterday that I took on this position. Looking back over my term as Section editor recalls wonderful memories, but it also stimulates me to seek out and take on new challenges as I move into a new phase of involvement in chemical education. In response to John's kind invitation, I'd like to share some of these memories and ideas with you who share my vision of quality chemical education, particularly at the secondary level.
Arousal-biased competition in perception and memory

PubMed Central

Mather, Mara; Sutherland, Matthew R.

2010-01-01

Our everyday surroundings besiege us with information. The battle is for a share of our limited attention and memory, with the brain selecting the winners and discarding the losers. Previous research shows that both bottom-up and top-down factors bias competition in favor of high priority stimuli. We propose that arousal during an event increases this bias both in perception and in long-term memory of the event. Arousal-biased competition theory provides specific predictions about when arousal will enhance and when it will impair memory for events, accounting for some puzzling contradictions in the emotional memory literature. PMID:21660127
Efficacy of Code Optimization on Cache-based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data-provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just ads on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
Glucocorticoids in the prefrontal cortex enhance memory consolidation and impair working memory by a common neural mechanism

PubMed Central

Barsegyan, Areg; Mackenzie, Scott M.; Kurose, Brian D.; McGaugh, James L.; Roozendaal, Benno

2010-01-01

It is well established that acute administration of adrenocortical hormones enhances the consolidation of memories of emotional experiences and, concurrently, impairs working memory. These different glucocorticoid effects on these two memory functions have generally been considered to be independently regulated processes. Here we report that a glucocorticoid receptor agonist administered into the medial prefrontal cortex (mPFC) of male Sprague-Dawley rats both enhances memory consolidation and impairs working memory. Both memory effects are mediated by activation of a membrane-bound steroid receptor and depend on noradrenergic activity within the mPFC to increase levels of cAMP-dependent protein kinase. These findings provide direct evidence that glucocorticoid effects on both memory consolidation and working memory share a common neural influence within the mPFC. PMID:20810923
Benefits and Costs of Context Reinstatement in Episodic Memory: An ERP Study.

PubMed

Bramão, Inês; Johansson, Mikael

2017-01-01

This study investigated context-dependent episodic memory retrieval. An influential idea in the memory literature is that performance benefits when the retrieval context overlaps with the original encoding context. However, such memory facilitation may not be driven by the encoding-retrieval overlap per se but by the presence of diagnostic features in the reinstated context that discriminate the target episode from competing episodes. To test this prediction, the encoding-retrieval overlap and the diagnostic value of the context were manipulated in a novel associative recognition memory task. Participants were asked to memorize word pairs presented together with diagnostic (unique) and nondiagnostic (shared) background scenes. At test, participants recognized the word pairs in the presence and absence of the previously encoded contexts. Behavioral data show facilitated memory performance in the presence of the original context but, importantly, only when the context was diagnostic of the target episode. The electrophysiological data reveal an early anterior ERP encoding-retrieval overlap effect that tracks the cost associated with having nondiagnostic contexts present at retrieval, that is, shared by multiple previous episodes, and a later posterior encoding-retrieval overlap effect that reflects facilitated access to the target episode during retrieval in diagnostic contexts. Taken together, our results underscore the importance of the diagnostic value of the context and suggest that context-dependent episodic memory effects are multiple determined.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry

1989-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Run-time scheduling and execution of loops on message passing machines

NASA Technical Reports Server (NTRS)

Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry

1990-01-01

Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Computation of emotions in man and machines.

PubMed

Robinson, Peter; el Kaliouby, Rana

2009-12-12

The importance of emotional expression as part of human communication has been understood since Aristotle, and the subject has been explored scientifically since Charles Darwin and others in the nineteenth century. Advances in computer technology now allow machines to recognize and express emotions, paving the way for improved human-computer and human-human communications. Recent advances in psychology have greatly improved our understanding of the role of affect in communication, perception, decision-making, attention and memory. At the same time, advances in technology mean that it is becoming possible for machines to sense, analyse and express emotions. We can now consider how these advances relate to each other and how they can be brought together to influence future research in perception, attention, learning, memory, communication, decision-making and other applications. The computation of emotions includes both recognition and synthesis, using channels such as facial expressions, non-verbal aspects of speech, posture, gestures, physiology, brain imaging and general behaviour. The combination of new results in psychology with new techniques of computation is leading to new technologies with applications in commerce, education, entertainment, security, therapy and everyday life. However, there are important issues of privacy and personal expression that must also be considered.
Computation of emotions in man and machines

PubMed Central

Robinson, Peter; el Kaliouby, Rana

2009-01-01

The importance of emotional expression as part of human communication has been understood since Aristotle, and the subject has been explored scientifically since Charles Darwin and others in the nineteenth century. Advances in computer technology now allow machines to recognize and express emotions, paving the way for improved human–computer and human–human communications. Recent advances in psychology have greatly improved our understanding of the role of affect in communication, perception, decision-making, attention and memory. At the same time, advances in technology mean that it is becoming possible for machines to sense, analyse and express emotions. We can now consider how these advances relate to each other and how they can be brought together to influence future research in perception, attention, learning, memory, communication, decision-making and other applications. The computation of emotions includes both recognition and synthesis, using channels such as facial expressions, non-verbal aspects of speech, posture, gestures, physiology, brain imaging and general behaviour. The combination of new results in psychology with new techniques of computation is leading to new technologies with applications in commerce, education, entertainment, security, therapy and everyday life. However, there are important issues of privacy and personal expression that must also be considered. PMID:19884138
Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arumugam, Kamesh

Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires - exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-ow and irregular memory accesses. Furthermore,more » these applications are often iterative with dependency between steps, and thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications have been focused around optimizing the irregular, data-dependent memory accesses and control-ow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-ow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.« less
VOTable JAVA Streaming Writer and Applications.

NASA Astrophysics Data System (ADS)

Kulkarni, P.; Kembhavi, A.; Kale, S.

2004-07-01

Virtual Observatory related tools use a new standard for data transfer called the VOTable format. This is a variant of the xml format that enables easy transfer of data over the web. We describe a streaming interface that can bridge the VOTable format, through a user friendly graphical interface, with the FITS and ASCII formats, which are commonly used by astronomers. A streaming interface is important for efficient use of memory because of the large size of catalogues. The tools are developed in JAVA to provide a platform independent interface. We have also developed a stand-alone version that can be used to convert data stored in ASCII or FITS format on a local machine. The Streaming writer is successfully being used in VOPlot (See Kale et al 2004 for a description of VOPlot).We present the test results of converting huge FITS and ASCII data into the VOTable format on machines that have only limited memory.
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

PubMed Central

Zheng, Da; Burns, Randal; Szalay, Alexander S.

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.

PubMed

Zheng, Da; Burns, Randal; Szalay, Alexander S

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.
Contributions of Medial Temporal Lobe and Striatal Memory Systems to Learning and Retrieving Overlapping Spatial Memories

PubMed Central

Brown, Thackery I.; Stern, Chantal E.

2014-01-01

Many life experiences share information with other memories. In order to make decisions based on overlapping memories, we need to distinguish between experiences to determine the appropriate behavior for the current situation. Previous work suggests that the medial temporal lobe (MTL) and medial caudate interact to support the retrieval of overlapping navigational memories in different contexts. The present study used functional magnetic resonance imaging (fMRI) in humans to test the prediction that the MTL and medial caudate play complementary roles in learning novel mazes that cross paths with, and must be distinguished from, previously learned routes. During fMRI scanning, participants navigated virtual routes that were well learned from prior training while also learning new mazes. Critically, some routes learned during scanning shared hallways with those learned during pre-scan training. Overlap between mazes required participants to use contextual cues to select between alternative behaviors. Results demonstrated parahippocampal cortex activity specific for novel spatial cues that distinguish between overlapping routes. The hippocampus and medial caudate were active for learning overlapping spatial memories, and increased their activity for previously learned routes when they became context dependent. Our findings provide novel evidence that the MTL and medial caudate play complementary roles in the learning, updating, and execution of context-dependent navigational behaviors. PMID:23448868
Symbiosis of executive and selective attention in working memory

PubMed Central

Vandierendonck, André

2014-01-01

The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved. PMID:25152723

Symbiosis of executive and selective attention in working memory.

PubMed

Vandierendonck, André

2014-01-01

The notion of working memory (WM) was introduced to account for the usage of short-term memory resources by other cognitive tasks such as reasoning, mental arithmetic, language comprehension, and many others. This collaboration between memory and other cognitive tasks can only be achieved by a dedicated WM system that controls task coordination. To that end, WM models include executive control. Nevertheless, other attention control systems may be involved in coordination of memory and cognitive tasks calling on memory resources. The present paper briefly reviews the evidence concerning the role of selective attention in WM activities. A model is proposed in which selective attention control is directly linked to the executive control part of the WM system. The model assumes that apart from storage of declarative information, the system also includes an executive WM module that represents the current task set. Control processes are automatically triggered when particular conditions in these modules are met. As each task set represents the parameter settings and the actions needed to achieve the task goal, it will depend on the specific settings and actions whether selective attention control will have to be shared among the active tasks. Only when such sharing is required, task performance will be affected by the capacity limits of the control system involved.
Recursive computer architecture for VLSI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Treleaven, P.C.; Hopkins, R.P.

1982-01-01

A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems are termed fifth generation computers by the Japanese. 30 references.
The prevalence and quality of silent, socially silent, and disclosed autobiographical memories across adulthood.

PubMed

Alea, Nicole

2010-02-01

Two separate studies examined the prevalence and quality of silent (infrequently recalled), socially silent (i.e., recalled but not shared), and disclosed autobiographical memories. In Study 1 young and older men and women remembered positive events. Positive memories were more likely to be disclosed than to be kept socially silent or completely silent. However, socially silent and disclosed memories did not differ in memory quality: the memories were equally vivid, significant, and emotional. Silent memories were less qualitatively rich. This pattern of results was generally replicated in Study 2 with a lifespan sample for both positive and negative memories, and with additional qualitative variables. The exception was that negative memories were kept silent more often. Age differences were minimal. Women disclosed their autobiographical memories more, but men told a greater variety of people. Results are discussed in terms of the functions that memory telling and silences might serve for individuals.
Unstructured Adaptive Meshes: Bad for Your Memory?

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob

2003-01-01

This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
Spatial light modulators and applications III; Proceedings of the Meeting, San Diego, CA, Aug. 7, 8, 1989

NASA Astrophysics Data System (ADS)

Efron, Uzi

Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
BIRD: A general interface for sparse distributed memory simulators

NASA Technical Reports Server (NTRS)

Rogers, David

1990-01-01

Kanerva's sparse distributed memory (SDM) has now been implemented for at least six different computers, including SUN3 workstations, the Apple Macintosh, and the Connection Machine. A common interface for input of commands would both aid testing of programs on a broad range of computer architectures and assist users in transferring results from research environments to applications. A common interface also allows secondary programs to generate command sequences for a sparse distributed memory, which may then be executed on the appropriate hardware. The BIRD program is an attempt to create such an interface. Simplifying access to different simulators should assist developers in finding appropriate uses for SDM.
Spatial light modulators and applications III; Proceedings of the Meeting, San Diego, CA, Aug. 7, 8, 1989

NASA Technical Reports Server (NTRS)

Efron, Uzi (Editor)

1990-01-01

Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)

NASA Technical Reports Server (NTRS)

Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery

2009-01-01

Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. The Gravitational Wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude derived in a specific order from the interferometric time-series of the rotating LISA spacecrafts. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that, sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms of Gravitational Wave signal identification consists of decomposing sequential search loops, beginning with outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and partitioning the loops so as to preserve concurrency. Existing parallel programs are based upon either shared memory or distributed memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. The PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA Gravitational Wave search process using a master and 6 nodes. This approach is accomplished using a server that is available at NASA Ames Research Center, and has been dedicated to the LISA Data Challenge Competition. Historically, gravitational wave and source identification parameters have taken around 7 days in this dedicated single thread Linux based server. Using PVM approach, the parameter extraction problem can be reduced to within a day. The low frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master using message and vector of data passing. The message passing among nodes follows a pattern of synchronous and asynchronous send-and-receive protocols. The communication model and the message buffers are allocated dynamically to address rapid search of gravitational wave source information in the Mock LISA data sets.
Improving Memory Error Handling Using Linux

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carlton, Michael Andrew; Blanchard, Sean P.; Debardeleben, Nathan A.

As supercomputers continue to get faster and more powerful in the future, they will also have more nodes. If nothing is done, then the amount of memory in supercomputer clusters will soon grow large enough that memory failures will be unmanageable to deal with by manually replacing memory DIMMs. "Improving Memory Error Handling Using Linux" is a process oriented method to solve this problem by using the Linux kernel to disable (offline) faulty memory pages containing bad addresses, preventing them from being used again by a process. The process of offlining memory pages simplifies error handling and results in reducingmore » both hardware and manpower costs required to run Los Alamos National Laboratory (LANL) clusters. This process will be necessary for the future of supercomputing to allow the development of exascale computers. It will not be feasible without memory error handling to manually replace the number of DIMMs that will fail daily on a machine consisting of 32-128 petabytes of memory. Testing reveals the process of offlining memory pages works and is relatively simple to use. As more and more testing is conducted, the entire process will be automated within the high-performance computing (HPC) monitoring software, Zenoss, at LANL.« less
Enhancing an appointment diary on a pocket computer for use by people after brain injury.

PubMed

Wright, P; Rogers, N; Hall, C; Wilson, B; Evans, J; Emslie, H

2001-12-01

People with memory loss resulting from brain injury benefit from purpose-designed memory aids such as appointment diaries on pocket computers. The present study explores the effects of extending the range of memory aids and including games. For 2 months, 12 people who had sustained brain injury were loaned a pocket computer containing three purpose-designed memory aids: diary, notebook and to-do list. A month later they were given another computer with the same memory aids but a different method of text entry (physical keyboard or touch-screen keyboard). Machine order was counterbalanced across participants. Assessment was by interviews during the loan periods, rating scales, performance tests and computer log files. All participants could use the memory aids and ten people (83%) found them very useful. Correlations among the three memory aids were not significant, suggesting individual variation in how they were used. Games did not increase use of the memory aids, nor did loan of the preferred pocket computer (with physical keyboard). Significantly more diary entries were made by people who had previously used other memory aids, suggesting that a better understanding of how to use a range of memory aids could benefit some people with brain injury.
Personal semantics: at the crossroads of semantic and episodic memory.

PubMed

Renoult, Louis; Davidson, Patrick S R; Palombo, Daniela J; Moscovitch, Morris; Levine, Brian

2012-11-01

Declarative memory is usually described as consisting of two systems: semantic and episodic memory. Between these two poles, however, may lie a third entity: personal semantics (PS). PS concerns knowledge of one's past. Although typically assumed to be an aspect of semantic memory, it is essentially absent from existing models of knowledge. Furthermore, like episodic memory (EM), PS is idiosyncratically personal (i.e., not culturally-shared). We show that, depending on how it is operationalized, the neural correlates of PS can look more similar to semantic memory, more similar to EM, or dissimilar to both. We consider three different perspectives to better integrate PS into existing models of declarative memory and suggest experimental strategies for disentangling PS from semantic and episodic memory. Copyright © 2012 Elsevier Ltd. All rights reserved.
Characterization of NiTi Shape Memory Damping Elements designed for Automotive Safety Systems

NASA Astrophysics Data System (ADS)

Strittmatter, Joachim; Clipa, Victor; Gheorghita, Viorel; Gümpel, Paul

2014-07-01

Actuator elements made of NiTi shape memory material are more and more known in industry because of their unique properties. Due to the martensitic phase change, they can revert to their original shape by heating when subjected to an appropriate treatment. This thermal shape memory effect (SME) can show a significant shape change combined with a considerable force. Therefore such elements can be used to solve many technical tasks in the field of actuating elements and mechatronics and will play an increasing role in the next years, especially within the automotive technology, energy management, power, and mechanical engineering as well as medical technology. Beside this thermal SME, these materials also show a mechanical SME, characterized by a superelastic plateau with reversible elongations in the range of 8%. This behavior is based on the building of stress-induced martensite of loaded austenite material at constant temperature and facilitates a lot of applications especially in the medical field. Both SMEs are attended by energy dissipation during the martensitic phase change. This paper describes the first results obtained on different actuator and superelastic NiTi wires concerning their use as damping elements in automotive safety systems. In a first step, the damping behavior of small NiTi wires up to 0.5 mm diameter was examined at testing speeds varying between 0.1 and 50 mm/s upon an adapted tensile testing machine. In order to realize higher testing speeds, a drop impact testing machine was designed, which allows testing speeds up to 4000 mm/s. After introducing this new type of testing machine, the first results of vertical-shock tests of superelastic and electrically activated actuator wires are presented. The characterization of these high dynamic phase change parameters represents the basis for new applications for shape memory damping elements, especially in automotive safety systems.
CrossSim

DOE Office of Scientific and Technical Information (OSTI.GOV)

Plimpton, Steven J.; Agarwal, Sapan; Schiek, Richard

2016-09-02

CrossSim is a simulator for modeling neural-inspired machine learning algorithms on analog hardware, such as resistive memory crossbars. It includes noise models for reading and updating the resistances, which can be based on idealized equations or experimental data. It can also introduce noise and finite precision effects when converting values from digital to analog and vice versa. All of these effects can be turned on or off as an algorithm processes a data set and attempts to learn its salient attributes so that it can be categorized in the machine learning training/classification context. CrossSim thus allows the robustness, accuracy, andmore » energy usage of a machine learning algorithm to be tested on simulated hardware.« less
Etiological Distinction of Working Memory Components in Relation to Mathematics

PubMed Central

Lukowski, Sarah L.; Soden, Brooke; Hart, Sara A.; Thompson, Lee A.; Kovas, Yulia; Petrill, Stephen A.

2014-01-01

Working memory has been consistently associated with mathematics achievement, although the etiology of these relations remains poorly understood. The present study examined the genetic and environmental underpinnings of math story problem solving, timed calculation, and untimed calculation alongside working memory components in 12-year-old monozygotic (n = 105) and same-sex dizygotic (n = 143) twin pairs. Results indicated significant phenotypic correlation between each working memory component and all mathematics outcomes (r = 0.18 – 0.33). Additive genetic influences shared between the visuo-spatial sketchpad and mathematics achievement was significant, accounting for roughly 89% of the observed correlation. In addition, genetic covariance was found between the phonological loop and math story problem solving. In contrast, despite there being a significant observed relationship between phonological loop and timed and untimed calculation, there was no significant genetic or environmental covariance between the phonological loop and timed or untimed calculation skills. Further analyses indicated that genetic overlap between the visuo-spatial sketchpad and math story problem solving and math fluency was distinct from general genetic factors, whereas g, phonological loop, and mathematics shared generalist genes. Thus, although each working memory component was related to mathematics, the etiology of their relationships may be distinct. PMID:25477699
Test Sequence Priming in Recognition Memory

ERIC Educational Resources Information Center

Johns, Elizabeth E.; Mewhort, D. J. K.

2009-01-01

The authors examined priming within the test sequence in 3 recognition memory experiments. A probe primed its successor whenever both probes shared a feature with the same studied item ("interjacent priming"), indicating that the study item like the probe is central to the decision. Interjacent priming occurred even when the 2 probes did…
Two Maintenance Mechanisms of Verbal Information in Working Memory

ERIC Educational Resources Information Center

Camos, V.; Lagner, P.; Barrouillet, P.

2009-01-01

The present study evaluated the interplay between two mechanisms of maintenance of verbal information in working memory, namely articulatory rehearsal as described in Baddeley's model, and attentional refreshing as postulated in Barrouillet and Camos's Time-Based Resource-Sharing (TBRS) model. In four experiments using complex span paradigm, we…
Down Memory Lane: Recollections of Lamaze International's First 50 Years

PubMed Central

Zwelling, Elaine

2010-01-01

The 42-year involvement of one member of Lamaze International is chronicled through a decade-by-decade review of personal memories. The history of Lamaze International is shared through the recollections of her roles as a childbirth educator, faculty member, and member of the board of directors. PMID:21629385
Paul Ricoeur, Memory, and the Historical Gaze: Implications for Education Histories

ERIC Educational Resources Information Center

Colby, Sherri Rae

2012-01-01

In this article, the author shares the potential applications of Paul Ricoeur's philosophies of history, memory, and narrative to the interpretation of educational histories, and those histories' life spans: moving cyclically from early conception, to evidentiary construction, to published dissemination; and ultimately to death or immortality. Her…
Prospects of a mathematical theory of human behavior in complex man-machine systems tasks. [time sharing computer analogy of automobile driving

NASA Technical Reports Server (NTRS)

Johannsen, G.; Rouse, W. B.

1978-01-01

A hierarchy of human activities is derived by analyzing automobile driving in general terms. A structural description leads to a block diagram and a time-sharing computer analogy. The range of applicability of existing mathematical models is considered with respect to the hierarchy of human activities in actual complex tasks. Other mathematical tools so far not often applied to man machine systems are also discussed. The mathematical descriptions at least briefly considered here include utility, estimation, control, queueing, and fuzzy set theory as well as artificial intelligence techniques. Some thoughts are given as to how these methods might be integrated and how further work might be pursued.
Task Assignment Heuristics for Distributed CFD Applications

NASA Technical Reports Server (NTRS)

Lopez-Benitez, N.; Djomehri, M. J.; Biswas, R.; Biegel, Bryan (Technical Monitor)

2001-01-01

CFD applications require high-performance computational platforms: 1. Complex physics and domain configuration demand strongly coupled solutions; 2. Applications are CPU and memory intensive; and 3. Huge resource requirements can only be satisfied by teraflop-scale machines or distributed computing.

Microcomputer Calculated Diagnostic X-Ray Exposure Factors: Clinical Evaluation

PubMed Central

Markivee, C. R.; Edwards, F. Marc; Leonard, Patricia

1981-01-01

Calculation of correct settings for the controls of a diagnostic x-ray machine was established as feasible in a microcomputer with 4K memory. The cost effectiveness and other findings in the application of this method are discussed.
Implications of the Turing machine model of computation for processor and programming language design

NASA Astrophysics Data System (ADS)

Hunter, Geoffrey

2004-01-01

A computational process is classified according to the theoretical model that is capable of executing it; computational processes that require a non-predeterminable amount of intermediate storage for their execution are Turing-machine (TM) processes, while those whose storage are predeterminable are Finite Automation (FA) processes. Simple processes (such as traffic light controller) are executable by Finite Automation, whereas the most general kind of computation requires a Turing Machine for its execution. This implies that a TM process must have a non-predeterminable amount of memory allocated to it at intermediate instants of its execution; i.e. dynamic memory allocation. Many processes encountered in practice are TM processes. The implication for computational practice is that the hardware (CPU) architecture and its operating system must facilitate dynamic memory allocation, and that the programming language used to specify TM processes must have statements with the semantic attribute of dynamic memory allocation, for in Alan Turing"s thesis on computation (1936) the "standard description" of a process is invariant over the most general data that the process is designed to process; i.e. the program describing the process should never have to be modified to allow for differences in the data that is to be processed in different instantiations; i.e. data-invariant programming. Any non-trivial program is partitioned into sub-programs (procedures, subroutines, functions, modules, etc). Examination of the calls/returns between the subprograms reveals that they are nodes in a tree-structure; this tree-structure is independent of the programming language used to encode (define) the process. Each sub-program typically needs some memory for its own use (to store values intermediate between its received data and its computed results); this locally required memory is not needed before the subprogram commences execution, and it is not needed after its execution terminates; it may be allocated as its execution commences, and deallocated as its execution terminates, and if the amount of this local memory is not known until just before execution commencement, then it is essential that it be allocated dynamically as the first action of its execution. This dynamically allocated/deallocated storage of each subprogram"s intermediate values, conforms with the stack discipline; i.e. last allocated = first to be deallocated, an incidental benefit of which is automatic overlaying of variables. This stack-based dynamic memory allocation was a semantic implication of the nested block structure that originated in the ALGOL-60 programming language. AGLOL-60 was a TM language, because the amount of memory allocated on subprogram (block/procedure) entry (for arrays, etc) was computable at execution time. A more general requirement of a Turing machine process is for code generation at run-time; this mandates access to the source language processor (compiler/interpretor) during execution of the process. This fundamental aspect of computer science is important to the future of system design, because it has been overlooked throughout the 55 years since modern computing began in 1048. The popular computer systems of this first half-century of computing were constrained by compile-time (or even operating system boot-time) memory allocation, and were thus limited to executing FA processes. The practical effect was that the distinction between the data-invariant program and its variable data was blurred; programmers had to make trial and error executions, modifying the program"s compile-time constants (array dimensions) to iterate towards the values required at run-time by the data being processed. This era of trial and error computing still persists; it pervades the culture of current (2003) computing practice.
The DASH Project: An Overview

DTIC Science & Technology

1988-02-29

by memory copyin g will degrade system performance on shared-memory multiprocessors. Virtual memor y (VM) remapping, as opposed to memory copying...Bershad, G.D. Giuseppe Facchetti, Kevin Fall, G . Scott Graham, Ellen Nelson , P. Venkat Rangan, Bruno Sartirana, Shin-Yuan Tzou, Raj Vaswani, and Robert...Remote Execution in NEST", IEEE Trans. on Software Eng. 13, 8 (August 1987), 905-912. 3. G . T. Almes, A. P. Black, E. Lazowska and J. Noe, "The Eden
Self-folding with shape memory composites at the millimeter scale

NASA Astrophysics Data System (ADS)

Felton, S. M.; Becker, K. P.; Aukes, D. M.; Wood, R. J.

2015-08-01

Self-folding is an effective method for creating 3D shapes from flat sheets. In particular, shape memory composites—laminates containing shape memory polymers—have been used to self-fold complex structures and machines. To date, however, these composites have been limited to feature sizes larger than one centimeter. We present a new shape memory composite capable of folding millimeter-scale features. This technique can be activated by a global heat source for simultaneous folding, or by resistive heaters for sequential folding. It is capable of feature sizes ranging from 0.5 to 40 mm, and is compatible with multiple laminate compositions. We demonstrate the ability to produce complex structures and mechanisms by building two self-folding pieces: a model ship and a model bumblebee.
Memory-built-in quantum cloning in a hybrid solid-state spin register

NASA Astrophysics Data System (ADS)

Wang, W.-B.; Zu, C.; He, L.; Zhang, W.-G.; Duan, L.-M.

2015-07-01

As a way to circumvent the quantum no-cloning theorem, approximate quantum cloning protocols have received wide attention with remarkable applications. Copying of quantum states to memory qubits provides an important strategy for eavesdropping in quantum cryptography. We report an experiment that realizes cloning of quantum states from an electron spin to a nuclear spin in a hybrid solid-state spin register with near-optimal fidelity. The nuclear spin provides an ideal memory qubit at room temperature, which stores the cloned quantum states for a millisecond under ambient conditions, exceeding the lifetime of the original quantum state carried by the electron spin by orders of magnitude. The realization of a cloning machine with built-in quantum memory provides a key step for application of quantum cloning in quantum information science.
Memory-built-in quantum cloning in a hybrid solid-state spin register.

PubMed

Wang, W-B; Zu, C; He, L; Zhang, W-G; Duan, L-M

2015-07-16

As a way to circumvent the quantum no-cloning theorem, approximate quantum cloning protocols have received wide attention with remarkable applications. Copying of quantum states to memory qubits provides an important strategy for eavesdropping in quantum cryptography. We report an experiment that realizes cloning of quantum states from an electron spin to a nuclear spin in a hybrid solid-state spin register with near-optimal fidelity. The nuclear spin provides an ideal memory qubit at room temperature, which stores the cloned quantum states for a millisecond under ambient conditions, exceeding the lifetime of the original quantum state carried by the electron spin by orders of magnitude. The realization of a cloning machine with built-in quantum memory provides a key step for application of quantum cloning in quantum information science.
Machine Learning Feature Selection for Tuning Memory Page Swapping

DTIC Science & Technology

2013-09-01

environments we set up. 13 Figure 4.1 Updated Feature Vector List. Features we added to the kernel are anno - tated with “(MLVM...Feb. 1966. [2] P. J . Denning, “The working set model for program behavior,” Communications of the ACM, vol. 11, no. 5, pp. 323–333, May 1968. [3] L. A...8] R. W. Cart and J . L. Hennessy, “WSClock — A simple and effective algorithm for virtual memory management,” M.S. thesis, Dept. Computer Science
AFL-1: A programming Language for Massively Concurrent Computers.

DTIC Science & Technology

1986-11-01

Bibliography Ackley, D.H., Hinton, G.E., Sejnowski, T.J., "A Learning Algorithm for boltzmann Machines", Cognitive Science, 1985, 9, 147-169. Agre...P.E., "Routines", Memo 828, MIT AI Laboratory, Many 1985. Ballard, D.H., Hayes, P.J., "Parallel Logical Inference", Conference of the Cognitive Science...34Experiments on Semantic Memory and Language Com- 125 prehension", in L.W. Greg (Ed.), Cognition in Learning and Memory, New York, Wiley, 1972._ Collins
A three phase optimization method for precopy based VM live migration.

PubMed

Sharma, Sangeeta; Chawla, Meenu

2016-01-01

Virtual machine live migration is a method of moving virtual machine across hosts within a virtualized datacenter. It provides significant benefits for administrator to manage datacenter efficiently. It reduces service interruption by transferring the virtual machine without stopping at source. Transfer of large number of virtual machine memory pages results in long migration time as well as downtime, which also affects the overall system performance. This situation becomes unbearable when migration takes place over slower network or a long distance migration within a cloud. In this paper, precopy based virtual machine live migration method is thoroughly analyzed to trace out the issues responsible for its performance drops. In order to address these issues, this paper proposes three phase optimization (TPO) method. It works in three phases as follows: (i) reduce the transfer of memory pages in first phase, (ii) reduce the transfer of duplicate pages by classifying frequently and non-frequently updated pages, and (iii) reduce the data sent in last iteration of migration by applying the simple RLE compression technique. As a result, each phase significantly reduces total pages transferred, total migration time and downtime respectively. The proposed TPO method is evaluated using different representative workloads on a Xen virtualized environment. Experimental results show that TPO method reduces total pages transferred by 71 %, total migration time by 70 %, downtime by 3 % for higher workload, and it does not impose significant overhead as compared to traditional precopy method. Comparison of TPO method with other methods is also done for supporting and showing its effectiveness. TPO method and precopy methods are also tested at different number of iterations. The TPO method gives better performance even with less number of iterations.
CDISC SHARE, a Global, Cloud-based Resource of Machine-Readable CDISC Standards for Clinical and Translational Research

PubMed Central

Hume, Samuel; Chow, Anthony; Evans, Julie; Malfait, Frederik; Chason, Julie; Wold, J. Darcy; Kubick, Wayne; Becnel, Lauren B.

2018-01-01

The Clinical Data Interchange Standards Consortium (CDISC) is a global non-profit standards development organization that creates consensus-based standards for clinical and translational research. Several of these standards are now required by regulators for electronic submissions of regulated clinical trials’ data and by government funding agencies. These standards are free and open, available for download on the CDISC Website as PDFs. While these documents are human readable, they are not amenable to ready use by electronic systems. CDISC launched the CDISC Shared Health And Research Electronic library (SHARE) to provide the standards metadata in machine-readable formats to facilitate the automated management and implementation of the standards. This paper describes how CDISC SHARE’S standards can facilitate collecting, aggregating and analyzing standardized data from early design to end analysis; and its role as a central resource providing information systems with metadata that drives process automation including study setup and data pipelining. PMID:29888049
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Ghosts in the Machine II: Neural Correlates of Memory Interference from the Previous Trial.

PubMed

Papadimitriou, Charalampos; White, Robert L; Snyder, Lawrence H

2017-04-01

Previous memoranda interfere with working memory. For example, spatial memories are biased toward locations memorized on the previous trial. We predicted, based on attractor network models of memory, that activity in the frontal eye fields (FEFs) encoding a previous target location can persist into the subsequent trial and that this ghost will then bias the readout of the current target. Contrary to this prediction, we find that FEF memory representations appear biased away from (not toward) the previous target location. The behavioral and neural data can be reconciled by a model in which receptive fields of memory neurons converge toward remembered locations, much as receptive fields converge toward attended locations. Convergence increases the resources available to encode the relevant memoranda and decreases overall error in the network, but the residual convergence from the previous trial can give rise to an attractive behavioral bias on the next trial. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
An imperialist competitive algorithm for virtual machine placement in cloud computing

NASA Astrophysics Data System (ADS)

Jamali, Shahram; Malektaji, Sepideh; Analoui, Morteza

2017-05-01

Cloud computing, the recently emerged revolution in IT industry, is empowered by virtualisation technology. In this paradigm, the user's applications run over some virtual machines (VMs). The process of selecting proper physical machines to host these virtual machines is called virtual machine placement. It plays an important role on resource utilisation and power efficiency of cloud computing environment. In this paper, we propose an imperialist competitive-based algorithm for the virtual machine placement problem called ICA-VMPLC. The base optimisation algorithm is chosen to be ICA because of its ease in neighbourhood movement, good convergence rate and suitable terminology. The proposed algorithm investigates search space in a unique manner to efficiently obtain optimal placement solution that simultaneously minimises power consumption and total resource wastage. Its final solution performance is compared with several existing methods such as grouping genetic and ant colony-based algorithms as well as bin packing heuristic. The simulation results show that the proposed method is superior to other tested algorithms in terms of power consumption, resource wastage, CPU usage efficiency and memory usage efficiency.
Preliminary Development of Real Time Usage-Phase Monitoring System for CNC Machine Tools with a Case Study on CNC Machine VMC 250

NASA Astrophysics Data System (ADS)

Budi Harja, Herman; Prakosa, Tri; Raharno, Sri; Yuwana Martawirya, Yatna; Nurhadi, Indra; Setyo Nogroho, Alamsyah

2018-03-01

The production characteristic of job-shop industry at which products have wide variety but small amounts causes every machine tool will be shared to conduct production process with dynamic load. Its dynamic condition operation directly affects machine tools component reliability. Hence, determination of maintenance schedule for every component should be calculated based on actual usage of machine tools component. This paper describes study on development of monitoring system to obtaining information about each CNC machine tool component usage in real time approached by component grouping based on its operation phase. A special device has been developed for monitoring machine tool component usage by utilizing usage phase activity data taken from certain electronics components within CNC machine. The components are adaptor, servo driver and spindle driver, as well as some additional components such as microcontroller and relays. The obtained data are utilized for detecting machine utilization phases such as power on state, machine ready state or spindle running state. Experimental result have shown that the developed CNC machine tool monitoring system is capable of obtaining phase information of machine tool usage as well as its duration and displays the information at the user interface application.
Reversibility in Quantum Models of Stochastic Processes

NASA Astrophysics Data System (ADS)

Gier, David; Crutchfield, James; Mahoney, John; James, Ryan

Natural phenomena such as time series of neural firing, orientation of layers in crystal stacking and successive measurements in spin-systems are inherently probabilistic. The provably minimal classical models of such stochastic processes are ɛ-machines, which consist of internal states, transition probabilities between states and output values. The topological properties of the ɛ-machine for a given process characterize the structure, memory and patterns of that process. However ɛ-machines are often not ideal because their statistical complexity (Cμ) is demonstrably greater than the excess entropy (E) of the processes they represent. Quantum models (q-machines) of the same processes can do better in that their statistical complexity (Cq) obeys the relation Cμ >= Cq >= E. q-machines can be constructed to consider longer lengths of strings, resulting in greater compression. With code-words of sufficiently long length, the statistical complexity becomes time-symmetric - a feature apparently novel to this quantum representation. This result has ramifications for compression of classical information in quantum computing and quantum communication technology.
Initial Performance Results on IBM POWER6

NASA Technical Reports Server (NTRS)

Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh

2008-01-01

The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High- Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
The scheme machine: A case study in progress in design derivation at system levels

NASA Technical Reports Server (NTRS)

Johnson, Steven D.

1995-01-01

The Scheme Machine is one of several design projects of the Digital Design Derivation group at Indiana University. It differs from the other projects in its focus on issues of system design and its connection to surrounding research in programming language semantics, compiler construction, and programming methodology underway at Indiana and elsewhere. The genesis of the project dates to the early 1980's, when digital design derivation research branched from the surrounding research effort in programming languages. Both branches have continued to develop in parallel, with this particular project serving as a bridge. However, by 1990 there remained little real interaction between the branches and recently we have undertaken to reintegrate them. On the software side, researchers have refined a mathematically rigorous (but not mechanized) treatment starting with the fully abstract semantic definition of Scheme and resulting in an efficient implementation consisting of a compiler and virtual machine model, the latter typically realized with a general purpose microprocessor. The derivation includes a number of sophisticated factorizations and representations and is also deep example of the underlying engineering methodology. The hardware research has created a mechanized algebra supporting the tedious and massive transformations often seen at lower levels of design. This work has progressed to the point that large scale devices, such as processors, can be derived from first-order finite state machine specifications. This is roughly where the language oriented research stops; thus, together, the two efforts establish a thread from the highest levels of abstract specification to detailed digital implementation. The Scheme Machine project challenges hardware derivation research in several ways, although the individual components of the system are of a similar scale to those we have worked with before. The machine has a custom dual-ported memory to support garbage collection. It consists of four tightly coupled processes--processor, collector, allocator, memory--with a very non-trivial synchronization relationship. Finally, there are deep issues of representation for the run-time objects of a symbolic processing language. The research centers on verification through integrated formal reasoning systems, but is also involved with modeling and prototyping environments. Since the derivation algebra is basd on an executable modeling language, there is opportunity to incorporate design animation in the design process. We are looking for ways to move smoothly and incrementally from executable specifications into hardware realization. For example, we can run the garbage collector specification, a Scheme program, directly against the physical memory prototype, and similarly, the instruction processor model against the heap implementation.
Research into display sharing techniques for distributed computing environments

NASA Technical Reports Server (NTRS)

Hugg, Steven B.; Fitzgerald, Paul F., Jr.; Rosson, Nina Y.; Johns, Stephen R.

1990-01-01

The X-based Display Sharing solution for distributed computing environments is described. The Display Sharing prototype includes the base functionality for telecast and display copy requirements. Since the prototype implementation is modular and the system design provided flexibility for the Mission Control Center Upgrade (MCCU) operational consideration, the prototype implementation can be the baseline for a production Display Sharing implementation. To facilitate the process the following discussions are presented: Theory of operation; System of architecture; Using the prototype; Software description; Research tools; Prototype evaluation; and Outstanding issues. The prototype is based on the concept of a dedicated central host performing the majority of the Display Sharing processing, allowing minimal impact on each individual workstation. Each workstation participating in Display Sharing hosts programs to facilitate the user's access to Display Sharing as host machine.
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2016-01-01

In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Parallel discrete event simulation using shared memory

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1988-01-01

With traditional event-list techniques, evaluating a detailed discrete-event simulation-model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the numbers of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm, to simulate networks of queues is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential-simulation of most queueing network models.

Production and evaluation of measuring equipment for share viscosity of polymer melts included nanofiller with injection molding machine

NASA Astrophysics Data System (ADS)

Kameda, Takao; Sugino, Naoto; Takei, Satoshi

2016-10-01

Shear viscosity measurement device was produced to evaluate the injection molding workability for high-performance resins. Observation was possible in shear rate from 10 to 10000 [1/sec] that were higher than rotary rheometer by measuring with a plasticization cylinder of the injection molding machine. The result of measurements extrapolated result of a measurement of the rotary rheometer.
System-Level Virtualization Research at Oak Ridge National Laboratory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Scott, Stephen L; Vallee, Geoffroy R; Naughton, III, Thomas J

2010-01-01

System-level virtualization is today enjoying a rebirth as a technique to effectively share what were then considered large computing resources to subsequently fade from the spotlight as individual workstations gained in popularity with a one machine - one user approach. One reason for this resurgence is that the simple workstation has grown in capability to rival that of anything available in the past. Thus, computing centers are again looking at the price/performance benefit of sharing that single computing box via server consolidation. However, industry is only concentrating on the benefits of using virtualization for server consolidation (enterprise computing) whereas ourmore » interest is in leveraging virtualization to advance high-performance computing (HPC). While these two interests may appear to be orthogonal, one consolidating multiple applications and users on a single machine while the other requires all the power from many machines to be dedicated solely to its purpose, we propose that virtualization does provide attractive capabilities that may be exploited to the benefit of HPC interests. This does raise the two fundamental questions of: is the concept of virtualization (a machine sharing technology) really suitable for HPC and if so, how does one go about leveraging these virtualization capabilities for the benefit of HPC. To address these questions, this document presents ongoing studies on the usage of system-level virtualization in a HPC context. These studies include an analysis of the benefits of system-level virtualization for HPC, a presentation of research efforts based on virtualization for system availability, and a presentation of research efforts for the management of virtual systems. The basis for this document was material presented by Stephen L. Scott at the Collaborative and Grid Computing Technologies meeting held in Cancun, Mexico on April 12-14, 2007.« less
Feature-based memory-driven attentional capture: visual working memory content affects visual attention.

PubMed

Olivers, Christian N L; Meijer, Frank; Theeuwes, Jan

2006-10-01

In 7 experiments, the authors explored whether visual attention (the ability to select relevant visual information) and visual working memory (the ability to retain relevant visual information) share the same content representations. The presence of singleton distractors interfered more strongly with a visual search task when it was accompanied by an additional memory task. Singleton distractors interfered even more when they were identical or related to the object held in memory, but only when it was difficult to verbalize the memory content. Furthermore, this content-specific interaction occurred for features that were relevant to the memory task but not for irrelevant features of the same object or for once-remembered objects that could be forgotten. Finally, memory-related distractors attracted more eye movements but did not result in longer fixations. The results demonstrate memory-driven attentional capture on the basis of content-specific representations. Copyright 2006 APA.
Self-defining memories, scripts, and the life story: narrative identity in personality and psychotherapy.

PubMed

Singer, Jefferson A; Blagov, Pavel; Berry, Meredith; Oost, Kathryn M

2013-12-01

An integrative model of narrative identity builds on a dual memory system that draws on episodic memory and a long-term self to generate autobiographical memories. Autobiographical memories related to critical goals in a lifetime period lead to life-story memories, which in turn become self-defining memories when linked to an individual's enduring concerns. Self-defining memories that share repetitive emotion-outcome sequences yield narrative scripts, abstracted templates that filter cognitive-affective processing. The life story is the individual's overarching narrative that provides unity and purpose over the life course. Healthy narrative identity combines memory specificity with adaptive meaning-making to achieve insight and well-being, as demonstrated through a literature review of personality and clinical research, as well as new findings from our own research program. A clinical case study drawing on this narrative identity model is also presented with implications for treatment and research. © 2012 Wiley Periodicals, Inc.
Working Memory in Children: A Time-Constrained Functioning Similar to Adults

ERIC Educational Resources Information Center

Portrat, Sophie; Camos, Valerie; Barrouillet, Pierre

2009-01-01

Within the time-based resource-sharing (TBRS) model, we tested a new conception of the relationships between processing and storage in which the core mechanisms of working memory (WM) are time constrained. However, our previous studies were restricted to adults. The current study aimed at demonstrating that these mechanisms are present and…
Close Associations and Memory in Brainwriting Groups

ERIC Educational Resources Information Center

Coskun, Hamit

2011-01-01

The present experiment examined whether or not the type of associations (close (e.g. apple-pear) and distant (e.g. apple-fish) word associations) and memory instruction (paying attention to the ideas of others) had effects on the idea generation performances in the brainwriting paradigm in which all participants shared their ideas by using paper…
Common data buffer

NASA Technical Reports Server (NTRS)

Byrne, F.

1981-01-01

Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Accumulating Evidence about What Prospective Memory Costs Actually Reveal

ERIC Educational Resources Information Center

Strickland, Luke; Heathcote, Andrew; Remington, Roger W.; Loft, Shayne

2017-01-01

Event-based prospective memory (PM) tasks require participants to substitute an atypical PM response for an ongoing task response when presented with PM targets. Responses to ongoing tasks are often slower with the addition of PM demands ("PM costs"). Prominent PM theories attribute costs to capacity-sharing between the ongoing and PM…
How communication goals determine when audience tuning biases memory.

PubMed

Echterhoff, Gerald; Higgins, E Tory; Kopietz, René; Groll, Stephan

2008-02-01

After tuning their message to suit their audience's attitude, communicators' own memories for the original information (e.g., a target person's behaviors) often reflect the biased view expressed in their message--producing an audience-congruent memory bias. Exploring the motivational circumstances of message production, the authors investigated whether this bias depends on the goals driving audience tuning. In 4 experiments, the memory bias was found to a greater extent when audience tuning served the creation of a shared reality than when it served alternative, nonshared reality goals (being polite toward a stigmatized-group audience; obtaining incentives; being entertaining; complying with a blatant demand). In addition, the authors found that these effects were mediated by the epistemic trust in the audience-congruent view but not by the rehearsal or accurate retrieval of the original input information, the ability to discriminate between the original and the message information, or a contrast away from extremely tuned messages. The central role of epistemic trust, a measure of the communicators' experience of shared reality, was supported in meta-analyses across the experiments. PsycINFO Database Record (c) 2008 APA, all rights reserved.
Shared filtering processes link attentional and visual short-term memory capacity limits.

PubMed

Bettencourt, Katherine C; Michalka, Samantha W; Somers, David C

2011-09-30

Both visual attention and visual short-term memory (VSTM) have been shown to have capacity limits of 4 ± 1 objects, driving the hypothesis that they share a visual processing buffer. However, these capacity limitations also show strong individual differences, making the degree to which these capacities are related unclear. Moreover, other research has suggested a distinction between attention and VSTM buffers. To explore the degree to which capacity limitations reflect the use of a shared visual processing buffer, we compared individual subject's capacities on attentional and VSTM tasks completed in the same testing session. We used a multiple object tracking (MOT) and a VSTM change detection task, with varying levels of distractors, to measure capacity. Significant correlations in capacity were not observed between the MOT and VSTM tasks when distractor filtering demands differed between the tasks. Instead, significant correlations were seen when the tasks shared spatial filtering demands. Moreover, these filtering demands impacted capacity similarly in both attention and VSTM tasks. These observations fail to support the view that visual attention and VSTM capacity limits result from a shared buffer but instead highlight the role of the resource demands of underlying processes in limiting capacity.
Finite element computation on nearest neighbor connected machines

NASA Technical Reports Server (NTRS)

Mcaulay, A. D.

1984-01-01

Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Machine Learning Toolkit for Extreme Scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-03-31

Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which includes commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage including adaptive and aggressive elimination of samples for faster convergence , and sparse format representation of data samples. Several heuristics for earliest possible to lazy elimination of non-contributing samples are consideredmore » in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets« less
PWM Inverter control and the application thereof within electric vehicles

DOEpatents

Geppert, Steven

1982-01-01

An inverter (34) which provides power to an A.C. machine (28) is controlled by a circuit (36) employing PWM control strategy whereby A.C. power is supplied to the machine at a preselectable frequency and preselectable voltage. This is accomplished by the technique of waveform notching in which the shapes of the notches are varied to determine the average energy content of the overall waveform. Through this arrangement, the operational efficiency of the A.C. machine is optimized. The control circuit includes a micro-computer and memory element which receive various parametric inputs and calculate optimized machine control data signals therefrom. The control data is asynchronously loaded into the inverter through an intermediate buffer (38). In its preferred embodiment, the present invention is incorporated within an electric vehicle (10) employing a 144 VDC battery pack (32) and a three-phase induction motor (18).
A class Hierarchical, object-oriented approach to virtual memory management

NASA Technical Reports Server (NTRS)

Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

1989-01-01

The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
Saving all the bits

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1990-01-01

The scientific tradition of saving all the data from experiments for independent validation and for further investigation is under profound challenge by modern satellite data collectors and by supercomputers. The volume of data is beyond the capacity to store, transmit, and comprehend the data. A promising line of study is discovery machines that study the data at the collection site and transmit statistical summaries of patterns observed. Examples of discovery machines are the Autoclass system and the genetic memory system of NASA-Ames, and the proposal for knowbots by Kahn and Cerf.
Mechanical ventilator - infants

MedlinePlus

... this page: //medlineplus.gov/ency/article/007240.htm Mechanical ventilator - infants To use the sharing features on this page, please enable JavaScript. A mechanical ventilator is a machine that assists with breathing. ...
Current problems in the dynamics and design of mechanisms and machines

NASA Astrophysics Data System (ADS)

Kestel'Man, V. N.

The papers contained in this volume deal with possible ways of improving the dynamic and structural properties of machines and mechanisms and also with problems associated with the design of aircraft equipment. Topics discussed include estimation of the stressed state of a model of an orbital film structure, a study of the operation of an aerodynamic angle transducer in flow of a hot gas, calculation of the efficiency of aircraft gear drives, and dynamic accuracy of a controlled manipulator. Papers are also presented on optimal synthesis of mechanical systems with variable properties, synthesis of mechanisms using initial kinematic chains, and using shape memory materials in the design of machines and mechanisms. (For individual items see A93-31202 to A93-31214)
Genetic and environmental contributions to the associations between intraindividual variability in reaction time and cognitive function.

PubMed

Finkel, Deborah; Pedersen, Nancy L

2014-01-01

Intraindividual variability (IIV) in reaction time has been related to cognitive decline, but questions remain about the nature of this relationship. Mean and range in movement and decision time for simple reaction time were available from 241 individuals aged 51-86 years at the fifth testing wave of the Swedish Adoption/Twin Study of Aging. Cognitive performance on four factors was also available: verbal, spatial, memory, and speed. Analyses indicated that range in reaction time could be used as an indicator of IIV. Heritability estimates were 35% for mean reaction and 20% for range in reaction. Multivariate analysis indicated that the genetic variance on the memory, speed, and spatial factors is shared with genetic variance for mean or range in reaction time. IIV shares significant genetic variance with fluid ability in late adulthood, over and above and genetic variance shared with mean reaction time.
A Study of Shared-Memory Mutual Exclusion Protocols Using CADP

NASA Astrophysics Data System (ADS)

Mateescu, Radu; Serwe, Wendelin

Mutual exclusion protocols are an essential building block of concurrent systems: indeed, such a protocol is required whenever a shared resource has to be protected against concurrent non-atomic accesses. Hence, many variants of mutual exclusion protocols exist in the shared-memory setting, such as Peterson's or Dekker's well-known protocols. Although the functional correctness of these protocols has been studied extensively, relatively little attention has been paid to their non-functional aspects, such as their performance in the long run. In this paper, we report on experiments with the performance evaluation of mutual exclusion protocols using Interactive Markov Chains. Steady-state analysis provides an additional criterion for comparing protocols, which complements the verification of their functional properties. We also carefully re-examined the functional properties, whose accurate formulation as temporal logic formulas in the action-based setting turns out to be quite involved.
Handling debugger breakpoints in a shared instruction system

DOEpatents

Gooding, Thomas Michael; Shok, Richard Michael

2014-01-21

A debugger debugs processes that execute shared instructions so that a breakpoint set for one process will not cause a breakpoint to occur in the other processes. A breakpoint is set by recording the original instruction at the desired location and writing a trap instruction to the shared instructions at that location. When a process encounters the breakpoint, the process passes control to the debugger for breakpoint processing if the breakpoint was set at that location for that process. If the trap was not set at that location for that process, the cacheline containing the trap is copied to a small scratchpad memory, and the virtual memory mappings are changed to translate the virtual address of the cacheline to the scratchpad. The original instruction is then written to replace the trap instruction in the scratchpad, so that process can execute the instructions in the scatchpad thereby avoiding the trap instruction.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.