Sample records for distributed memory programming

  1. Programming distributed memory architectures using Kali

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system of annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language are described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.

  2. Supporting shared data structures on distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code is demonstrated on the NCUBE/7 and iPSC/2 hypercubes.

  3. Frequent Statement and Dereference Elimination for Imperative and Object-Oriented Distributed Programs

    PubMed Central

    El-Zawawy, Mohamed A.

    2014-01-01

    This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of the types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gathered so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The memory model used in this paper matches the hierarchical style of modern concurrent parallel computers. In our approach, each analysis result is assigned a type derivation, which serves as a correctness proof. PMID:24892098

  4. Two demonstrators and a simulator for a sparse, distributed memory

    NASA Technical Reports Server (NTRS)

    Brown, Robert L.

    1987-01-01

    Described are two programs demonstrating different aspects of Kanerva's Sparse, Distributed Memory (SDM). These programs run on Sun 3 workstations, one using color, and have straightforward graphically oriented user interfaces and graphical output. Presented are descriptions of the programs, how to use them, and what they show. Additionally, this paper describes the software simulator behind each program.

  5. Automatic selection of dynamic data partitioning schemes for distributed memory multicomputers

    NASA Technical Reports Server (NTRS)

    Palermo, Daniel J.; Banerjee, Prithviraj

    1995-01-01

    For distributed memory multicomputers such as the Intel Paragon, the IBM SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Several researchers have proposed systems that are able to produce data distributions that remain in effect for the entire execution of an application. For complex programs, however, such static data distributions may be insufficient to obtain acceptable performance. The selection of distributions that dynamically change over the course of a program's execution adds another dimension to the data partitioning problem. In this paper, we present a technique that can be used to automatically determine which partitionings are most beneficial over specific sections of a program while taking into account the added overhead of performing redistribution. This system is being built as part of the PARADIGM (PARAllelizing compiler for DIstributed memory General-purpose Multicomputers) project at the University of Illinois. The complete system will provide a fully automated means to parallelize programs written in a serial programming model, obtaining high performance on a wide range of distributed-memory multicomputers.
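
    To make the candidate partitionings concrete, here is a minimal sketch (not from the paper) of the index-to-processor maps that such systems choose among; the function names and the 1-D setting are illustrative assumptions.

    ```cpp
    #include <iostream>

    // Owner maps for a 1-D array of n elements on p processors: the standard
    // block, cyclic, and block-cyclic distributions that partitioning systems
    // select among.

    int block_owner(int i, int n, int p) {
        int b = (n + p - 1) / p;      // ceiling(n / p) elements per block
        return i / b;
    }

    int cyclic_owner(int i, int p) {
        return i % p;                 // element i goes to processor i mod p
    }

    int block_cyclic_owner(int i, int b, int p) {
        return (i / b) % p;           // blocks of size b dealt out round-robin
    }

    int main() {
        const int n = 16, p = 4, b = 2;
        for (int i = 0; i < n; ++i)
            std::cout << i << ": block=" << block_owner(i, n, p)
                      << " cyclic=" << cyclic_owner(i, p)
                      << " block-cyclic=" << block_cyclic_owner(i, b, p) << "\n";
    }
    ```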

  6. BIRD: A general interface for sparse distributed memory simulators

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1990-01-01

    Kanerva's sparse distributed memory (SDM) has now been implemented for at least six different computers, including SUN3 workstations, the Apple Macintosh, and the Connection Machine. A common interface for input of commands would both aid testing of programs on a broad range of computer architectures and assist users in transferring results from research environments to applications. A common interface also allows secondary programs to generate command sequences for a sparse distributed memory, which may then be executed on the appropriate hardware. The BIRD program is an attempt to create such an interface. Simplifying access to different simulators should assist developers in finding appropriate uses for SDM.

  7. Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1991-01-01

    Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.

  8. Programming in Vienna Fortran

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

    1992-01-01

    Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.

  9. Effects of cacheing on multitasking efficiency and programming strategy on an ELXSI 6400

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montry, G.R.; Benner, R.E.

    1985-12-01

    The impact of a cache/shared memory architecture, and, in particular, the cache coherency problem, upon concurrent algorithm and program development is discussed. In this context, a simple set of programming strategies is proposed that streamlines code development and improves code performance when multitasking in a cache/shared memory or distributed memory environment.
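
    As an illustration of the kind of strategy the record alludes to (not code from the paper), the following C++ sketch pads per-task counters to a cache-line boundary so concurrent writers do not invalidate each other's cache lines; the 64-byte line size is an assumption.

    ```cpp
    #include <thread>
    #include <vector>
    #include <iostream>

    // One classic cache/shared-memory strategy: pad per-task accumulators to a
    // cache-line boundary so concurrent writers do not ping-pong the same line
    // between caches (false sharing).
    constexpr std::size_t kCacheLine = 64;   // assumption: 64-byte lines

    struct alignas(kCacheLine) PaddedCounter {
        long value = 0;
    };

    int main() {
        const int nthreads = 4;
        std::vector<PaddedCounter> counters(nthreads);
        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t)
            workers.emplace_back([&counters, t] {
                for (int i = 0; i < 1'000'000; ++i) counters[t].value++;
            });
        for (auto& w : workers) w.join();
        long total = 0;
        for (auto& c : counters) total += c.value;
        std::cout << total << "\n";          // 4000000
    }
    ```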

  10. Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul

    2002-07-29

    Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic can have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], though at the cost of compromising ease of use. Distributed memory models such as message passing or one-sided communication offer performance and scalability, but they too compromise ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, the capabilities of the toolkit, and its evolution.
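
    A minimal sketch of the get/compute/put style this model implies, written against the Global Arrays C API (NGA_Create, NGA_Put, NGA_Get); the array shape and fill pattern are illustrative assumptions, and details may differ across GA versions.

    ```cpp
    #include <cstdio>
    #include <mpi.h>
    #include "ga.h"
    #include "macdecls.h"

    // Sketch of the Global Arrays usage pattern described above: data lives in
    // a globally addressable distributed array, and the programmer moves it
    // explicitly between global and local storage.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        GA_Initialize();

        int dims[1]  = {100};
        int chunk[1] = {-1};                     // let GA choose the distribution
        int g_a = NGA_Create(C_DBL, 1, dims, (char*)"A", chunk);

        int me = GA_Nodeid();
        if (me == 0) {                           // rank 0 fills the global array
            double buf[100];
            for (int i = 0; i < 100; ++i) buf[i] = i;
            int lo = 0, hi = 99, ld = 100;
            NGA_Put(g_a, &lo, &hi, buf, &ld);
        }
        GA_Sync();                               // make the data globally visible

        double x;                                // every rank reads one element,
        int lo = 42, hi = 42, ld = 1;            // local or remote alike
        NGA_Get(g_a, &lo, &hi, &x, &ld);
        std::printf("rank %d sees A[42] = %g\n", me, x);

        GA_Destroy(g_a);
        GA_Terminate();
        MPI_Finalize();
    }
    ```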

  11. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)

    2001-01-01

    This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.

  12. Global Arrays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnamoorthy, Sriram; Daily, Jeffrey A.; Vishnu, Abhinav

    2015-11-01

    Global Arrays (GA) is a distributed-memory programming model that allows for shared-memory-style programming combined with one-sided communication, to create a set of tools that combine high performance with ease-of-use. GA exposes a relatively straightforward programming abstraction, while supporting fully-distributed data structures, locality of reference, and high-performance communication. GA was originally formulated in the early 1990’s to provide a communication layer for the Northwest Chemistry (NWChem) suite of chemistry modeling codes that was being developed concurrently.

  13. UPC++ Programmer’s Guide (v1.0 2017.9)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bachan, J.; Baden, S.; Bonachea, D.

    UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.

  14. UPC++ Programmer’s Guide, v1.0-2018.3.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bachan, J.; Baden, S.; Bonachea, Dan

    UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
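
    Both editions of the guide describe the same core model. The following is a minimal sketch of that model using standard UPC++ primitives (init, rank_me, new_array, broadcast, rput, rget); the program itself, writing each rank's id into an array hosted on rank 0, is an invented example.

    ```cpp
    #include <upcxx/upcxx.hpp>
    #include <iostream>

    int main() {
        upcxx::init();
        int me = upcxx::rank_me();
        int n  = upcxx::rank_n();

        // Rank 0 allocates an array in its shared segment; the global pointer
        // is then broadcast so every rank can address it.
        upcxx::global_ptr<int> data = nullptr;
        if (me == 0) data = upcxx::new_array<int>(n);
        data = upcxx::broadcast(data, 0).wait();

        // Each rank writes its id into slot `me` of rank 0's array: an
        // explicit, asynchronous remote put.
        upcxx::rput(me, data + me).wait();
        upcxx::barrier();

        if (me == 0) {
            for (int i = 0; i < n; ++i)
                std::cout << upcxx::rget(data + i).wait() << " ";
            std::cout << "\n";
            upcxx::delete_array(data);
        }
        upcxx::finalize();
    }
    ```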

  15. Distributed memory compiler design for sparse problems

    NASA Technical Reports Server (NTRS)

    Wu, Janet; Saltz, Joel; Berryman, Harry; Hiranandani, Seema

    1991-01-01

    A compiler and runtime support mechanism is described and demonstrated. The methods presented are capable of solving a wide range of sparse and unstructured problems in scientific computing. The compiler takes as input a FORTRAN 77 program enhanced with specifications for distributing data, and the compiler outputs a message passing program that runs on a distributed memory computer. The runtime support for this compiler is a library of primitives designed to efficiently support irregular patterns of distributed array accesses and irregular distributed array partitions. A variety of Intel iPSC/860 performance results obtained through the use of this compiler are presented.

  16. Shared versus distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1991-01-01

    The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors, and a great deal of research is underway on both types of parallel systems. This discussion places special emphasis on systems with a very large number of processors for computation-intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.

  17. Parallel Programming Paradigms

    DTIC Science & Technology

    1987-07-01

    [OCR fragment from the DTIC report documentation page; no abstract survives. Legible details: distribution of the report is unlimited; the work was supported in part by grant 8416878 and by Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328; a surviving sentence fragment from the dissertation notes that having processors fetch from the same memory cell (list head) seems to favor a shared memory implementation [37].]

  18. Efficient iteration in data-parallel programs with irregular and dynamically distributed data structures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Littlefield, R.J.

    1990-02-01

    To implement an efficient data-parallel program on a non-shared memory MIMD multicomputer, data and computations must be properly partitioned to achieve good load balance and locality of reference. Programs with irregular data reference patterns often require irregular partitions. Although good partitions may be easy to determine, they can be difficult or impossible to implement in programming languages that provide only regular data distributions, such as blocked or cyclic arrays. We are developing Onyx, a programming system that provides a shared memory model of distributed data structures and extends the concept of data distribution to include irregular and dynamic distributions. This provides a powerful means to specify irregular partitions. Perhaps surprisingly, programs using it can also execute efficiently. In this paper, we describe and evaluate the Onyx implementation of a model problem that repeatedly executes an irregular but fixed data reference pattern. On an NCUBE hypercube, the speed of the Onyx implementation is comparable to that of carefully handwritten message-passing code.

  19. Address tracing for parallel machines

    NASA Technical Reports Server (NTRS)

    Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

    1991-01-01

    Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.

  20. PIPS-SBB: A Parallel Distributed-Memory Branch-and-Bound Algorithm for Stochastic Mixed-Integer Programs

    DOE PAGES

    Munguia, Lluis-Miquel; Oxberry, Geoffrey; Rajan, Deepak

    2016-05-01

    Stochastic mixed-integer programs (SMIPs) deal with optimization under uncertainty at many levels of the decision-making process. When solved as extensive formulation mixed-integer programs, problem instances can exceed available memory on a single workstation. In order to overcome this limitation, we present PIPS-SBB: a distributed-memory parallel stochastic MIP solver that takes advantage of parallelism at multiple levels of the optimization process. We also show promising results on the SIPLIB benchmark by combining methods known for accelerating Branch and Bound (B&B) methods with new ideas that leverage the structure of SMIPs. Finally, we expect the performance of PIPS-SBB to improve further as more functionality is added in the future.

  1. Memory operation mechanism of fullerene-containing polymer memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakajima, Anri, E-mail: anakajima@hiroshima-u.ac.jp; Fujii, Daiki

    2015-03-09

    The memory operation mechanism in fullerene-containing nanocomposite gate insulators was investigated while varying the kind of fullerene in a polymer gate insulator. It was clarified what kinds of traps, and which positions in the nanocomposite, store the injected electrons or holes. The reason for the difference in the ease of programming was clarified by taking into account the role of the charging energy of an injected electron. The dependence of the carrier dynamics on the kind of fullerene molecule was investigated. A nonuniform distribution of injected carriers occurred after application of a large-magnitude programming voltage, due to the width distribution of the polystyrene barrier between adjacent fullerene molecules. Through these investigations, we demonstrated a nanocomposite gate with fullerene molecules having excellent retention characteristics and programming capability. This will lead to the realization of practical organic memories with fullerene-containing polymer nanocomposites.

  2. Single Event Upset Analysis: On-orbit performance of the Alpha Magnetic Spectrometer Digital Signal Processor Memory aboard the International Space Station

    NASA Astrophysics Data System (ADS)

    Li, Jiaqiang; Choutko, Vitaly; Xiao, Liyi

    2018-03-01

    Based on the collection of error data from the Alpha Magnetic Spectrometer (AMS) Digital Signal Processors (DSP), on-orbit Single Event Upsets (SEUs) of the DSP program memory are analyzed. The daily error distribution and time intervals between errors are calculated to evaluate the reliability of the system. The particle density distribution of the International Space Station (ISS) orbit is presented, and the effects from the South Atlantic Anomaly (SAA) and the geomagnetic poles are analyzed. The impact of solar events on the DSP program memory is assessed by combining data analysis with Monte Carlo (MC) simulation. From the analysis and simulation results, it is concluded that the area corresponding to the SAA is the main source of errors on the ISS orbit. Solar events can also cause errors in DSP program memory, but the effect depends on the on-orbit particle density.

  3. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
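
    The paper uses SGI-native Fortran directives; as a hedged illustration of the same directive-based, loop-level style, here is a C++ analogue using OpenMP (an assumption, not what the paper used).

    ```cpp
    #include <omp.h>
    #include <vector>
    #include <cstdio>

    // Directive-based, loop-level parallelization: the pragmas ask the
    // compiler to distribute iterations across threads, mirroring the style
    // of the SGI multiprocessing directives described above.
    int main() {
        const int n = 1'000'000;
        std::vector<double> a(n), b(n, 2.0);
        double sum = 0.0;

        #pragma omp parallel for              // compiler distributes iterations
        for (int i = 0; i < n; ++i)
            a[i] = 3.0 * b[i];

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += a[i];

        std::printf("sum = %f, threads = %d\n", sum, omp_get_max_threads());
    }
    ```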

  4. Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

    NASA Technical Reports Server (NTRS)

    Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri

    1991-01-01

    Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.

  5. Comparison of two paradigms for distributed shared memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.

    1990-08-01

    The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication, and the all-pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications, and significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.

  6. A manual for PARTI runtime primitives

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel

    1990-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
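
    A simplified sketch (not the actual PARTI API) of the inspector/executor idea behind such primitives: inspect an irregular index list once, build a communication schedule with all-to-all exchanges, then execute the gather with it. The block distribution and access pattern are illustrative assumptions.

    ```cpp
    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int me, p;
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        const int nloc = 4, N = nloc * p;          // block-distributed array x
        std::vector<double> x(nloc);
        for (int i = 0; i < nloc; ++i) x[i] = me * nloc + i;   // x[g] = g

        // Irregular access pattern: this rank wants x[(me*nloc+i+1) mod N],
        // which crosses a processor boundary.
        std::vector<int> want(nloc);
        for (int i = 0; i < nloc; ++i) want[i] = (me * nloc + i + 1) % N;

        // --- Inspector: build the communication schedule once --------------
        std::vector<int> req_cnt(p, 0);            // how many I request per rank
        for (int g : want) req_cnt[g / nloc]++;
        std::vector<int> srv_cnt(p);               // how many each rank asks of me
        MPI_Alltoall(req_cnt.data(), 1, MPI_INT, srv_cnt.data(), 1, MPI_INT,
                     MPI_COMM_WORLD);

        std::vector<int> req_dsp(p, 0), srv_dsp(p, 0);
        for (int r = 1; r < p; ++r) {
            req_dsp[r] = req_dsp[r-1] + req_cnt[r-1];
            srv_dsp[r] = srv_dsp[r-1] + srv_cnt[r-1];
        }
        // Group requested indices by owning rank, remembering the permutation.
        std::vector<int> req_idx(nloc), perm(nloc), fill = req_dsp;
        for (int i = 0; i < nloc; ++i) {
            int slot = fill[want[i] / nloc]++;
            req_idx[slot] = want[i];
            perm[i] = slot;
        }
        int nsrv = srv_dsp[p-1] + srv_cnt[p-1];
        std::vector<int> srv_idx(nsrv);
        MPI_Alltoallv(req_idx.data(), req_cnt.data(), req_dsp.data(), MPI_INT,
                      srv_idx.data(), srv_cnt.data(), srv_dsp.data(), MPI_INT,
                      MPI_COMM_WORLD);

        // --- Executor: reuse the schedule for the actual gather ------------
        std::vector<double> srv_val(nsrv), got(nloc);
        for (int j = 0; j < nsrv; ++j) srv_val[j] = x[srv_idx[j] - me * nloc];
        MPI_Alltoallv(srv_val.data(), srv_cnt.data(), srv_dsp.data(), MPI_DOUBLE,
                      got.data(), req_cnt.data(), req_dsp.data(), MPI_DOUBLE,
                      MPI_COMM_WORLD);

        for (int i = 0; i < nloc; ++i)             // got[perm[i]] pairs with want[i]
            std::printf("rank %d: x[%d] = %g\n", me, want[i], got[perm[i]]);
        MPI_Finalize();
    }
    ```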

  7. Execution models for mapping programs onto distributed memory parallel computers

    NASA Technical Reports Server (NTRS)

    Sussman, Alan

    1992-01-01

    The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.

  8. Execution time support for scientific programs on distributed memory machines

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

    1990-01-01

    Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives the appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.

  9. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    NASA Technical Reports Server (NTRS)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  10. A manual for PARTI runtime primitives, revision 1

    NASA Technical Reports Server (NTRS)

    Das, Raja; Saltz, Joel; Berryman, Harry

    1991-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.

  11. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  12. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-09-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  13. Array distribution in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

    1994-01-01

    We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.

  14. Work stealing for GPU-accelerated parallel programs in a global address space framework: WORK STEALING ON GPU-ACCELERATED SYSTEMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.

  15. Work stealing for GPU-accelerated parallel programs in a global address space framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
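
    As a hedged illustration of the core discipline these papers build on (not their distributed CPU-GPU design), here is a minimal shared-memory work-stealing sketch: each worker pops from the back of its own deque and, when empty, steals from the front of a victim's deque.

    ```cpp
    #include <deque>
    #include <mutex>
    #include <thread>
    #include <vector>
    #include <atomic>
    #include <cstdio>

    // Task = an int payload here; a mutex guards each worker's deque.
    struct Deque {
        std::mutex m;
        std::deque<int> tasks;
    };

    int main() {
        const int nworkers = 4, ntasks = 1000;
        std::vector<Deque> dq(nworkers);
        for (int t = 0; t < ntasks; ++t)     // seed all work on worker 0
            dq[0].tasks.push_back(t);

        std::atomic<long> done{0};
        std::vector<std::thread> workers;
        for (int w = 0; w < nworkers; ++w)
            workers.emplace_back([&, w] {
                while (done.load() < ntasks) {
                    int task = -1;
                    {   // try own deque first (LIFO end)
                        std::lock_guard<std::mutex> g(dq[w].m);
                        if (!dq[w].tasks.empty()) {
                            task = dq[w].tasks.back();
                            dq[w].tasks.pop_back();
                        }
                    }
                    if (task < 0) {          // steal from a victim (FIFO end);
                        int v = (w + 1) % nworkers;   // round-robin victim for
                        std::lock_guard<std::mutex> g(dq[v].m); // brevity; real
                        if (!dq[v].tasks.empty()) {   // systems pick randomly
                            task = dq[v].tasks.front();
                            dq[v].tasks.pop_front();
                        }
                    }
                    if (task >= 0) done.fetch_add(1);   // "execute" the task
                }
            });
        for (auto& t : workers) t.join();
        std::printf("completed %ld tasks\n", done.load());
    }
    ```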

  16. Monitoring Data-Structure Evolution in Distributed Message-Passing Programs

    NASA Technical Reports Server (NTRS)

    Sarukkai, Sekhar R.; Beers, Andrew; Woodrow, Thomas S. (Technical Monitor)

    1996-01-01

    Monitoring the evolution of data structures in parallel and distributed programs is critical for debugging their semantics and performance. However, the current state of the art in tracking and presenting data-structure information on parallel and distributed environments is cumbersome and does not scale. In this paper we present a methodology that automatically tracks memory bindings (not the actual contents) of static and dynamic data-structures of message-passing C programs, using PVM. With the help of a number of examples we show that in addition to determining the impact of memory allocation overheads on program performance, graphical views can help in debugging the semantics of program execution. Scalable animations of virtual address bindings of source-level data-structures are used for debugging the semantics of parallel programs across all processors. In conjunction with light-weight core-files, this technique can be used to complement traditional debuggers on single processors. Detailed information (such as data-structure contents) on specific nodes can be determined using traditional debuggers after the data-structure evolution leading to the semantic error is observed graphically.

  17. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
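
    An illustrative MPI sketch of the SPMD strategy described (not POST3D code): each rank evaluates the finite-difference partial derivatives for its share of the design variables, and an all-reduce assembles the full gradient. The toy objective stands in for the trajectory simulation.

    ```cpp
    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    // Stand-in objective; POST3D's actual objective is a trajectory simulation.
    static double f(const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s += v * v;        // toy objective: ||x||^2
        return s;
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int me, p;
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        const int n = 8;                      // number of design variables
        std::vector<double> x(n, 1.0), grad(n, 0.0);
        const double h = 1e-6, f0 = f(x);

        // Cyclic assignment: rank r computes df/dx_i for i mod p == r.
        for (int i = me; i < n; i += p) {
            std::vector<double> xp = x;
            xp[i] += h;
            grad[i] = (f(xp) - f0) / h;       // one-sided difference
        }
        // Combine the pieces so every rank holds the complete gradient.
        MPI_Allreduce(MPI_IN_PLACE, grad.data(), n, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        if (me == 0)
            for (int i = 0; i < n; ++i) std::printf("g[%d] = %g\n", i, grad[i]);
        MPI_Finalize();
    }
    ```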

  18. Channel doping concentration and cell program state dependence on random telegraph noise spatial and statistical distribution in 30 nm NAND flash memory

    NASA Astrophysics Data System (ADS)

    Tomita, Toshihiro; Miyaji, Kousuke

    2015-04-01

    The dependence of spatial and statistical distribution of random telegraph noise (RTN) in a 30 nm NAND flash memory on channel doping concentration NA and cell program state Vth is comprehensively investigated using three-dimensional Monte Carlo device simulation considering random dopant fluctuation (RDF). It is found that single trap RTN amplitude ΔVth is larger at the center of the channel region in the NAND flash memory, which is closer to the jellium (uniform) doping results since NA is relatively low to suppress junction leakage current. In addition, ΔVth peak at the center of the channel decreases in the higher Vth state due to the current concentration at the shallow trench isolation (STI) edges induced by the high vertical electrical field through the fringing capacitance between the channel and control gate. In such cases, ΔVth distribution slope λ cannot be determined by only considering RDF and single trap.

  19. Programming model for distributed intelligent systems

    NASA Technical Reports Server (NTRS)

    Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.

    1988-01-01

    A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.

  20. Study Trapped Charge Distribution in P-Channel Silicon-Oxide-Nitride-Oxide-Silicon Memory Device Using Dynamic Programming Scheme

    NASA Astrophysics Data System (ADS)

    Li, Fu-Hai; Chiu, Yung-Yueh; Lee, Yen-Hui; Chang, Ru-Wei; Yang, Bo-Jun; Sun, Wein-Town; Lee, Eric; Kuo, Chao-Wei; Shirota, Riichiro

    2013-04-01

    In this study, we precisely investigate the charge distribution in the SiN layer by dynamic programming of channel hot hole induced hot electron injection (CHHIHE) in a p-channel silicon-oxide-nitride-oxide-silicon (SONOS) memory device. In the dynamic programming scheme, the gate voltage is increased as a staircase with fixed step amplitude, which prohibits the injection of holes into the SiN layer. Three-dimensional device simulation is calibrated against and compared with the measured programming characteristics. It is found, for the first time, that the hot electron injection point quickly traverses from the drain to the source side, synchronized with the expansion of the charged area in the SiN layer. As a result, the injected charges quickly spread almost uniformly over the whole channel area during a short programming period, which affords a large tolerance against lateral trapped-charge diffusion during baking.

  1. High Band Technology Program (HiTeP)

    DTIC Science & Technology

    2005-03-01

    [OCR fragment from the report body; no abstract survives. Legible details: a clock distribution circuit in which one Receiver Memory module receives a 60 MHz reference sine wave and distributes 60 MHz clock signals to all Receiver Memory modules; contract N00014-99-C-0314, Integrated Defense Systems Final Report, 1 March 2005; a FibreXpress Fibre-Channel PMC board; and the Electrically Short Crossed-Notch (ESCN) antenna, which is shorter than traditional traveling-wave notch antennas, with a 2X ESCN fin length of approximately 1.2 (units lost).]

  2. An Interactive Simulation Program for Exploring Computational Models of Auto-Associative Memory.

    PubMed

    Fink, Christian G

    2017-01-01

    While neuroscience students typically learn about activity-dependent plasticity early in their education, they often struggle to conceptually connect modification at the synaptic scale with network-level neuronal dynamics, not to mention with their own everyday experience of recalling a memory. We have developed an interactive simulation program (based on the Hopfield model of auto-associative memory) that enables the user to visualize the connections generated by any pattern of neural activity, as well as to simulate the network dynamics resulting from such connectivity. An accompanying set of student exercises introduces the concepts of pattern completion, pattern separation, and sparse versus distributed neural representations. Results from a conceptual assessment administered before and after students worked through these exercises indicate that the simulation program is a useful pedagogical tool for illustrating fundamental concepts of computational models of memory.
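
    A minimal sketch of the Hopfield model the program is based on (not the authors' simulator): patterns are stored with the Hebbian outer-product rule and recalled by repeated thresholding, demonstrating pattern completion from a corrupted cue.

    ```cpp
    #include <vector>
    #include <cstdio>

    using Pattern = std::vector<int>;        // entries are +1 / -1

    int main() {
        const int N = 8;
        std::vector<Pattern> memories = {
            {+1, +1, +1, +1, -1, -1, -1, -1},
            {+1, -1, +1, -1, +1, -1, +1, -1},
        };

        // Hebbian learning: W[i][j] = sum over patterns of x_i * x_j, zero diagonal.
        std::vector<std::vector<double>> W(N, std::vector<double>(N, 0.0));
        for (const auto& x : memories)
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    if (i != j) W[i][j] += x[i] * x[j];

        // Start from a corrupted version of the first memory (two bits flipped).
        Pattern s = memories[0];
        s[0] = -s[0]; s[5] = -s[5];

        for (int sweep = 0; sweep < 5; ++sweep)      // sequential in-place updates
            for (int i = 0; i < N; ++i) {            // (asynchronous dynamics)
                double h = 0.0;
                for (int j = 0; j < N; ++j) h += W[i][j] * s[j];
                s[i] = (h >= 0) ? +1 : -1;
            }

        for (int i = 0; i < N; ++i) std::printf("%+d ", s[i]);
        std::printf("  (recalls the first stored pattern)\n");
    }
    ```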

  3. Parallel computing for probabilistic fatigue analysis

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.

    1993-01-01

    This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism, to keep large numbers of processors busy, and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large-scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.

  4. Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

    NASA Technical Reports Server (NTRS)

    1991-01-01

    Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)

  5. Implementations of BLAST for parallel computers.

    PubMed

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  6. Flexible language constructs for large parallel programs

    NASA Technical Reports Server (NTRS)

    Rosing, Matthew; Schnabel, Robert

    1993-01-01

    The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.

  7. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition that the numerical results be independent of the number of processors involved. Two basically different approaches to the structure of massively parallel computations have been developed. The first approach uses a 3D data matrix decomposition reconstructed at each temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shared memory. The second approach is based on a 3D data matrix decomposition that is not reconstructed during a temporal cycle. The program was developed on the 8-processor CS MP-3 made in VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by joint efforts of the VNIIEF and LLNL staffs. A large number of numerical experiments have been carried out with different numbers of processors, up to 256, and the efficiency of parallelization has been evaluated as a function of processor number and parameters.

  8. Numerical methods on some structured matrix algebra problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jessup, E.R.

    1996-06-01

    This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was to translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.

  9. Loci-STREAM Version 0.9

    NASA Technical Reports Server (NTRS)

    Wright, Jeffrey; Thakur, Siddharth

    2006-01-01

    Loci-STREAM is an evolving computational fluid dynamics (CFD) software tool for simulating possibly chemically reacting, possibly unsteady flows in diverse settings, including rocket engines, turbomachines, oil refineries, etc. Loci-STREAM implements a pressure- based flow-solving algorithm that utilizes unstructured grids. (The benefit of low memory usage by pressure-based algorithms is well recognized by experts in the field.) The algorithm is robust for flows at all speeds from zero to hypersonic. The flexibility of arbitrary polyhedral grids enables accurate, efficient simulation of flows in complex geometries, including those of plume-impingement problems. The present version - Loci-STREAM version 0.9 - includes an interface with the Portable, Extensible Toolkit for Scientific Computation (PETSc) library for access to enhanced linear-equation-solving programs therein that accelerate convergence toward a solution. The name "Loci" reflects the creation of this software within the Loci computational framework, which was developed at Mississippi State University for the primary purpose of simplifying the writing of complex multidisciplinary application programs to run in distributed-memory computing environments including clusters of personal computers. Loci has been designed to relieve application programmers of the details of programming for distributed-memory computers.

  10. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGES

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
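
    The decisive change was replacing a sequential summation with a binary-tree reduction. Since the original uses Fortran 2008 coarrays, the following is a hedged C++/MPI rendering of the same tree idea, reducing in about log2(p) levels instead of p-1 sequential additions.

    ```cpp
    #include <mpi.h>
    #include <cstdio>

    // Binary-tree reduction: at each level, half the remaining ranks send
    // their partial sums to a partner and drop out, so rank 0 ends up with
    // the total after roughly log2(p) steps.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int me, p;
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        MPI_Comm_size(MPI_COMM_WORLD, &p);

        double value = me + 1.0;                 // each rank contributes a value
        for (int step = 1; step < p; step *= 2) {
            if (me % (2 * step) == step) {       // sender at this tree level
                MPI_Send(&value, 1, MPI_DOUBLE, me - step, 0, MPI_COMM_WORLD);
                break;                           // this rank is done
            } else if (me % (2 * step) == 0 && me + step < p) {
                double other;
                MPI_Recv(&other, 1, MPI_DOUBLE, me + step, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                value += other;                  // partial sums move up the tree
            }
        }
        if (me == 0)
            std::printf("tree sum = %g (expected %g)\n", value, p * (p + 1) / 2.0);
        MPI_Finalize();
    }
    ```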

  11. Corporate Memory and Spacecraft Course Development

    NASA Technical Reports Server (NTRS)

    1999-01-01

    The Florida Space Grant Consortium (FSGC) entered into a cooperative agreement with NASA's Kennedy Space Center (KSC) to manage two programs, (1) Corporate Memory and (2) Spacecraft Course Development. The FSGC, with its central office at the University of Florida, administered these programs. In conjunction with KSC, the FSGC was responsible for the release of the Request for Proposals (RFP's) and its distribution to all its affiliates. FSGC received the applications, forwarded them to KSC for evaluation, and awarded grants to the institutions chosen by KSC.

  12. Dynamic data distributions in Vienna Fortran

    NASA Technical Reports Server (NTRS)

    Chapman, Barbara; Mehrotra, Piyush; Moritsch, Hans; Zima, Hans

    1993-01-01

    Vienna Fortran is a machine-independent language extension of Fortran, which is based upon the Single-Program-Multiple-Data (SPMD) paradigm and allows the user to write programs for distributed-memory systems using global addresses. The language features focus mainly on the issue of distributing data across virtual processor structures. Those features of Vienna Fortran that allow the data distributions of arrays to change dynamically, depending on runtime conditions are discussed. The relevant language features are discussed, their implementation is outlined, and how they may be used in applications is described.

  13. Stochastic switching of TiO2-based memristive devices with identical initial memory states

    PubMed Central

    2014-01-01

    In this work, we show that identical TiO2-based memristive devices that possess the same initial resistive states are only phenomenologically similar, as their internal structures may vary significantly, which can render quite dissimilar switching dynamics. We experimentally demonstrated that the resistive switching of practical devices with similar initial states could occur at different programming stimuli cycles. We argue that similar memory states can be transcribed via numerous distinct active core states through dissimilar reduced TiO2-x filamentary distributions. Our hypothesis was finally verified via simulated results of the memory state evolution, taking into account dissimilar initial filamentary distributions. PMID:24994953

  14. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

    NASA Technical Reports Server (NTRS)

    Su, Ernesto; Lain, Antonio; Ramaswamy, Shankar; Palermo, Daniel J.; Hodges, Eugene W., IV; Banerjee, Prithviraj

    1995-01-01

    The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and a variable number of processors, provided that arrays were single- or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica. This paper describes some of the Mathematica routines that perform various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.

  15. Flexible Language Constructs for Large Parallel Programs

    DOE PAGES

    Rosing, Matt; Schnabel, Robert

    1994-01-01

    The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.

  16. Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    This paper presents our experiences parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
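
    The directive-based, loop-level style described here survives essentially unchanged in present-day OpenMP; a minimal C sketch of the kind of loop these tools parallelize (standard OpenMP is used below in place of the SGI-specific multiprocessing directives discussed in the paper):

      #include <stdio.h>
      #include <omp.h>

      #define N 1000000

      int main(void)
      {
          static double a[N], b[N], c[N];
          for (int i = 0; i < N; i++) { b[i] = i; c[i] = 2.0 * i; }

          /* Loop-level parallelism: the directive asks the compiler to split
             the iteration space across threads; no explicit messages needed. */
          #pragma omp parallel for
          for (int i = 0; i < N; i++)
              a[i] = b[i] + c[i];

          printf("a[N-1] = %f (threads available: %d)\n",
                 a[N - 1], omp_get_max_threads());
          return 0;
      }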

  17. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    PubMed

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.

  18. Parallel language constructs for tensor product computations on loosely coupled architectures

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Vanrosendale, John

    1989-01-01

    Distributed memory architectures offer high levels of performance and flexibility, but have proven awkward to program. Current languages for nonshared memory architectures provide a relatively low-level programming environment and are poorly suited to modular programming and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The focus is on tensor product array computations, a simple but important class of numerical algorithms. The problem of programming 1-D kernel routines, such as parallel tridiagonal solvers, is addressed first, and then it is examined how such parallel kernels can be combined to form parallel tensor product algorithms.
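
    The tensor product structure is worth seeing concretely: the same 1-D kernel is applied independently along each dimension of an array, first across rows, then across columns. A serial C sketch follows (the parallel primitives proposed in the paper distribute these independent 1-D applications across processors; the smoothing kernel below is only a stand-in for, e.g., a tridiagonal solve):

      #include <stdio.h>

      #define NX 4
      #define NY 4

      /* Stand-in 1-D kernel (a simple smoothing pass); the stride argument
         lets the same routine walk a row or a column. */
      static void kernel1d(double *v, int n, int stride)
      {
          for (int i = 1; i < n - 1; i++)
              v[i * stride] = 0.5 * v[i * stride]
                            + 0.25 * (v[(i - 1) * stride] + v[(i + 1) * stride]);
      }

      int main(void)
      {
          double u[NX][NY];
          for (int i = 0; i < NX; i++)
              for (int j = 0; j < NY; j++)
                  u[i][j] = i + j;

          for (int i = 0; i < NX; i++) kernel1d(&u[i][0], NY, 1);   /* along rows    */
          for (int j = 0; j < NY; j++) kernel1d(&u[0][j], NX, NY);  /* along columns */

          printf("u[1][1] = %f\n", u[1][1]);
          return 0;
      }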

  19. NavP: Structured and Multithreaded Distributed Parallel Programming

    NASA Technical Reports Server (NTRS)

    Pan, Lei

    2007-01-01

    We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP, as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. In contrast, approaches such as DSM or HPF have to date failed to deliver satisfying performance, even though they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.

  20. Representation-Independent Iteration of Sparse Data Arrays

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    A method is defined for iterating over massively large arrays containing sparse data that is independent of how the contents of the sparse arrays are laid out in memory. What is unique and important here is the decoupling of the iteration over the sparse set of array elements from how they are internally represented in memory, which makes the approach backward compatible with existing schemes for representing sparse arrays as well as with new ones. A functional interface is defined for implementing sparse arrays in any modern programming language, with a particular focus on the Chapel programming language. Examples are provided that show the translation of a loop that computes a matrix-vector product into this representation for both the distributed and non-distributed cases. This work is directly applicable to NASA and its High Productivity Computing Systems (HPCS) program, in which JPL and our current program are engaged. The goal of this program is to create powerful, scalable, and economically viable high-powered computer systems suitable for use in national security and industry by 2010. This is important to NASA for its computationally intensive requirements for analyzing and understanding the volumes of science data from our returned missions.
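
    A minimal C rendering of such a functional interface may make the decoupling concrete: the computation below is written once against an iterator, and only the backing state knows the layout (all names here are illustrative, not the paper's API):

      #include <stdio.h>

      /* Representation-independent iteration: callers see only this
         interface, never the underlying layout. */
      typedef struct {
          void *state;
          /* returns 1 and fills (i, v) with the next nonzero, or returns 0 */
          int (*next)(void *state, int *i, double *v);
      } SparseIter;

      /* One possible backing layout: coordinate (index, value) pairs. */
      typedef struct { const int *idx; const double *val; int n, pos; } CooState;

      static int coo_next(void *s, int *i, double *v)
      {
          CooState *c = s;
          if (c->pos >= c->n) return 0;
          *i = c->idx[c->pos]; *v = c->val[c->pos]; c->pos++;
          return 1;
      }

      /* y += a*x for sparse x, written once for every representation. */
      static void axpy_sparse(double a, SparseIter it, double *y)
      {
          int i; double v;
          while (it.next(it.state, &i, &v))
              y[i] += a * v;
      }

      int main(void)
      {
          int idx[] = {1, 4, 7};
          double val[] = {2.0, -1.0, 3.0};
          CooState s = {idx, val, 3, 0};
          SparseIter it = {&s, coo_next};
          double y[8] = {0};
          axpy_sparse(2.0, it, y);
          printf("y[1]=%g y[4]=%g y[7]=%g\n", y[1], y[4], y[7]);
          return 0;
      }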

  1. Tuning collective communication for Partitioned Global Address Space programming models

    DOE PAGES

    Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...

    2011-06-12

    Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with the locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language, programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and the implementation. In particular, PGAS collectives have semantic issues that are different from those in send–receive style message passing programs, and admit different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory, and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.
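
    To make the "broad set of algorithms per collective" concrete, here is one classic candidate such an auto-tuner would weigh, a binomial-tree broadcast, sketched in C with MPI point-to-point calls standing in for GASNet's one-sided layer:

      #include <mpi.h>
      #include <stdio.h>

      /* Binomial-tree broadcast from rank 0: in round k, every rank that
         already holds the data forwards it to the rank 2^k away. */
      static void tree_bcast(int *buf, int count, MPI_Comm comm)
      {
          int rank, size;
          MPI_Comm_rank(comm, &rank);
          MPI_Comm_size(comm, &size);
          for (int k = 1; k < size; k <<= 1) {
              if (rank < k && rank + k < size)
                  MPI_Send(buf, count, MPI_INT, rank + k, 0, comm);
              else if (rank >= k && rank < 2 * k)
                  MPI_Recv(buf, count, MPI_INT, rank - k, 0, comm,
                           MPI_STATUS_IGNORE);
          }
      }

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, x = 0;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          if (rank == 0) x = 42;
          tree_bcast(&x, 1, MPI_COMM_WORLD);
          printf("rank %d has %d\n", rank, x);
          MPI_Finalize();
          return 0;
      }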

  2. Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Gupta, Manish

    1992-01-01

    Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach to the problem of automatic data partitioning for numeric programs is presented: the constraints-based approach. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, which accepts Fortran 77 programs and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
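
    The decision procedure at the heart of the approach can be pictured as scoring every candidate distribution against all constraints and keeping the cheapest; a toy C sketch with invented names and cost numbers (the real compiler derives its quality measures from static performance estimation, not from a fixed table):

      #include <stdio.h>

      #define NDIST 3
      #define NCONS 2

      int main(void)
      {
          const char *dist[NDIST] = { "block", "cyclic", "block-cyclic(4)" };
          /* penalty[c][d]: hypothetical estimated cost of distribution d
             with respect to constraint c. */
          double penalty[NCONS][NDIST] = {
              { 1.0, 8.0, 2.0 },   /* constraint from a nearest-neighbor stencil */
              { 6.0, 1.0, 2.5 },   /* constraint from a load-balance requirement */
          };

          int best = 0;
          double best_cost = 1e30;
          for (int d = 0; d < NDIST; d++) {
              double cost = 0.0;
              for (int c = 0; c < NCONS; c++) cost += penalty[c][d];
              if (cost < best_cost) { best_cost = cost; best = d; }
          }
          printf("chosen distribution: %s (cost %.1f)\n", dist[best], best_cost);
          return 0;
      }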

  3. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2002-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(0) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that, for this class of applications, ordering significantly improves overall performance on both distributed and distributed shared-memory systems; that cache reuse may be more important than reducing communication; that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution; and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gain. An implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread-level parallelism.
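
    The kernel whose locality these strategies target is the sparse matrix-vector product inside CG; a minimal shared-memory version in C (CSR storage, OpenMP) shows why row and column ordering governs cache reuse of the vector x:

      #include <stdio.h>
      #include <omp.h>

      /* y = A*x with A in compressed sparse row (CSR) format. Reorderings
         such as RCM permute rowptr/col so that accesses to x become local. */
      static void spmv_csr(int n, const int *rowptr, const int *col,
                           const double *val, const double *x, double *y)
      {
          #pragma omp parallel for
          for (int i = 0; i < n; i++) {
              double sum = 0.0;
              for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                  sum += val[k] * x[col[k]];
              y[i] = sum;
          }
      }

      int main(void)
      {
          /* 3x3 example: [[2,1,0],[1,2,1],[0,1,2]] */
          int rowptr[] = {0, 2, 5, 7};
          int col[]    = {0, 1, 0, 1, 2, 1, 2};
          double val[] = {2, 1, 1, 2, 1, 1, 2};
          double x[]   = {1, 1, 1}, y[3];
          spmv_csr(3, rowptr, col, val, x, y);
          printf("y = %g %g %g\n", y[0], y[1], y[2]);
          return 0;
      }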

  4. Compiling global name-space programs for distributed execution

    NASA Technical Reports Server (NTRS)

    Koelbel, Charles; Mehrotra, Piyush

    1990-01-01

    Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for the analysis of a high-level source program and its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented.

  5. Scaling Irregular Applications through Data Aggregation and Software Multithreading

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel

    Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running over GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.

  6. Automatic Data Partitioning on Distributed Memory Multiprocessors

    DTIC Science & Technology

    1990-10-01

    [Only garbled OCR fragments of this report were recovered. The legible portions identify the authors (Manish Gupta and Prithviraj Banerjee, Coordinated Science Laboratory) and note that the array-partitioning techniques developed for Fortran apply equally to other programming languages such as C.]

  7. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication necessary for the interaction calculation. We discuss how we can overcome these bottlenecks.
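
    The control flow that FDPS abstracts away is easy to sketch; the C skeleton below shows the per-timestep phases (the stub names are hypothetical, not the FDPS API, which is C++ template based):

      #include <stdio.h>

      typedef struct { double x[3], v[3], f[3]; } Particle;

      /* Stubs standing in for the framework-provided phases: */
      static void decompose_domains(void) { /* re-partition space across nodes */ }
      static void exchange_particles(Particle *p, int *n) { (void)p; (void)n; }
      static void gather_ghosts(Particle *p, int n) { (void)p; (void)n; }
      static void compute_interactions(Particle *p, int n)
      { for (int i = 0; i < n; i++) p[i].f[0] = p[i].f[1] = p[i].f[2] = 0.0; }

      static void step(Particle *p, int *n, double dt)
      {
          decompose_domains();         /* 1. domain decomposition               */
          exchange_particles(p, n);    /* 2. send particles to their new owners */
          gather_ghosts(p, *n);        /* 3. fetch remote interaction partners  */
          compute_interactions(p, *n); /* 4. e.g. a Barnes-Hut tree walk        */
          for (int i = 0; i < *n; i++) /* 5. local time integration             */
              for (int d = 0; d < 3; d++) {
                  p[i].v[d] += dt * p[i].f[d];
                  p[i].x[d] += dt * p[i].v[d];
              }
      }

      int main(void)
      {
          Particle p[2] = {{{0}}};
          int n = 2;
          step(p, &n, 0.01);
          printf("stepped %d particles\n", n);
          return 0;
      }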

  8. Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
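
    Dynamic programming over tree decompositions generalizes the textbook tree case; for intuition, here is the treewidth-1 special case in C: maximum weighted independent set on a tree, with two table entries per vertex (vertex taken or not taken):

      #include <stdio.h>

      #define N 6

      static int first[N], next_edge[2 * N], to[2 * N], ecnt = 0;
      static double in[N], out[N], w[N] = {3, 2, 1, 10, 1, 4};

      static void add_edge(int u, int v)
      {
          to[ecnt] = v; next_edge[ecnt] = first[u]; first[u] = ecnt++;
          to[ecnt] = u; next_edge[ecnt] = first[v]; first[v] = ecnt++;
      }

      /* in[u]  = best weight of the subtree at u with u taken,
         out[u] = best weight with u not taken. */
      static void dfs(int u, int parent)
      {
          in[u] = w[u]; out[u] = 0.0;
          for (int e = first[u]; e != -1; e = next_edge[e]) {
              int v = to[e];
              if (v == parent) continue;
              dfs(v, u);
              in[u]  += out[v];   /* children of a taken vertex are excluded */
              out[u] += (in[v] > out[v]) ? in[v] : out[v];
          }
      }

      int main(void)
      {
          for (int i = 0; i < N; i++) first[i] = -1;
          add_edge(0, 1); add_edge(0, 2);
          add_edge(1, 3); add_edge(1, 4); add_edge(2, 5);
          dfs(0, -1);
          printf("max weighted independent set = %g\n",
                 in[0] > out[0] ? in[0] : out[0]);
          return 0;
      }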

  9. Ambipolar organic thin-film transistor-based nano-floating-gate nonvolatile memory

    NASA Astrophysics Data System (ADS)

    Han, Jinhua; Wang, Wei; Ying, Jun; Xie, Wenfa

    2014-01-01

    An ambipolar organic thin-film transistor-based nano-floating-gate nonvolatile memory was demonstrated, with discretely distributed gold nanoparticles, tetratetracontane (TTC), and pentacene as the floating-gate layer, tunneling layer, and active layer, respectively. The electron traps at the TTC/pentacene interface were significantly suppressed, which resulted in ambipolar operation in the present memory. As both electrons and holes were supplied in the channel and trapped in the floating gate by programming/erasing operations, respectively, i.e., one type of charge carrier was used to overwrite the other, trapped, type, a large memory window, extending on both sides of the initial threshold voltage, was realized.

  10. Static analysis of the hull plate using the finite element method

    NASA Astrophysics Data System (ADS)

    Ion, A.

    2015-11-01

    This paper presents the static analysis of a container ship's construction at two levels: the first at the girder / hull plate, and the second over the entire strength hull of the vessel. The work described here covers the static analysis of a hull plate, using the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPUs at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared-memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system, while the distributed-memory parallel version (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP or DMP systems.

  11. SharP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Venkata, Manjunath Gorentla; Aderholdt, William F

    Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. SharP is a programming abstraction proposed to address this problem. It is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architectures. Using distributed data structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.

  12. Real-time implementations of image segmentation algorithms on shared memory multicore architecture: a survey (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Akil, Mohamed

    2017-05-01

    Real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool, and a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multiprocessing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.

  13. Tissue-specific programming of memory CD8 T cell subsets impacts protection against lethal respiratory virus infection

    PubMed Central

    Tahiliani, Vikas

    2016-01-01

    How tissue-specific anatomical distribution and phenotypic specialization are linked to protective efficacy of memory T cells against reinfection is unclear. Here, we show that lung environmental cues program recently recruited central-like memory cells with migratory potentials for their tissue-specific functions during lethal respiratory virus infection. After entering the lung, some central-like cells retain their original CD27hiCXCR3hi phenotype, enabling them to localize near the infected bronchiolar epithelium and airway lumen to function as the first line of defense against pathogen encounter. Others, in response to local cytokine triggers, undergo a secondary program of differentiation that leads to the loss of CXCR3, migration arrest, and clustering within peribronchoarterial areas and in interalveolar septa. Here, the immune system adapts its response to prevent systemic viral dissemination and mortality. These results reveal the striking and unexpected spatial organization of central- versus effector-like memory cells within the lung and how cooperation between these two subsets contributes to host defense. PMID:27879287

  14. Using CLIPS in the domain of knowledge-based massively parallel programming

    NASA Technical Reports Server (NTRS)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency with respect to parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about the application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering is discussed.

  15. Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

    NASA Astrophysics Data System (ADS)

    Alruwaili, Manal

    With growing technology, the number of processors is becoming massive, and current supercomputer processing will be available on desktops in the next decade. For mass-scale application software development on the massively parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massively parallel computing and distributed memory models while retaining user-friendliness. Currently available object-oriented languages for massively parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data-parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension for object distribution to exploit the PGAS model; 2) the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) the programming paradigms that result from the integration of data- and thread-level parallelism with object distribution. In the proposed thesis, I compare different languages in the PGAS model; propose new constructs that extend C++ with object distribution, object migration, and object cloning; and integrate PGAS-based process constructs with these extensions on distributed objects. A new paradigm, MIDD (Multiple Invocation Distributed Data), is also presented, in which different copies of the same class can be invoked and work on different elements of distributed data concurrently using remote method invocations. I present the new constructs, their grammar, and their behavior, and explain them using simple programs that utilize them.
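
    The PGAS-style remote access these languages provide can be approximated in plain C with MPI-3 one-sided operations; a hedged sketch in which each rank exposes one element of a conceptual global array through an MPI window:

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Each rank exposes one int: together they form a tiny global array. */
          int local = -1, v = 7;
          MPI_Win win;
          MPI_Win_create(&local, sizeof(int), sizeof(int),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &win);

          MPI_Win_fence(0, win);
          if (rank == 0)   /* the PGAS idiom "global_array[size-1] = 7;" */
              MPI_Put(&v, 1, MPI_INT, size - 1, 0, 1, MPI_INT, win);
          MPI_Win_fence(0, win);

          if (rank == size - 1)
              printf("rank %d sees %d\n", rank, local);
          MPI_Win_free(&win);
          MPI_Finalize();
          return 0;
      }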

  16. Automatic generation of efficient array redistribution routines for distributed memory multicomputers

    NASA Technical Reports Server (NTRS)

    Ramaswamy, Shankar; Banerjee, Prithviraj

    1994-01-01

    Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. One of the significant contributions of this work is being able to handle arbitrary source and target processor sets while performing redistribution. Another important contribution is the ability to handle an arbitrary number of dimensions for the array involved in the redistribution in a scalable manner. Our implementation of these techniques is based on an MPI-like communication library. The results presented show the low overheads for our redistribution algorithm as compared to naive runtime methods.

  17. Reduced distribution of threshold voltage shift in double layer NiSi2 nanocrystals for nano-floating gate memory applications.

    PubMed

    Choi, Sungjin; Lee, Junhyuk; Kim, Donghyoun; Oh, Seulki; Song, Wangyu; Choi, Seonjun; Choi, Eunsuk; Lee, Seung-Beck

    2011-12-01

    We report on the fabrication and capacitance-voltage characteristics of double-layer nickel-silicide nanocrystals with a Si3N4 interlayer tunnel barrier for nano-floating gate memory applications. Compared with devices using a SiO2 interlayer, the use of Si3N4 interlayer separation reduced the average size (4 nm) and size distribution (+/- 2.5 nm) of the NiSi2 nanocrystal (NC) charge traps by more than 50%, giving a twofold increase in NC density to 2.3 x 10^12 cm^-2. The increased density and reduced NC size distribution resulted in a significant decrease in the spread of the device C-V characteristics. For each program voltage, the distribution of the shift in the threshold voltage was reduced by more than 50% on average, to less than 0.7 V, demonstrating possible multi-level-cell operation.

  18. Virtual memory support for distributed computing environments using a shared data object model

    NASA Astrophysics Data System (ADS)

    Huang, F.; Bacon, J.; Mapp, G.

    1995-12-01

    Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.

  19. The effects of nongenetic memory on population level sensitivity to stress

    NASA Astrophysics Data System (ADS)

    Adams, Rhys; Nevozhay, Dmitry; van Itallie, Elizabeth; Bennett, Matthew; Balazsi, Gabor

    2011-03-01

    While gene expression is often thought of as a unidirectional determinant of cellular fitness, recent studies have shown how growth retardation due to protein expression can affect gene expression levels in single cells. We developed two yeast strains carrying a drug resistance protein under the control of different synthetic gene constructs, one of which was monostable, while the other was bistable. The gene expression of these cell populations was tuned using a molecular inducer so that their respective means and noises were identical, while their nongenetic memory properties were different. We tested the sensitivity of these two cell population distributions to the antibiotic zeocin. We found that the gene expression distributions of bistable cell populations were sensitive to stressful environments, while the gene expression distribution of monostable cells was nearly unchanged by stress. We conclude that cell populations with high nongenetic memory are more adaptable to their environment. This work was funded by the National Institutes of Health through the NIH Director's New Innovator Award Program, 1-DP2-OD006481-01.

  20. Distributed Processor/Memory Architectures Design Program

    DTIC Science & Technology

    1975-02-01

    [Only garbled OCR fragments of this report's table of contents were recovered, listing sections on event scheduling, global message input, GEX/LEX scheduling philosophy, the executive control hierarchy, and task scheduler subroutine interrelationships.]

  1. Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G.A. Pope; K. Sephernoori; D.C. McKinney

    1996-03-15

    This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E distributed-memory multiprocessor computers using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was made possible.

  2. Runtime Systems for Extreme Scale Platforms

    DTIC Science & Technology

    2013-12-01

    [Only garbled OCR fragments of this report's bibliography were recovered, citing work on Java programming (PPPJ '11), work-first and help-first scheduling policies, and communication optimizations for distributed-memory X10 programs.]

  3. YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste

    Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.

  4. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2016-01-01

    In this work, several mini-apps have been created to enhance the performance of a real-world application, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with the Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
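
    The MPI+OpenMP variant of this hybrid pattern is the easiest to sketch: ranks own coarse slices of the domain, and threads share the loop within each slice. A minimal C example (not the VULCAN code itself):

      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      #define N 1000000

      int main(int argc, char **argv)
      {
          int provided, rank, size;
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Coarse grain: each rank owns a contiguous slice of the domain. */
          int chunk = N / size, lo = rank * chunk;
          int hi = (rank == size - 1) ? N : lo + chunk;

          double local = 0.0, global = 0.0;
          /* Fine grain: threads share the rank's slice. */
          #pragma omp parallel for reduction(+:local)
          for (int i = lo; i < hi; i++)
              local += 1.0 / ((double)i + 1.0);

          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("harmonic sum H_%d = %.6f\n", N, global);
          MPI_Finalize();
          return 0;
      }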

  5. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grain process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.

  6. Kanerva's sparse distributed memory: An associative memory algorithm well-suited to the Connection Machine

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1988-01-01

    The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
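
    The read/write mechanics of sparse distributed memory fit in a short C sketch (toy sizes here; the real design uses addresses on the order of a thousand bits, and the Connection Machine implementation parallelizes the Hamming-distance scan over the hard locations):

      #include <stdio.h>
      #include <stdlib.h>

      #define DIM    32   /* address/word length in bits  */
      #define LOCS   64   /* number of hard locations     */
      #define RADIUS 12   /* activation Hamming radius    */

      static unsigned addr[LOCS];      /* random hard-location addresses    */
      static int counters[LOCS][DIM];  /* one counter per location per bit  */

      static int hamming(unsigned a, unsigned b)
      {
          unsigned x = a ^ b; int c = 0;
          while (x) { x &= x - 1; c++; }
          return c;
      }

      static void sdm_write(unsigned a, unsigned data)
      {
          for (int l = 0; l < LOCS; l++)
              if (hamming(addr[l], a) <= RADIUS)        /* location activated */
                  for (int b = 0; b < DIM; b++)
                      counters[l][b] += (data >> b & 1) ? 1 : -1;
      }

      static unsigned sdm_read(unsigned a)
      {
          int sum[DIM] = {0};
          unsigned out = 0;
          for (int l = 0; l < LOCS; l++)
              if (hamming(addr[l], a) <= RADIUS)
                  for (int b = 0; b < DIM; b++)
                      sum[b] += counters[l][b];
          for (int b = 0; b < DIM; b++)
              if (sum[b] > 0) out |= 1u << b;           /* majority vote per bit */
          return out;
      }

      int main(void)
      {
          srand(1);
          for (int l = 0; l < LOCS; l++)
              addr[l] = ((unsigned)rand() << 16) ^ (unsigned)rand();
          unsigned a = 0x1234ABCDu, d = 0xCAFEF00Du;
          sdm_write(a, d);
          printf("stored %08X, recalled %08X\n", d, sdm_read(a));
          return 0;
      }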

  7. Block-Parallel Data Analysis with DIY2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morozov, Dmitriy; Peterka, Tom

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
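
    The block-structured pattern is simple to picture: decompose the domain into blocks, assign blocks to processing elements, and write the computation once per block. A toy C rendering with threads as the processing elements (DIY2 itself is a C++ library; the names here are illustrative):

      #include <stdio.h>
      #include <omp.h>

      #define NBLOCKS 8
      #define BLKSIZE 1024

      typedef struct { int gid; double data[BLKSIZE]; double sum; } Block;

      /* The computation is written against one block; the runtime decides
         where and when each block executes (threads here; processes and
         out-of-core staging in DIY2). */
      static void local_sum(Block *b)
      {
          b->sum = 0.0;
          for (int i = 0; i < BLKSIZE; i++) b->sum += b->data[i];
      }

      int main(void)
      {
          static Block blocks[NBLOCKS];
          for (int g = 0; g < NBLOCKS; g++) {
              blocks[g].gid = g;
              for (int i = 0; i < BLKSIZE; i++) blocks[g].data[i] = g;
          }

          #pragma omp parallel for          /* iterate over the blocks */
          for (int g = 0; g < NBLOCKS; g++)
              local_sum(&blocks[g]);

          double total = 0.0;               /* simple reduction pattern */
          for (int g = 0; g < NBLOCKS; g++) total += blocks[g].sum;
          printf("total = %g\n", total);
          return 0;
      }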

  8. Proceedings of the second SISAL users' conference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J T; Frerking, C; Miller, P J

    1992-12-01

    This report contains papers on the following topics: A sisal code for computing the fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFTs in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.

  9. Support of Multidimensional Parallelism in the OpenMP Programming Model

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele

    2003-01-01

    OpenMP is the current standard for shared-memory programming. While providing ease of parallel programming, the OpenMP programming model also has limitations which often affect the scalability of applications. Examples of these limitations are work distribution and point-to-point synchronization among threads. We propose extensions to the OpenMP programming model which allow the user to easily distribute the work in multiple dimensions and synchronize the workflow among the threads. The proposed extensions include four new constructs and the associated runtime library. They do not require changes to the source code and can be implemented based on the existing OpenMP standard. We illustrate the concept in a prototype translator and test with benchmark codes and a cloud modeling code.
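
    Standard OpenMP already covers one slice of this with the collapse clause, which distributes a multidimensional iteration space across the team; the constructs proposed in the paper go further (multidimensional work distribution plus point-to-point workflow synchronization), but collapse shows the baseline:

      #include <stdio.h>
      #include <omp.h>

      #define NI 512
      #define NJ 512

      int main(void)
      {
          static double a[NI][NJ];

          /* collapse(2) flattens the i,j space into one NI*NJ iteration
             space and distributes it across threads in both dimensions. */
          #pragma omp parallel for collapse(2)
          for (int i = 0; i < NI; i++)
              for (int j = 0; j < NJ; j++)
                  a[i][j] = (double)i * j;

          printf("a[7][9] = %g\n", a[7][9]);
          return 0;
      }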

  10. Over-Distribution in Source Memory

    PubMed Central

    Brainerd, C. J.; Reyna, V. F.; Holliday, R. E.; Nakamura, K.

    2012-01-01

    Semantic false memories are confounded with a second type of error, over-distribution, in which items are attributed to contradictory episodic states. Over-distribution errors have proved to be more common than false memories when the two are disentangled. We investigated whether over-distribution is prevalent in another classic false memory paradigm: source monitoring. It is. Conventional false memory responses (source misattributions) were predominantly over-distribution errors, but unlike semantic false memory, over-distribution also accounted for more than half of true memory responses (correct source attributions). Experimental control of over-distribution was achieved via a series of manipulations that affected either recollection of contextual details or item memory (concreteness, frequency, list-order, number of presentation contexts, and individual differences in verbatim memory). A theoretical model (conjoint process dissociation) was used to analyze the data; it predicts that (a) over-distribution is directly proportional to item memory but inversely proportional to recollection and (b) item memory is not a necessary precondition for recollection of contextual details. The results were consistent with both predictions. PMID:21942494

  11. Simulation of n-qubit quantum systems. I. Quantum registers and quantum gates

    NASA Astrophysics Data System (ADS)

    Radtke, T.; Fritzsche, S.

    2005-12-01

    During recent years, quantum computations and the study of n-qubit quantum systems have attracted a lot of interest, both in theory and experiment. Apart from the promise of performing quantum computations, however, these investigations also revealed a great many difficulties which still need to be solved in practice. In quantum computing, unitary and non-unitary quantum operations act on a given set of qubits to form (entangled) states, in which the information is encoded by the overall system often referred to as quantum registers. To facilitate the simulation of such n-qubit quantum systems, we present the FEYNMAN program to provide all necessary tools in order to define and to deal with quantum registers and quantum operations. Although the present version of the program is restricted to unitary transformations, it equally supports—whenever possible—the representation of the quantum registers both in terms of their state vectors and density matrices. In addition to the composition of two or more quantum registers, moreover, the program also supports their decomposition into various parts by applying the partial trace operation and the concept of the reduced density matrix. Using an interactive design within the framework of MAPLE, therefore, we expect the FEYNMAN program to be helpful not only for teaching the basic elements of quantum computing but also for studying their physical realization in the future. Program summary Title of program:FEYNMAN Catalogue number:ADWE Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADWE Program obtainable from:CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions:None Computers for which the program is designed:All computers with a license of the computer algebra system MAPLE [Maple is a registered trademark of Waterloo Maple Inc.] Operating systems or monitors under which the program has been tested:Linux, MS Windows XP Programming language used:MAPLE 9.5 (but should be compatible with 9.0 and 8.0, too) Memory and time required to execute with typical data:Storage and time requirements critically depend on the number of qubits, n, in the quantum registers due to the exponential increase of the associated Hilbert space. In particular, complex algebraic operations may require large amounts of memory even for small qubit numbers. However, most of the standard commands (see Section 4 for simple examples) react promptly for up to five qubits on a normal single-processor machine ( ⩾1GHz with 512 MB memory) and use less than 10 MB memory. No. of lines in distributed program, including test data, etc.: 8864 No. of bytes in distributed program, including test data, etc.: 493 182 Distribution format: tar.gz Nature of the physical problem:During the last decade, quantum computing has been found to provide a revolutionary new form of computation. The algorithms by Shor [P.W. Shor, SIAM J. Sci. Statist. Comput. 26 (1997) 1484] and Grover [L.K. Grover, Phys. Rev. Lett. 79 (1997) 325. [2

  12. An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

    2015-01-01

    This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
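
    For reference, a baseline single-accelerator OpenACC kernel in C looks as follows; the paper's SPMD model layers rank-style parallelism on top so that the same source can drive several accelerators, which plain OpenACC does not do by itself:

      #include <stdio.h>

      #define N (1 << 20)

      int main(void)
      {
          static float x[N], y[N];
          for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

          /* Offload the loop to the accelerator; the data clauses manage
             host/device transfers. Without an OpenACC compiler the pragma
             is ignored and the loop runs serially. */
          #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
          for (int i = 0; i < N; i++)
              y[i] = 2.0f * x[i] + y[i];

          printf("y[0] = %f\n", y[0]);
          return 0;
      }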

  13. Spaceborne Processor Array

    NASA Technical Reports Server (NTRS)

    Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

    2008-01-01

    A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.

  14. Automated quantitative muscle biopsy analysis system

    NASA Technical Reports Server (NTRS)

    Castleman, Kenneth R. (Inventor)

    1980-01-01

    An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1) under control of a program stored in dedicated memory M_n.

  15. Hetero Helix

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jones, J.P.; Bangs, A.L.; Butler, P.L.

    Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,000 lines of code. 25 refs., 6 figs.

  16. Hypercluster Parallel Processor

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

    1992-01-01

    Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

  17. C++QEDv2: The multi-array concept and compile-time algorithms in the definition of composite quantum systems

    NASA Astrophysics Data System (ADS)

    Vukics, András

    2012-06-01

    C++QED is a versatile framework for simulating open quantum dynamics. It allows one to build arbitrarily complex quantum systems from elementary free subsystems and interactions, and simulate their time evolution with the available time-evolution drivers. Through this framework, we introduce a design which should be generic for high-level representations of composite quantum systems. It relies heavily on the object-oriented and generic programming paradigms on the one hand, and on the other on compile-time algorithms, in particular C++ template-metaprogramming techniques. The core of the design is the data structure which represents the state vectors of composite quantum systems. This data structure models the multi-array concept. The use of template metaprogramming is not only crucial to the design, but with its use all computations pertaining to the layout of the simulated system can be shifted to compile time, hence cutting runtime. Program summary Program title: C++QED Catalogue identifier: AELU_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions:http://cpc.cs.qub.ac.uk/licence/aelu_v1_0.html. The C++QED package contains other software packages, Blitz, Boost and FLENS, all of which may be distributed freely but have individual license requirements. Please see individual packages for license conditions. No. of lines in distributed program, including test data, etc.: 597 974 No. of bytes in distributed program, including test data, etc.: 4 874 839 Distribution format: tar.gz Programming language: C++ Computer: i386-i686, x86_64 Operating system: In principle cross-platform, as yet tested only on UNIX-like systems (including Mac OS X). RAM: The framework itself takes about 60 MB, which is fully shared. The additional memory taken by the program which defines the actual physical system (script) is typically less than 1 MB. The memory storing the actual data scales with the system dimension for state-vector manipulations, and the square of the dimension for density-operator manipulations. This might easily be GBs, and often the memory of the machine limits the size of the simulated system. Classification: 4.3, 4.13, 6.2, 20 External routines: Boost C++ libraries (http://www.boost.org/), GNU Scientific Library (http://www.gnu.org/software/gsl/), Blitz++ (http://www.oonumerics.org/blitz/), Linear Algebra Package - Flexible Library for Efficient Numerical Solutions (http://flens.sourceforge.net/). Nature of problem: Definition of (open) composite quantum systems out of elementary building blocks [1]. Manipulation of such systems, with emphasis on dynamical simulations such as Master-equation evolution [2] and Monte Carlo wave-function simulation [3]. Solution method: Master equation, Monte Carlo wave-function method. Restrictions: Total dimensionality of the system. Master equation - few thousands. Monte Carlo wave-function trajectory - several millions. Unusual features: Because of the heavy use of compile-time algorithms, compilation of programs written in the framework may take a long time and much memory (up to several GBs). Additional comments: The framework is not a program, but provides and implements an application-programming interface for developing simulations in the indicated problem domain. Supplementary information: http://cppqed.sourceforge.net/. Running time: Depending on the magnitude of the problem, can vary from a few seconds to weeks.

  18. EM-ANIMATE - COMPUTER PROGRAM FOR DISPLAYING AND ANIMATING THE STEADY-STATE TIME-HARMONIC ELECTROMAGNETIC NEAR FIELD AND SURFACE-CURRENT SOLUTIONS

    NASA Technical Reports Server (NTRS)

    Hom, K. W.

    1994-01-01

    The EM-ANIMATE program is a specialized visualization program that displays and animates the near-field and surface-current solutions obtained from an electromagnetics program, in particular those from MOM3D (LAR-15074). The EM-ANIMATE program is windows-based and contains a user-friendly, graphical interface for setting viewing options, case selection, file manipulation, etc. EM-ANIMATE displays the field and surface-current magnitude as smooth shaded color fields (color contours) ranging from a minimum contour value to a maximum contour value for the fields and surface currents. The program can display either the total electric field or the scattered electric field in either time-harmonic animation mode or in the root mean square (RMS) average mode. The contour range defaults to the minimum and maximum values within the field and surface-current data and can optionally be set by the user. The field and surface-current values are animated by calculating and viewing the solution at user-selectable radian time increments between 0 and 2pi. The surface currents can also be displayed in either time-harmonic animation mode or in RMS average mode. In RMS mode, the color contours do not vary with time, but show the constant time-averaged field and surface-current magnitude solution. The electric field and surface-current directions can be displayed as scaled vector arrows which have a length proportional to the magnitude at each field grid point or surface node point. These vector properties can be viewed separately or concurrently with the field or surface-current magnitudes. Animation speed is improved by turning off the display of the vector arrows. In RMS mode, the direction vectors are still displayed as varying with time, since the time-averaged direction vectors would be zero-length vectors. Other surface properties can optionally be viewed. These include the surface grid, the resistance value assigned to each element of the grid, and the power dissipation of each element which has an assigned resistance value. The EM-ANIMATE program will accept up to 10 different surface-current cases, each consisting of up to 20,000 node points and 10,000 triangle definitions, and will animate one of these cases. This capability can be used to compare surface-current distributions due to various initial excitation directions or electric field orientations. The program can accept up to 50 planes of field data, each consisting of a grid of 100 by 100 field points. These planes of data are user-selectable and can be viewed individually or concurrently. With these preset limits, the program requires 55 megabytes of core memory to run. These limits can be changed in the header files to accommodate the available core memory of an individual workstation. An estimate of memory required can be made as follows: approximate memory in bytes equals (number of nodes times number of surfaces times 14 variables times bytes per word, typically 4 bytes per floating point) plus (number of field planes times number of nodes per plane times 21 variables times bytes per word). This gives the approximate memory size required to store the field and surface-current data. The total memory size is approximately 400,000 bytes plus the data memory size. The animation calculations are performed in real time at any user-set time step. For Silicon Graphics workstations that have multiple processors, this program has been optimized to perform these calculations on multiple processors to increase animation rates.
The optimized program uses the SGI PFA (Power FORTRAN Accelerator) library. On single-processor machines, the parallelization directives are seen as comments to the program and have no effect on compilation or execution. EM-ANIMATE is written in FORTRAN 77 for implementation on SGI IRIS workstations running IRIX 3.0 or later. A minimum of 55 MB of RAM is required for execution of this program; however, the code may be modified to accommodate the available memory of an individual workstation. For program execution, twenty-four-bit, double-buffered color capability is suggested, but not required. Sample input and output files and a sample executable are provided on the distribution medium. Electronic documentation is provided in PostScript format and in the form of IRIX man pages. The standard distribution medium for EM-ANIMATE is a .25 inch streaming magnetic IRIX tape cartridge in UNIX tar format. EM-ANIMATE is also available as part of a bundled package, COS-10048, which includes MOM3D, an IRIS program that produces electromagnetic near-field and surface-current solutions. This program was developed in 1993.
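
    The memory estimate quoted above can be made concrete with a small calculation. In the sketch below (variable meanings are inferred from the abstract; EM-ANIMATE itself is a Fortran 77 program), the preset limits of 10 cases of 20,000 nodes and 50 planes of 100 by 100 field points are plugged into the stated formula:

        // Hedged sketch: the EM-ANIMATE memory estimate from the abstract,
        // evaluated at the program's preset limits.
        #include <cstdio>

        int main() {
            const long bytesPerWord  = 4;           // typical 4-byte floating point
            const long nodes         = 20000;       // node points per case
            const long surfaces      = 10;          // surface-current cases
            const long fieldPlanes   = 50;          // planes of field data
            const long nodesPerPlane = 100L * 100L; // 100 x 100 field grid

            long surfaceBytes = nodes * surfaces * 14 * bytesPerWord;
            long fieldBytes   = fieldPlanes * nodesPerPlane * 21 * bytesPerWord;
            long total        = 400000 + surfaceBytes + fieldBytes; // program + data

            std::printf("surface data: %ld bytes\n", surfaceBytes);
            std::printf("field data:   %ld bytes\n", fieldBytes);
            std::printf("total:       ~%ld bytes\n", total);
        }

    At the preset limits this yields roughly 11.2 MB for surface data and 42 MB for field data, about 54 MB in total, consistent with the 55 megabytes of core memory quoted above.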

  19. User-Defined Data Distributions in High-Level Programming Languages

    NASA Technical Reports Server (NTRS)

    Diaconescu, Roxana E.; Zima, Hans P.

    2006-01-01

    One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.
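
    The core of a user-defined data distribution of the kind discussed above is a mapping from global indices to owners and local offsets. The following sketch (illustrative only; the paper's setting is the Chapel language, not C++, and the names BlockDist and locale are expository) shows a simple block distribution:

        // Illustrative block distribution: global index -> owning locale and
        // local offset.
        #include <cstddef>
        #include <iostream>

        struct BlockDist {
            std::size_t globalSize, numLocales;
            // Ceiling division gives the block width per locale.
            std::size_t blockSize() const {
                return (globalSize + numLocales - 1) / numLocales;
            }
            std::size_t owner(std::size_t i) const { return i / blockSize(); }
            std::size_t localIndex(std::size_t i) const { return i % blockSize(); }
        };

        int main() {
            BlockDist d{1000, 8}; // 1000 elements over 8 locales, 125 each
            std::cout << "element 437 lives on locale " << d.owner(437)
                      << " at local offset " << d.localIndex(437) << "\n";
        }

    The point of the high-level approach is that such mappings are expressed once, declaratively, instead of being hand-coded into every message-passing call site.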

  20. BSR: B-spline atomic R-matrix codes

    NASA Astrophysics Data System (ADS)

    Zatsarinny, Oleg

    2006-02-01

    BSR is a general program to calculate atomic continuum processes using the B-spline R-matrix method, including electron-atom and electron-ion scattering, and radiative processes such as bound-bound transitions, photoionization and polarizabilities. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme by including terms of the Breit-Pauli Hamiltonian. New version program summary: Title of program: BSR Catalogue identifier: ADWY Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWY Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computers on which the program has been tested: Microway Beowulf cluster; Compaq Beowulf cluster; DEC Alpha workstation; DELL PC Operating systems under which the new version has been tested: UNIX, Windows XP Programming language used: FORTRAN 95 Memory required to execute with typical data: Typically 256-512 Mwords. Since all the principal dimensions are allocatable, the available memory defines the maximum complexity of the problem No. of bits in a word: 8 No. of processors used: 1 Has the code been vectorized or parallelized?: no No. of lines in distributed program, including test data, etc.: 69 943 No. of bytes in distributed program, including test data, etc.: 746 450 Peripherals used: scratch disk store; permanent disk store Distribution format: tar.gz Nature of physical problem: This program uses the R-matrix method to calculate electron-atom and electron-ion collision processes, with options to calculate radiative data, photoionization, etc. The calculations can be performed in LS-coupling or in an intermediate-coupling scheme, with options to include Breit-Pauli terms in the Hamiltonian. Method of solution: The R-matrix method is used [P.G. Burke, K.A. Berrington, Atomic and Molecular Processes: An R-Matrix Approach, IOP Publishing, Bristol, 1993; P.G. Burke, W.D. Robb, Adv. At. Mol. Phys. 11 (1975) 143; K.A. Berrington, W.B. Eissner, P.H. Norrington, Comput. Phys. Comm. 92 (1995) 290].

  1. A portable approach for PIC on emerging architectures

    NASA Astrophysics Data System (ADS)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that three distinct programming paradigms are needed. They are: low-level vector (SIMD) processing, middle-level shared-memory parallel programming, and high-level distributed-memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared-memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open-source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran 2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran 2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high-performing compiled languages. Parallel languages are still evolving, with interesting developments in Co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE grants.
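
    The three levels can be seen together in a deliberately small sketch (not taken from the PICKSC skeleton codes): MPI ranks own disjoint particle sets, OpenMP threads share a rank's particles, and the innermost loop is left in a form the compiler can vectorize:

        // Three-level sketch: MPI (distributed memory) + OpenMP (shared memory)
        // + SIMD (vectorized inner loop), applied to a trivial particle push.
        #include <mpi.h>
        #include <cstdio>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);                // high level: distributed memory
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            const int n = 1 << 20;                 // particles owned by this rank
            std::vector<float> x(n, 0.0f), v(n, 1.0f);
            const float dt = 0.01f;

            #pragma omp parallel for simd          // middle level + low level
            for (int i = 0; i < n; ++i)
                x[i] += v[i] * dt;                 // unit-stride, vectorizable

            if (rank == 0) std::printf("x[0] = %f\n", x[0]);
            MPI_Finalize();
        }

    Porting to another model then means swapping the middle and low levels (for example, to a CUDA kernel) while the MPI layer and the algorithm stay put, which is the portability argument made above.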

  2. Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines

    NASA Technical Reports Server (NTRS)

    Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.

    1994-01-01

    The GMRES method is parallelized and combined with local preconditioning to construct an implicit parallel solver that obtains steady-state solutions of the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to produce solutions identical to those obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFLOPS was achieved on 512 nodes of the Intel Delta machine for a problem size of 1024 K equations (256 K grid points).

  3. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    DOEpatents

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network, thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
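
    The structure described in the claim can be sketched in a few lines of MPI (hedged: this illustrates the decomposition only; the randomized all-to-all ordering, the novel element of the patent, is not reproduced, and a naive O(n^2) DFT stands in for a fast FFT):

        // 2D distributed FFT skeleton: 1D transforms on local rows, an
        // all-to-all redistribution, then 1D transforms along the second dimension.
        #include <mpi.h>
        #include <cmath>
        #include <complex>
        #include <vector>

        using cd = std::complex<double>;

        void dft1d(cd* a, int n) {                 // placeholder for a fast FFT
            const double PI = std::acos(-1.0);
            std::vector<cd> out(n);
            for (int k = 0; k < n; ++k)
                for (int j = 0; j < n; ++j)
                    out[k] += a[j] * std::polar(1.0, -2.0 * PI * j * k / n);
            for (int k = 0; k < n; ++k) a[k] = out[k];
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, p;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &p);
            const int n = 8 * p, rows = n / p;     // n x n array, n/p rows per node

            // Step 1: 1D FFT along the first dimension on locally held rows.
            std::vector<cd> a(rows * n, cd(rank + 1.0, 0.0));
            for (int r = 0; r < rows; ++r) dft1d(&a[r * n], n);

            // Step 2: all-to-all redistribution; pack the rows x rows block of
            // columns [d*rows, (d+1)*rows) for each destination node d.
            std::vector<cd> sendbuf(rows * n), recvbuf(rows * n);
            for (int d = 0; d < p; ++d)
                for (int r = 0; r < rows; ++r)
                    for (int c = 0; c < rows; ++c)
                        sendbuf[(d * rows + r) * rows + c] = a[r * n + d * rows + c];
            MPI_Alltoall(sendbuf.data(), rows * rows * 2, MPI_DOUBLE,
                         recvbuf.data(), rows * rows * 2, MPI_DOUBLE,
                         MPI_COMM_WORLD);

            // Step 3: after unpacking, each node owns n/p complete columns and
            // applies dft1d along the second dimension (unpacking omitted).
            MPI_Finalize();
        }

    The patent's contribution sits inside step 2: issuing the all-to-all transfers in random order to avoid network hot spots, rather than in the fixed order a plain MPI_Alltoall uses.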

  4. Efficient implementation of a multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    DOEpatents

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network, thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.

  5. QRAP: A numerical code for projected (Q)uasiparticle (RA)ndom (P)hase approximation

    NASA Astrophysics Data System (ADS)

    Samana, A. R.; Krmpotić, F.; Bertulani, C. A.

    2010-06-01

    A computer code for the quasiparticle random phase approximation (QRPA) and projected quasiparticle random phase approximation (PQRPA) models of nuclear structure is explained in detail. The residual interaction is approximated by a simple δ-force. An important application of the code consists of evaluating nuclear matrix elements involved in neutrino-nucleus reactions. As an example, cross sections for 56Fe and 12C are calculated and the code output is explained. The application to other nuclei and the description of other nuclear and weak decay processes are also discussed. Program summary: Title of program: QRAP (Quasiparticle RAndom Phase approximation) Computers: The code has been created on a PC, but also runs on UNIX or LINUX machines Operating systems: WINDOWS or UNIX Program language used: Fortran-77 Memory required to execute with typical data: 16 MB of RAM and 2 MB of hard disk space No. of lines in distributed program, including test data, etc.: ˜ 8000 No. of bytes in distributed program, including test data, etc.: ˜ 256 kB Distribution format: tar.gz Nature of physical problem: The program calculates neutrino- and antineutrino-nucleus cross sections as a function of the incident neutrino energy, and muon capture rates, using the QRPA or PQRPA as nuclear structure models. Method of solution: The QRPA, or PQRPA, equations are solved in a self-consistent way for even-even nuclei. The nuclear matrix elements for the neutrino-nucleus interaction are treated as the inverse beta reaction of odd-odd nuclei as a function of the transferred momentum. Typical running time: ≈ 5 min on a 3 GHz processor for Data set 1.

  6. Simulation of n-qubit quantum systems. III. Quantum operations

    NASA Astrophysics Data System (ADS)

    Radtke, T.; Fritzsche, S.

    2007-05-01

    During the last decade, several quantum information protocols, such as quantum key distribution, teleportation or quantum computation, have attracted a lot of interest. Despite the recent success and research efforts in quantum information processing, however, we are just at the beginning of understanding the role of entanglement and the behavior of quantum systems in noisy environments, i.e. for nonideal implementations. Therefore, in order to facilitate the investigation of entanglement and decoherence in n-qubit quantum registers, here we present a revised version of the FEYNMAN program for working with quantum operations and their associated (Jamiołkowski) dual states. Based on the implementation of several popular decoherence models, we provide tools especially for the quantitative analysis of quantum operations. Apart from the implementation of different noise models, the current program extension may help investigate the fragility of many quantum states, one of the main obstacles in realizing quantum information protocols today. Program summary: Title of program: Feynman Catalogue identifier: ADWE_v3_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWE_v3_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions: None Operating systems: Any system that supports MAPLE; tested under Microsoft Windows XP, SuSe Linux 10 Program language used: MAPLE 10 Typical time and memory requirements: Most commands that act upon quantum registers with five or less qubits take ⩽10 seconds of processor time (on a Pentium 4 processor with ⩾2 GHz or equivalent) and 5-20 MB of memory. Especially when working with symbolic expressions, however, the memory and time requirements critically depend on the number of qubits in the quantum registers, owing to the exponential dimension growth of the associated Hilbert space. For example, complex (symbolic) noise models (with several Kraus operators) for multi-qubit systems often result in very large symbolic expressions that dramatically slow down the evaluation of measures or other quantities. In these cases, MAPLE's assume facility sometimes helps to reduce the complexity of symbolic expressions, but often only numerical evaluation is possible. Since the complexity of the FEYNMAN commands is very different, no general scaling law for the CPU time and memory usage can be given. No. of bytes in distributed program including test data, etc.: 799 265 No. of lines in distributed program including test data, etc.: 18 589 Distribution format: tar.gz Reasons for new version: While the previous program versions were designed mainly to create and manipulate the state of quantum registers, the present extension aims to support quantum operations as the essential ingredient for studying the effects of noisy environments. Does this version supersede the previous version: Yes Nature of the physical problem: Today, entanglement is identified as the essential resource in virtually all aspects of quantum information theory. In most practical implementations of quantum information protocols, however, decoherence typically limits the lifetime of entanglement. It is therefore necessary and highly desirable to understand the evolution of entanglement in noisy environments. 
Method of solution: Using the computer algebra system MAPLE, we have developed a set of procedures that support the definition and manipulation of n-qubit quantum registers as well as (unitary) logic gates and (nonunitary) quantum operations that act on the quantum registers. The provided hierarchy of commands can be used interactively in order to simulate and analyze the evolution of n-qubit quantum systems in ideal and nonideal quantum circuits.
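
    Independently of the MAPLE-based implementation above, the action of a quantum operation in Kraus form can be illustrated numerically in a few lines. The sketch below applies the single-qubit depolarizing channel, rho' = (1-p) rho + (p/3)(X rho X + Y rho Y + Z rho Z), to the state |0><0|:

        // Numeric illustration of a Kraus-form quantum operation (a
        // depolarizing channel); unrelated to the FEYNMAN program's own API.
        #include <array>
        #include <complex>
        #include <cstdio>

        using cd  = std::complex<double>;
        using Mat = std::array<std::array<cd, 2>, 2>;

        Mat mul(const Mat& a, const Mat& b) {
            Mat c{};
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)
                    for (int k = 0; k < 2; ++k) c[i][j] += a[i][k] * b[k][j];
            return c;
        }

        Mat dagger(const Mat& a) {
            Mat c{};
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j) c[i][j] = std::conj(a[j][i]);
            return c;
        }

        int main() {
            const double p = 0.25;                    // depolarizing probability
            Mat X{}, Y{}, Z{}, rho{};
            X[0][1] = 1;        X[1][0] = 1;
            Y[0][1] = cd(0,-1); Y[1][0] = cd(0,1);
            Z[0][0] = 1;        Z[1][1] = -1;
            rho[0][0] = 1;                            // rho = |0><0|

            Mat out{};
            for (int i = 0; i < 2; ++i)               // identity Kraus term
                for (int j = 0; j < 2; ++j) out[i][j] = (1 - p) * rho[i][j];
            for (const Mat* P : {&X, &Y, &Z}) {       // Pauli Kraus terms
                Mat t = mul(mul(*P, rho), dagger(*P));
                for (int i = 0; i < 2; ++i)
                    for (int j = 0; j < 2; ++j) out[i][j] += (p / 3) * t[i][j];
            }
            std::printf("rho'[0][0] = %.4f, rho'[1][1] = %.4f\n",
                        out[0][0].real(), out[1][1].real());  // 0.8333, 0.1667
        }

    Unlike a unitary gate, the channel maps the pure state to a mixed one, which is exactly the kind of decoherence effect the program extension is built to quantify.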

  7. Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

    PubMed

    Newberg, Lee A

    2008-08-15

    Backtracing through a dynamic programming algorithm's intermediate results, whether in search of an optimal path, to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache), existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++ code for optimal backtrace is available in the Supplementary Materials. Supplementary data are available at Bioinformatics online.
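
    The idea can be sketched with a uniform checkpointing interval (hedged: the paper derives an optimal strategy; a fixed interval k is used here for simplicity), applied to a longest-common-subsequence backtrace:

        // Checkpointed DP backtrace: keep every k-th row of the DP table and
        // recompute the rows in between on demand, so far fewer than all
        // n rows are ever held in memory at once.
        #include <algorithm>
        #include <iostream>
        #include <string>
        #include <vector>

        using Row = std::vector<int>;

        Row nextRow(const Row& prev, const std::string& a,
                    const std::string& b, int i) {
            Row cur(b.size() + 1, 0);
            for (std::size_t j = 1; j <= b.size(); ++j)
                cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1
                                                : std::max(prev[j], cur[j - 1]);
            return cur;
        }

        int main() {
            const std::string a = "distributed", b = "dispirited";
            const int k = 4;                     // checkpoint every k rows
            std::vector<Row> cp;                 // cp[c] holds DP row c*k
            Row row(b.size() + 1, 0);
            cp.push_back(row);
            for (std::size_t i = 1; i <= a.size(); ++i) {
                row = nextRow(row, a, b, (int)i);
                if (i % k == 0) cp.push_back(row);
            }
            std::cout << "LCS length: " << row[b.size()] << "\n";

            // Recompute a row on demand from the nearest checkpoint below it.
            auto getRow = [&](int i) {
                Row r = cp[i / k];
                for (int t = (i / k) * k + 1; t <= i; ++t) r = nextRow(r, a, b, t);
                return r;
            };
            std::string lcs;
            int i = (int)a.size(), j = (int)b.size();
            while (i > 0 && j > 0) {
                Row cur = getRow(i), up = getRow(i - 1);
                if (a[i - 1] == b[j - 1]) { lcs += a[i - 1]; --i; --j; }
                else if (cur[j] == up[j]) --i;
                else                      --j;
            }
            std::reverse(lcs.begin(), lcs.end());
            std::cout << "one LCS: " << lcs << "\n";
        }

    Each backtrace step recomputes at most k rows, trading recomputation time for memory; the paper's contribution is choosing the checkpoints so that this trade-off is optimal.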

  8. Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi

    Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.

  9. Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP

    DOE PAGES

    Tsugane, Keisuke; Boku, Taisuke; Murai, Hitoshi; ...

    2016-06-01

    Recently, the Partitioned Global Address Space (PGAS) parallel programming model has emerged as a usable distributed memory programming model. XcalableMP (XMP) is a PGAS parallel programming language that extends base languages such as C and Fortran with directives in OpenMP-like style. XMP supports a global-view model that allows programmers to define global data and to map them to a set of processors, which execute the distributed global data as a single thread. In XMP, the concept of a coarray is also employed for local-view programming. In this study, we port Gyrokinetic Toroidal Code - Princeton (GTC-P), which is a three-dimensional gyrokinetic PIC code developed at Princeton University to study the microturbulence phenomenon in magnetically confined fusion plasmas, to XMP as an example of hybrid memory model coding with the global-view and local-view programming models. In local-view programming, the coarray notation is simple and intuitive compared with Message Passing Interface (MPI) programming, while the performance is comparable to that of the MPI version. Thus, because the global-view programming model is suitable for expressing the data parallelism for a field of grid space data, we implement a hybrid-view version using a global-view programming model to compute the field and a local-view programming model to compute the movement of particles. Finally, the performance is degraded by 20% compared with the original MPI version, but the hybrid-view version facilitates more natural data expression for static grid space data (in the global-view model) and dynamic particle data (in the local-view model), and it also increases the readability of the code for higher productivity.

  10. Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

    NASA Technical Reports Server (NTRS)

    Sues, R. H.; Lua, Y. J.; Smith, M. D.

    1994-01-01

    The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.

  11. Upsets in Erased Floating Gate Cells With High-Energy Protons

    DOE PAGES

    Gerardin, S.; Bagatin, M.; Paccagnella, A.; ...

    2017-01-01

    We discuss upsets in erased floating gate cells, due to large threshold voltage shifts, using statistical distributions collected on a large number of memory cells. The spread in the neutral threshold voltage appears to be too low to quantitatively explain the experimental observations in terms of simple charge loss, at least in SLC devices. The possibility that memories exposed to high energy protons and heavy ions exhibit negative charge transfer between programmed and erased cells is investigated, although the analysis does not provide conclusive support to this hypothesis.

  12. Schemas in Problem Solving: An Integrated Model of Learning, Memory, and Instruction

    DTIC Science & Technology

    1992-01-01

    article: "Hybrid Computation in Cognitive Science: Neural Networks and Symbols" (J. A. Anderson, 1990). And, Marvin Minsky echoes the sentiment in his...distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: The MIT Press. Minsky , M. (1991). Logical versus analogical or symbolic

  13. Modeling the glass transition of amorphous networks for shape-memory behavior

    NASA Astrophysics Data System (ADS)

    Xiao, Rui; Choi, Jinwoo; Lakhera, Nishant; Yakacki, Christopher M.; Frick, Carl P.; Nguyen, Thao D.

    2013-07-01

    In this paper, a thermomechanical constitutive model was developed for the time-dependent behaviors of the glass transition of amorphous networks. The model used multiple discrete relaxation processes to describe the distribution of relaxation times for stress relaxation, structural relaxation, and stress-activated viscous flow. A non-equilibrium thermodynamic framework based on the fictive temperature was introduced to demonstrate the thermodynamic consistency of the constitutive theory. Experimental and theoretical methods were developed to determine the parameters describing the distribution of stress and structural relaxation times and the dependence of the relaxation times on temperature, structure, and driving stress. The model was applied to study the effects of deformation temperatures and physical aging on the shape-memory behavior of amorphous networks. The model was able to reproduce important features of the partially constrained recovery response observed in experiments. Specifically, the model demonstrated a strain-recovery overshoot for cases programmed below Tg and subjected to a constant mechanical load. This phenomenon was not observed for materials programmed above Tg. Physical aging, in which the material was annealed for an extended period of time below Tg, shifted the activation of strain recovery to higher temperatures and increased significantly the initial recovery rate. For fixed-strain recovery, the model showed a larger overshoot in the stress response for cases programmed below Tg, which was consistent with previous experimental observations. Altogether, this work demonstrates how an understanding of the time-dependent behaviors of the glass transition can be used to tailor the temperature and deformation history of the shape-memory programming process to achieve more complex shape recovery pathways, faster recovery responses, and larger activation stresses.

  14. A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    NASA Technical Reports Server (NTRS)

    Dubey, A.; Zubair, M.; Grosch, C. E.

    1992-01-01

    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sayan Ghosh, Jeff Hammond

    OpenSHMEM is a community effort to unify and standardize the SHMEM programming model. MPI (Message Passing Interface) is a well-known community standard for parallel programming using distributed memory. The most recent release of MPI, version 3.0, was designed in part to support programming models like SHMEM. OSHMPI is an implementation of the OpenSHMEM standard using MPI-3 for the Linux operating system. It is the first implementation of SHMEM over MPI one-sided communication and has the potential to be widely adopted due to the portability and wide availability of Linux and MPI-3. OSHMPI has been tested on a variety of systems and implementations of MPI-3, including InfiniBand clusters using MVAPICH2 and SGI shared-memory supercomputers using MPICH. Current support is limited to Linux but may be extended to Apple OSX if there is sufficient interest. The code is open source via https://github.com/jeffhammond/oshmpi
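
    The core idea, SHMEM-style one-sided puts expressed with MPI-3 windows, can be sketched as follows (hedged: this is not OSHMPI's source; a real implementation manages a symmetric heap, whereas a single window stands in here, and the unified MPI memory model is assumed):

        // A SHMEM-like put via MPI-3 one-sided communication: each process
        // writes its rank into its neighbor's window, as shmem_long_p would.
        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            long* buf;                         // "symmetric" buffer stand-in
            MPI_Win win;
            MPI_Win_allocate(sizeof(long), sizeof(long), MPI_INFO_NULL,
                             MPI_COMM_WORLD, &buf, &win);
            buf[0] = -1;
            MPI_Barrier(MPI_COMM_WORLD);       // everyone initialized

            MPI_Win_lock_all(0, win);          // passive target, as in SHMEM
            int pe = (rank + 1) % size;        // destination processing element
            long value = rank;
            MPI_Put(&value, 1, MPI_LONG, pe, 0, 1, MPI_LONG, win);
            MPI_Win_flush(pe, win);            // completes the put, like shmem_quiet
            MPI_Win_unlock_all(win);

            MPI_Barrier(MPI_COMM_WORLD);
            std::printf("PE %d received %ld\n", rank, buf[0]);
            MPI_Win_free(&win);
            MPI_Finalize();
        }

    Passive-target epochs (MPI_Win_lock_all) map naturally onto SHMEM's always-open communication model, which is what makes the MPI-3 one-sided interface a workable substrate for OpenSHMEM.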

  16. The application of a sparse, distributed memory to the detection, identification and manipulation of physical objects

    NASA Technical Reports Server (NTRS)

    Kanerva, P.

    1986-01-01

    To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because it works with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
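
    A toy version of such a pattern memory (a sketch in the spirit of Kanerva's design, not the project's code; all sizes are illustrative) makes the distribute-and-superimpose mechanism concrete: writing increments counters at every hard location within a Hamming radius of the address, and reading sums and thresholds them:

        // Toy sparse distributed memory: write by distributing a pattern over
        // nearby hard locations, read by statistical reconstruction.
        #include <bitset>
        #include <cstdlib>
        #include <iostream>
        #include <vector>

        const int N = 256;       // pattern dimension (bits)
        const int M = 2000;      // hard storage locations
        const int RADIUS = 112;  // activation radius (Hamming distance)

        using Pattern = std::bitset<N>;

        Pattern randomPattern() {
            Pattern p;
            for (int i = 0; i < N; ++i) p[i] = std::rand() & 1;
            return p;
        }

        int main() {
            std::srand(42);
            std::vector<Pattern> addr(M);    // fixed random location addresses
            for (auto& a : addr) a = randomPattern();
            std::vector<std::vector<int>> counters(M, std::vector<int>(N, 0));

            Pattern stored = randomPattern();
            // Write: superimpose the pattern onto all activated locations.
            for (int m = 0; m < M; ++m)
                if ((addr[m] ^ stored).count() <= RADIUS)
                    for (int i = 0; i < N; ++i)
                        counters[m][i] += stored[i] ? 1 : -1;

            // Read with a noisy cue that only partially matches the pattern.
            Pattern cue = stored;
            for (int i = 0; i < 20; ++i) cue.flip(std::rand() % N);

            std::vector<int> sum(N, 0);
            for (int m = 0; m < M; ++m)
                if ((addr[m] ^ cue).count() <= RADIUS)
                    for (int i = 0; i < N; ++i) sum[i] += counters[m][i];

            Pattern recalled;
            for (int i = 0; i < N; ++i) recalled[i] = sum[i] > 0;
            std::cout << "bits recovered: " << N - (recalled ^ stored).count()
                      << " / " << N << "\n";
        }

    Because the noisy cue activates nearly the same set of locations as the original address, the summed counters reconstruct the stored pattern even though the cue only partially matches it, which is the associative property described above.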

  17. SCELib2: the new revision of SCELib, the parallel computational library of molecular properties in the single center approach

    NASA Astrophysics Data System (ADS)

    Sanna, N.; Morelli, G.

    2004-09-01

    In this paper we present the new version of the SCELib program (CPC Catalogue identifier ADMG), a full numerical implementation of the Single Center Expansion (SCE) method. The physics involved is that of producing the SCE description of molecular electronic densities, of molecular electrostatic potentials and of molecular perturbed potentials due to a point negative or positive charge. This new revision of the program has been optimized to run in serial as well as in parallel execution mode, to support a larger set of molecular symmetries and to permit the restart of long-lasting calculations. To measure the performance of this new release, a comparative study has been carried out on the most powerful computing architectures in serial and parallel runs. The calculations reported in this paper refer to realistic medium-to-large molecular systems, and they are reported in full detail to best benchmark the parallel architectures the new SCELib code will run on. Program summary: Title of program: SCELib2 Catalogue identifier: ADGU Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADGU Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Reference to previous versions: Comput. Phys. Commun. 128 (2) (2000) 139 (CPC catalogue identifier: ADMG) Does the new version supersede the original program?: Yes Computer for which the program is designed and others on which it has been tested: HP ES45 and rx2600, SUN ES4500, IBM SP and any single CPU workstation based on Alpha, SPARC, POWER, Itanium2 and X86 processors Installations: CASPUR, local Operating systems under which the program has been tested: HP Tru64 V5.X, SUNOS V5.8, IBM AIX V5.X, Linux RedHat V8.0 Programming language used: C Memory required to execute with typical data: 10 Mwords. Up to 2000 Mwords depending on the molecular system and runtime parameters No. of bits in a word: 64 No. of processors used: 1 to 32 Has the code been vectorized or parallelized?: Yes No. of bytes in distributed program, including test data, etc.: 3 798 507 No. of lines in distributed program, including test data, etc.: 187 226 Distribution format: tar.gz Nature of physical problem: In this set of codes an efficient procedure is implemented to describe the wavefunction and related molecular properties of a polyatomic molecular system within the Single Center of Expansion (SCE) approximation. The resulting SCE wavefunction, electron density, electrostatic and exchange/correlation potentials can then be used via a proper Application Programming Interface (API) to describe the target molecular system, which can be employed in electron-molecule scattering calculations. The molecular properties expanded over a single center turn out to also be of more general application, and some possible uses in quantum chemistry, biomodelling and drug design are also outlined. Method of solution: The polycentre Hartree-Fock solution for a molecule of arbitrary geometry, based on a linear combination of Gaussian-Type Orbitals (GTO), is expanded over a single center, typically the Center Of Mass (C.O.M.), by means of a Gauss-Legendre/Chebyshev quadrature over the θ, φ angular coordinates. The resulting SCE numerical wavefunction is then used to calculate the one-particle electron density, the electrostatic potential and two different models for the correlation/polarization potentials induced by the impinging electron, which have the correct asymptotic behaviour for the leading dipole molecular polarizabilities. 
Restrictions on the complexity of the problem: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM memory. In this case a feature of the program is to memory map a disk file in order to efficiently access the memory data through a disk device. Typical running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen, it is directly proportional to the ( r, θ, φ) grid size and to the number of angular basis functions used. Thus, from the program printout of the main arrays memory occupancy, the user can approximately derive the expected computer time needed for a given calculation executed in serial mode. For parallel executions the overall efficiency must be further taken into account, and this depends on the no. of processors used as well as on the parallel architecture chosen, so a simple general law is at present not determinable. Unusual features of the program: The code has been engineered to use dynamical, runtime determined, global parameters with the aim to have all the data fitted in the RAM memory. Some unusual circumstances, e.g., when using large values of those parameters, may cause the program to run with unexpected performance reductions due to runtime bottlenecks like those caused by memory swap operations which strongly depend on the hardware used. In such cases, a parallel execution of the code is generally sufficient to fix the problem since the data size is partitioned over the available processors. When a suitable parallel system is not available for execution, a mechanism of memory mapped file can be used; with this option on, all the available memory will be used as a buffer for a disk file which contains the whole data set, thus having a better throughput with respect to the traditional swapping/paging of the Unix OS.

  18. ELAS: A general-purpose computer program for the equilibrium problems of linear structures. Volume 2: Documentation of the program. [subroutines and flow charts

    NASA Technical Reports Server (NTRS)

    Utku, S.

    1969-01-01

    A general-purpose digital computer program for the in-core solution of linear equilibrium problems of structural mechanics is documented. The program requires minimum input for the description of the problem. The solution is obtained by means of the displacement method and the finite element technique. Almost any geometry and structure may be handled because of the availability of linear, triangular, quadrilateral, tetrahedral, hexahedral, conical, triangular torus, and quadrilateral torus elements. The assumption of a piecewise linear deflection distribution ensures monotonic convergence of the deflections from the stiffer side with decreasing mesh size. The stresses are provided by least-squares best-fit strain tensors at the mesh points where the deflections are given. The selection of local coordinate systems whenever necessary is automatic. The core memory is used by means of dynamic memory allocation, an optional mesh-point relabelling scheme, and imposition of the boundary conditions at assembly time.

  19. Fortran programs for the time-dependent Gross-Pitaevskii equation in a fully anisotropic trap

    NASA Astrophysics Data System (ADS)

    Muruganandam, P.; Adhikari, S. K.

    2009-10-01

    Here we develop simple numerical algorithms for both stationary and non-stationary solutions of the time-dependent Gross-Pitaevskii (GP) equation describing the properties of Bose-Einstein condensates at ultra-low temperatures. In particular, we consider algorithms involving real- and imaginary-time propagation based on a split-step Crank-Nicolson method. In a one-space-variable form of the GP equation we consider the one-dimensional, two-dimensional circularly-symmetric, and the three-dimensional spherically-symmetric harmonic-oscillator traps. In the two-space-variable form we consider the GP equation in two-dimensional anisotropic and three-dimensional axially-symmetric traps. The fully-anisotropic three-dimensional GP equation is also considered. Numerical results for the chemical potential and root-mean-square size of stationary states are reported using imaginary-time propagation programs for all the cases and compared with previously obtained results. Also presented are numerical results of non-stationary oscillation for different trap symmetries using real-time propagation programs. A set of convenient working codes developed in Fortran 77 is also provided for all these cases (twelve programs in all). In the case of two or three space variables, Fortran 90/95 versions provide some simplification over the Fortran 77 programs, and these programs are also included (six programs in all). Program summary: Program title: (i) imagtime1d, (ii) imagtime2d, (iii) imagtime3d, (iv) imagtimecir, (v) imagtimesph, (vi) imagtimeaxial, (vii) realtime1d, (viii) realtime2d, (ix) realtime3d, (x) realtimecir, (xi) realtimesph, (xii) realtimeaxial Catalogue identifier: AEDU_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 122 907 No. of bytes in distributed program, including test data, etc.: 609 662 Distribution format: tar.gz Programming language: FORTRAN 77 and Fortran 90/95 Computer: PC Operating system: Linux, Unix RAM: 1 GByte (i, iv, v), 2 GByte (ii, vi, vii, x, xi), 4 GByte (iii, viii, xii), 8 GByte (ix) Classification: 2.9, 4.3, 4.12 Nature of problem: These programs are designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in one-, two- or three-space dimensions with a harmonic, circularly-symmetric, spherically-symmetric, axially-symmetric or anisotropic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Solution method: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation, in either imaginary or real time, over small time steps. The method yields the solution of stationary and/or non-stationary problems. Additional comments: This package consists of 12 programs, see "Program title", above. FORTRAN 77 versions are provided for each of the 12 and, in addition, Fortran 90/95 versions are included for ii, iii, vi, viii, ix, xii. For the particular purpose of each program, please see below. Running time: Minutes on a medium PC (i, iv, v, vii, x, xi), a few hours on a medium PC (ii, vi, viii, xii), days on a medium PC (iii, ix). 
Program summary (1)Title of program: imagtime1d.F Title of electronic file: imagtime1d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 1 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in one-space dimension with a harmonic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (2)Title of program: imagtimecir.F Title of electronic file: imagtimecir.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 1 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in two-space dimensions with a circularly-symmetric trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (3)Title of program: imagtimesph.F Title of electronic file: imagtimesph.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 1 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with a spherically-symmetric trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (4)Title of program: realtime1d.F Title of electronic file: realtime1d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. 
Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 2 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in one-space dimension with a harmonic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. The method yields the solution of stationary and non-stationary problems. Program summary (5)Title of program: realtimecir.F Title of electronic file: realtimecir.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 2 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in two-space dimensions with a circularly-symmetric trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. The method yields the solution of stationary and non-stationary problems. Program summary (6)Title of program: realtimesph.F Title of electronic file: realtimesph.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 2 GByte Programming language used: Fortran 77 Typical running time: Minutes on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with a spherically-symmetric trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. The method yields the solution of stationary and non-stationary problems. Program summary (7)Title of programs: imagtimeaxial.F and imagtimeaxial.f90 Title of electronic file: imagtimeaxial.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 2 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time: Few hours on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with an axially-symmetric trap. 
The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (8)Title of program: imagtime2d.F and imagtime2d.f90 Title of electronic file: imagtime2d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 2 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time: Few hours on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in two-space dimensions with an anisotropic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (9)Title of program: realtimeaxial.F and realtimeaxial.f90 Title of electronic file: realtimeaxial.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 4 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time Hours on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with an axially-symmetric trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. The method yields the solution of stationary and non-stationary problems. Program summary (10)Title of program: realtime2d.F and realtime2d.f90 Title of electronic file: realtime2d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 4 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time: Hours on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in two-space dimensions with an anisotropic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. 
The method yields the solution of stationary and non-stationary problems. Program summary (11)Title of program: imagtime3d.F and imagtime3d.f90 Title of electronic file: imagtime3d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum RAM memory: 4 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time: Few days on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with an anisotropic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in imaginary time over small time steps. The method yields the solution of stationary problems. Program summary (12)Title of program: realtime3d.F and realtime3d.f90 Title of electronic file: realtime3d.tar.gz Catalogue identifier: Program summary URL: Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Distribution format: tar.gz Computers: PC/Linux, workstation/UNIX Maximum Ram Memory: 8 GByte Programming language used: Fortran 77 and Fortran 90 Typical running time: Days on a medium PC Unusual features: None Nature of physical problem: This program is designed to solve the time-dependent Gross-Pitaevskii nonlinear partial differential equation in three-space dimensions with an anisotropic trap. The Gross-Pitaevskii equation describes the properties of a dilute trapped Bose-Einstein condensate. Method of solution: The time-dependent Gross-Pitaevskii equation is solved by the split-step Crank-Nicolson method by discretizing in space and time. The discretized equation is then solved by propagation in real time over small time steps. The method yields the solution of stationary and non-stationary problems.
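
    The split-step Crank-Nicolson scheme used by all twelve programs can be demonstrated in miniature (hedged: a 1D imaginary-time sketch, independent of the distributed Fortran codes): a half step of the potential and nonlinear terms, a Crank-Nicolson kinetic step solved with the Thomas algorithm, a second half step, then renormalization:

        // 1D GP equation, imaginary-time split-step Crank-Nicolson:
        // V(x) = x^2/2, nonlinearity g |psi|^2, Dirichlet boundaries.
        #include <cmath>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 400;
            const double L = 10.0, dx = 2 * L / (n - 1), dt = 1e-4, g = 10.0;
            std::vector<double> x(n), V(n), psi(n);
            for (int i = 0; i < n; ++i) {
                x[i] = -L + i * dx;
                V[i] = 0.5 * x[i] * x[i];
                psi[i] = std::exp(-x[i] * x[i] / 2);       // initial guess
            }
            auto normalize = [&] {
                double s = 0;
                for (double p : psi) s += p * p * dx;
                for (double& p : psi) p /= std::sqrt(s);
            };
            normalize();

            const double a = -dt / (4 * dx * dx);          // CN off-diagonal
            std::vector<double> d(n), rhs(n);
            for (int step = 0; step < 20000; ++step) {
                // Half step of potential + nonlinear term (exact in imaginary time).
                for (int i = 0; i < n; ++i)
                    psi[i] *= std::exp(-0.5 * dt * (V[i] + g * psi[i] * psi[i]));
                // Kinetic step: (1 - dt/4 D2) psi_new = (1 + dt/4 D2) psi_old.
                for (int i = 0; i < n; ++i) {
                    double lap = ((i > 0 ? psi[i-1] : 0) - 2 * psi[i] +
                                  (i < n-1 ? psi[i+1] : 0)) / (dx * dx);
                    rhs[i] = psi[i] + 0.25 * dt * lap;
                    d[i] = 1 - 2 * a;
                }
                for (int i = 1; i < n; ++i) {              // Thomas: eliminate
                    double w = a / d[i-1];
                    d[i] -= w * a;
                    rhs[i] -= w * rhs[i-1];
                }
                psi[n-1] = rhs[n-1] / d[n-1];
                for (int i = n - 2; i >= 0; --i)           // back-substitute
                    psi[i] = (rhs[i] - a * psi[i+1]) / d[i];
                // Second half step, then renormalize (imaginary time damps the norm).
                for (int i = 0; i < n; ++i)
                    psi[i] *= std::exp(-0.5 * dt * (V[i] + g * psi[i] * psi[i]));
                normalize();
            }
            double mu = 0;                                 // chemical potential
            for (int i = 1; i + 1 < n; ++i) {
                double lap = (psi[i-1] - 2 * psi[i] + psi[i+1]) / (dx * dx);
                mu += psi[i] * (-0.5 * lap + (V[i] + g * psi[i] * psi[i]) * psi[i]) * dx;
            }
            std::printf("chemical potential ~ %.4f\n", mu);
        }

    Real-time propagation uses the same splitting with complex arithmetic and no renormalization; the higher-dimensional programs apply the Crank-Nicolson sweep direction by direction.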

  20. Facilitating change in health-related behaviors and intentions: a randomized controlled trial of a multidimensional memory program for older adults.

    PubMed

    Wiegand, Melanie A; Troyer, Angela K; Gojmerac, Christina; Murphy, Kelly J

    2013-01-01

    Many older adults are concerned about memory changes with age and consequently seek ways to optimize their memory function. Memory programs are known to be variably effective in improving memory knowledge, other aspects of metamemory, and/or objective memory, but little is known about their impact on implementing and sustaining lifestyle and healthcare-seeking intentions and behaviors. We evaluated a multidimensional, evidence-based intervention, the Memory and Aging Program, that provides education about memory and memory change, training in the use of practical memory strategies, and support for implementation of healthy lifestyle behavior changes. In a randomized controlled trial, 42 healthy older adults participated in a program (n = 21) or a waitlist control (n = 21) group. Relative to the control group, participants in the program implemented more healthy lifestyle behaviors by the end of the program and maintained these changes 1 month later. Similarly, program participants reported a decreased intention to seek unnecessary medical attention for their memory immediately after the program and 1 month later. Findings support the use of multidimensional memory programs to promote healthy lifestyles and influence healthcare-seeking behaviors. Discussion focuses on implications of these changes for maximizing cognitive health and minimizing impact on healthcare resources.

  1. Sparse distributed memory overview

    NASA Technical Reports Server (NTRS)

    Raugh, Mike

    1990-01-01

The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
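
    The read/write mechanism behind this kind of associative retrieval is compact enough to sketch. The toy C implementation below is ours, not the project's; the word length, number of hard locations, and activation radius are illustrative. A cue that only partially matches a stored pattern still activates largely the same hard locations, so summing their counters recovers the original:

      #include <stdio.h>
      #include <stdlib.h>

      /* Kanerva-style sparse distributed memory sketch: M hard locations
         with random N-bit addresses; a location is activated when its
         Hamming distance to the cue is <= RADIUS; writing adds +/-1 to
         the per-bit counters of activated locations; reading sums the
         counters and thresholds at zero. */
      #define N 256      /* address/word length in bits */
      #define M 1000     /* number of hard locations */
      #define RADIUS 111 /* activation radius */

      static unsigned char addr[M][N]; /* hard-location address bits */
      static int cnt[M][N];            /* per-bit counters */

      static int hamming(const unsigned char *a, const unsigned char *b) {
          int d = 0;
          for (int i = 0; i < N; i++) d += (a[i] != b[i]);
          return d;
      }

      void sdm_write(const unsigned char *cue, const unsigned char *data) {
          for (int m = 0; m < M; m++)
              if (hamming(addr[m], cue) <= RADIUS)
                  for (int i = 0; i < N; i++)
                      cnt[m][i] += data[i] ? 1 : -1;
      }

      void sdm_read(const unsigned char *cue, unsigned char *out) {
          int sum[N] = {0};
          for (int m = 0; m < M; m++)
              if (hamming(addr[m], cue) <= RADIUS)
                  for (int i = 0; i < N; i++) sum[i] += cnt[m][i];
          for (int i = 0; i < N; i++) out[i] = (sum[i] > 0);
      }

      int main(void) {
          srand(42);
          for (int m = 0; m < M; m++)
              for (int i = 0; i < N; i++) addr[m][i] = rand() & 1;
          unsigned char p[N], noisy[N], out[N];
          for (int i = 0; i < N; i++) p[i] = rand() & 1;
          sdm_write(p, p);                           /* autoassociative store */
          for (int i = 0; i < N; i++) noisy[i] = (i < 20) ? !p[i] : p[i];
          sdm_read(noisy, out);                      /* retrieve from partial match */
          printf("distance of retrieved word from original: %d\n", hamming(out, p));
          return 0;
      }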

  2. An enhanced Ada run-time system for real-time embedded processors

    NASA Technical Reports Server (NTRS)

    Sims, J. T.

    1991-01-01

    An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.

  3. Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search

    DTIC Science & Technology

    2013-01-03

[Record contains only report-documentation-page fragments; no abstract survives. Recoverable citation: [2] W. McLendon III, B. Hendrickson, S. J. Plimpton, and L. Rauchwerger, "Finding strongly connected components in distributed graphs," J. ... Performing organization: University of California at Berkeley, Electrical ...]

  4. A parallel solver for huge dense linear systems

    NASA Astrophysics Data System (ADS)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense linear systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems of order 100 000. The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOPS with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI ( http://www.mpi-forum.org/), BLAS ( http://www.netlib.org/blas/), PLAPACK ( http://www.cs.utexas.edu/~plapack/), POOCLAPACK ( ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic. Summary of revisions: Version 1.1 Can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment. 
Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.

  5. Schemas in Problem Solving: An Integrated Model of Learning, Memory, and Instruction

    DTIC Science & Technology

    1992-01-01

reflected in the title of a recent article: "Hybrid Computation in Cognitive Science: Neural Networks and Symbols" (J. A. Anderson, 1990). And, Marvin Minsky...Rumelhart, D. E. (1989). Explorations in parallel distributed processing: A handbook of models, programs, and exercises. Cambridge, MA: The MIT Press. Minsky

  6. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel

Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  7. Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

    DOE PAGES

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...

    2017-03-08

Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
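
    The reduction of a contraction to a matrix–matrix multiplication, which both records above take as their starting point, can be made concrete in a few lines. The C sketch below is ours (arbitrary dimensions, no symmetry exploitation): grouping tensor indices so that T(a,b,i,j) = sum_c A(a,b,c) B(c,i,j) turns the contraction into a single GEMM with M = |a||b|, K = |c|, N = |i||j|:

      #include <stdio.h>

      /* The grouped indices must be laid out contiguously; a production
         code would call an optimized DGEMM here instead of the loops. */
      #define DA 4
      #define DB 3
      #define DC 5
      #define DI 2
      #define DJ 6

      int main(void) {
          static double A[DA*DB][DC], B[DC][DI*DJ], T[DA*DB][DI*DJ];
          for (int m = 0; m < DA*DB; m++)          /* fill A(a,b,c) */
              for (int k = 0; k < DC; k++) A[m][k] = m + 0.1 * k;
          for (int k = 0; k < DC; k++)             /* fill B(c,i,j) */
              for (int n = 0; n < DI*DJ; n++) B[k][n] = k - 0.2 * n;
          /* the contraction is now a plain matrix-matrix multiply */
          for (int m = 0; m < DA*DB; m++)
              for (int n = 0; n < DI*DJ; n++) {
                  double s = 0.0;
                  for (int k = 0; k < DC; k++) s += A[m][k] * B[k][n];
                  T[m][n] = s;
              }
          printf("T(0,0,0,0) = %g\n", T[0][0]);
          return 0;
      }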

  8. Mobile and replicated alignment of arrays in data-parallel programs

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  9. Test program for 4-K memory card, JOLT microprocessor

    NASA Technical Reports Server (NTRS)

    Lilley, R. W.

    1976-01-01

    A memory test program is described for use with the JOLT microcomputer 4,096-word memory board used in development of an Omega navigation receiver. The program allows a quick test of the memory board by cycling the memory through all possible bit combinations in all words.
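
    A minimal sketch of such an exhaustive pattern test, in C rather than JOLT assembly (the byte-wide word, the pattern set, and the reporting are our assumptions; on a real target the buffer would be replaced by a pointer to the memory-mapped board):

      #include <stdio.h>
      #include <stdint.h>

      #define WORDS 4096  /* size of the 4,096-word board */

      int main(void) {
          static uint8_t mem[WORDS];  /* stand-in for the memory board */
          long errors = 0;
          /* cycle every word through all 256 bit combinations */
          for (int pattern = 0; pattern < 256; pattern++) {
              for (int w = 0; w < WORDS; w++) mem[w] = (uint8_t)pattern;
              for (int w = 0; w < WORDS; w++)
                  if (mem[w] != (uint8_t)pattern) {
                      errors++;
                      printf("fail at word %d, pattern 0x%02X\n", w, pattern);
                  }
          }
          if (errors) printf("%ld errors\n", errors);
          else        printf("memory OK\n");
          return 0;
      }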

  10. Immunological memory is associative

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, D.J.; Forrest, S.; Perelson, A.S.

    1996-12-31

The purpose of this paper is to show that immunological memory is an associative and robust memory that belongs to the class of sparse distributed memories. This class of memories derives its associative and robust nature by sparsely sampling the input space and distributing the data among many independent agents. Other members of this class include a model of the cerebellar cortex and Sparse Distributed Memory (SDM). First we present a simplified account of the immune response and immunological memory. Next we present SDM, and then we show the correlations between immunological memory and SDM. Finally, we show how associative recall in the immune response can be both beneficial and detrimental to the fitness of an individual.

  11. A numerical differentiation library exploiting parallel architectures

    NASA Astrophysics Data System (ADS)

    Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

    2009-08-01

We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h^2), and O(h^4), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems; due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with a carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
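
    The step-selection idea ("a carefully chosen step that minimizes the sum of the truncation and round-off errors") can be illustrated for the O(h^2) central difference: balancing the h^2 truncation term against the eps/h round-off term gives h proportional to eps^(1/3). A self-contained C sketch of the textbook scheme (ours, not NDL source):

      #include <stdio.h>
      #include <math.h>
      #include <float.h>

      /* Central difference, accurate to O(h^2); h ~ eps^(1/3) balances
         truncation against round-off. */
      double central_diff(double (*f)(double), double x) {
          double h = cbrt(DBL_EPSILON) * fmax(fabs(x), 1.0);
          volatile double xp = x + h, xm = x - h; /* make the step exactly representable */
          return (f(xp) - f(xm)) / (xp - xm);
      }

      int main(void) {
          double d = central_diff(sin, 1.0);
          printf("d sin/dx at 1.0 = %.12f (exact %.12f)\n", d, cos(1.0));
          return 0;
      }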

  12. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining, gather/scatter, and redistribution. At the end of the conversion process most intermediate Charon function calls will have been removed, the non-distributed arrays will have been deleted, and virtually the only remaining Charon function calls are the high-level, highly optimized communications. Distribution of the data is under complete control of the programmer, although a wide range of useful distributions is easily available through predefined functions. A crucial aspect of the library is that it does not allocate space for distributed arrays, but accepts programmer-specified memory. This has two major consequences. First, codes parallelized using Charon do not suffer from encapsulation; user data is always directly accessible. This provides high efficiency, and also retains the possibility of using message passing directly for highly irregular communications. Second, non-distributed arrays can be interpreted as (trivial) distributions in the Charon sense, which allows them to be mapped to truly distributed arrays, and vice versa. This is the mechanism that enables incremental parallelization. In this paper we provide a brief introduction of the library and then focus on the actual steps in the parallelization process, using some representative examples from, among others, the NAS Parallel Benchmarks. We show how a complicated two-dimensional pipeline, the prototypical non-data-parallel algorithm, can be constructed with ease. 
To demonstrate the flexibility of the library, we give examples of the stepwise, efficient parallel implementation of nonlocal boundary conditions common in aircraft simulations, as well as the construction of the sequence of grids required for multigrid.

  13. Wnt signaling inhibits CTL memory programming

    PubMed Central

    Xiao, Zhengguo; Sun, Zhifeng; Smyth, Kendra; Li, Lei

    2013-01-01

Induction of functional CTLs is one of the major goals for vaccine development and cancer therapy. Inflammatory cytokines are critical for memory CTL generation. Wnt signaling is important for CTL priming and memory formation, but its role in cytokine-driven memory CTL programming is unclear. We found that Wnt signaling inhibited IL-12-driven CTL activation and memory programming. This impaired memory CTL programming was attributed to up-regulation of eomes and down-regulation of T-bet. Wnt signaling suppressed the mTOR pathway during CTL activation, which was different from its effects on other cell types. Interestingly, the impaired memory CTL programming by Wnt was partially rescued by the mTOR inhibitor rapamycin. In conclusion, we found that crosstalk between Wnt and IL-12 signaling inhibits T-bet and mTOR pathways and impairs memory programming, which can be recovered in part by rapamycin. In addition, direct inhibition of Wnt signaling during CTL activation does not affect CTL memory programming. Therefore, Wnt signaling may serve as a new tool for CTL manipulation in autoimmune diseases and immune therapy for certain cancers. PMID:23911398

  14. Resolution of singularities for multi-loop integrals

    NASA Astrophysics Data System (ADS)

    Bogner, Christian; Weinzierl, Stefan

    2008-04-01

    We report on a program for the numerical evaluation of divergent multi-loop integrals. The program is based on iterated sector decomposition. We improve the original algorithm of Binoth and Heinrich such that the program is guaranteed to terminate. The program can be used to compute numerically the Laurent expansion of divergent multi-loop integrals regulated by dimensional regularisation. The symbolic and the numerical steps of the algorithm are combined into one program. Program summaryProgram title: sector_decomposition Catalogue identifier: AEAG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEAG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 47 506 No. of bytes in distributed program, including test data, etc.: 328 485 Distribution format: tar.gz Programming language: C++ Computer: all Operating system: Unix RAM: Depending on the complexity of the problem Classification: 4.4 External routines: GiNaC, available from http://www.ginac.de, GNU scientific library, available from http://www.gnu.org/software/gsl Nature of problem: Computation of divergent multi-loop integrals. Solution method: Sector decomposition. Restrictions: Only limited by the available memory and CPU time. Running time: Depending on the complexity of the problem.
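
    The mechanism of iterated sector decomposition can be seen in a toy example (ours, not drawn from the program's documentation). For the logarithmically divergent integral

      I(\varepsilon) = \int_0^1\!\int_0^1 dx\,dy\,(x+y)^{-2+\varepsilon},

    splitting the square into the sectors x > y and y > x (equal by symmetry) and substituting y = x t in the first sector factorizes the singularity:

      I(\varepsilon) = 2\int_0^1 dx\; x^{-1+\varepsilon} \int_0^1 dt\,(1+t)^{-2+\varepsilon}
                     = \frac{2}{\varepsilon}\int_0^1 dt\,(1+t)^{-2+\varepsilon}.

    The pole is now explicit in the prefactor, and the remaining integral is finite, so it can be expanded in epsilon and evaluated numerically, which is the structure the program produces, sector by sector, for multi-loop integrands.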

  15. NEWTPOIS- NEWTON POISSON DISTRIBUTION PROGRAM

    NASA Technical Reports Server (NTRS)

    Bowerman, P. N.

    1994-01-01

The cumulative Poisson distribution program, NEWTPOIS, is one of two programs which make calculations involving cumulative Poisson distributions. Both programs, NEWTPOIS (NPO-17715) and CUMPOIS (NPO-17714), can be used independently of one another. NEWTPOIS determines percentiles for gamma distributions with integer shape parameters and calculates percentiles for chi-square distributions with even degrees of freedom. It can be used by statisticians and others concerned with probabilities of independent events occurring over specific units of time, area, or volume. NEWTPOIS determines the Poisson parameter (lambda), that is, the mean (or expected) number of events occurring in a given unit of time, area, or space. Given that the user already knows the cumulative probability for a specific number of occurrences (n), it is usually a simple matter of substitution into the Poisson distribution summation to arrive at lambda. However, direct calculation of the Poisson parameter becomes difficult for small positive values of n and unmanageable for large values. NEWTPOIS uses Newton's iteration method to extract lambda from the initial value condition of the Poisson distribution where n=0, taking successive estimations until some user-specified error term (epsilon) is reached. The NEWTPOIS program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly on most C compilers. The program format is interactive, accepting epsilon, n, and the cumulative probability of the occurrence of n as inputs. It has been implemented under DOS 3.2 and has a memory requirement of 30K. NEWTPOIS was developed in 1988.
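
    One workable reading of that description is the warm-started Newton scheme sketched below in C (our implementation, not NASA's source). Differentiating the cumulative sum F(lambda) = sum_{k=0..n} e^{-lambda} lambda^k / k! term by term telescopes, leaving F'(lambda) = -e^{-lambda} lambda^n / n!, and the n = 0 case gives the exact starting value lambda_0 = -ln p:

      #include <stdio.h>
      #include <math.h>

      /* Solve F_n(lambda) = p for lambda, given n and the cumulative
         probability p. Each intermediate solution for m-1 events seeds
         Newton's method for m events, which keeps the steps well-behaved. */
      double poisson_lambda(int n, double p, double eps) {
          double lam = -log(p);                    /* exact solution for n = 0 */
          for (int m = 1; m <= n; m++) {
              for (int iter = 0; iter < 100; iter++) {
                  double term = exp(-lam), F = term;
                  for (int k = 1; k <= m; k++) {   /* F and its last term */
                      term *= lam / k;
                      F += term;
                  }
                  double step = (F - p) / (-term); /* F' = -e^-lam lam^m/m! */
                  lam -= step;
                  if (fabs(step) < eps) break;
              }
          }
          return lam;
      }

      int main(void) {
          /* mean such that P(X <= 5) = 0.5 */
          printf("lambda = %.6f\n", poisson_lambda(5, 0.5, 1e-12));
          return 0;
      }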

  16. Experiences with hypercube operating system instrumentation

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Rudolph, David C.

    1989-01-01

The difficulty of conceptualizing the interactions among a large number of processors makes it hard both to identify the sources of inefficiencies and to determine how a parallel program could be made more efficient. This paper describes an instrumentation system that can trace the execution of distributed memory parallel programs by recording the occurrence of parallel program events. The resulting event traces can be used to compile summary statistics that provide a global view of program performance. In addition, visualization tools permit the graphic display of event traces. Visual presentation of performance data is particularly useful, and indeed necessary, for large-scale parallel computers; the enormous volume of performance data mandates visual display.

  17. Distributed representations in memory: Insights from functional brain imaging

    PubMed Central

    Rissman, Jesse; Wagner, Anthony D.

    2015-01-01

Forging new memories for facts and events, holding critical details in mind on a moment-to-moment basis, and retrieving knowledge in the service of current goals all depend on a complex interplay between neural ensembles throughout the brain. Over the past decade, researchers have increasingly leveraged powerful analytical tools (e.g., multi-voxel pattern analysis) to decode the information represented within distributed fMRI activity patterns. In this review, we discuss how these methods can sensitively index neural representations of perceptual and semantic content, and how leveraging the engagement of distributed representations provides unique insights into distinct aspects of memory-guided behavior. We emphasize that, in addition to characterizing the contents of memories, analyses of distributed patterns shed light on the processes that influence how information is encoded, maintained, or retrieved, and thus inform memory theory. We conclude by highlighting open questions about memory that can be addressed through distributed pattern analyses. PMID:21943171

  18. NearFar: A computer program for nearside farside decomposition of heavy-ion elastic scattering amplitude

    NASA Astrophysics Data System (ADS)

    Cha, Moon Hoe

    2007-02-01

The NearFar program is a package for carrying out an interactive nearside-farside decomposition of heavy-ion elastic scattering amplitude. The program is implemented in Java to perform numerical operations on the nearside and farside angular distributions. It contains a graphical display interface for the numerical results. A test run has been applied to the elastic 16O + 28Si scattering at E = 1503 MeV. Program summaryTitle of program: NearFar Catalogue identifier: ADYP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADYP_v1_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions: none Computers: designed for any machine capable of running Java, developed on PC-Pentium-4 Operating systems under which the program has been tested: Microsoft Windows XP (Home Edition) Program language used: Java Number of bits in a word: 64 Memory required to execute with typical data: case dependent No. of lines in distributed program, including test data, etc.: 3484 Number of bytes in distributed program, including test data, etc.: 142 051 Distribution format: tar.gz Other software required: A Java runtime interpreter, or the Java Development Kit, version 5.0 Nature of physical problem: Interactive nearside-farside decomposition of heavy-ion elastic scattering amplitude. Method of solution: The user must supply an external data file or PPSM parameters from which theoretical values of the quantities to be decomposed are calculated. Typical running time: Problem dependent. In a test run, it is about 35 s on a 2.40 GHz Intel P4-processor machine.

  19. Browndye: A software package for Brownian dynamics

    NASA Astrophysics Data System (ADS)

    Huber, Gary A.; McCammon, J. Andrew

    2010-11-01

    A new software package, Browndye, is presented for simulating the diffusional encounter of two large biological molecules. It can be used to estimate second-order rate constants and encounter probabilities, and to explore reaction trajectories. Browndye builds upon previous knowledge and algorithms from software packages such as UHBD, SDA, and Macrodox, while implementing algorithms that scale to larger systems. Program summaryProgram title: Browndye Catalogue identifier: AEGT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: MIT license, included in distribution No. of lines in distributed program, including test data, etc.: 143 618 No. of bytes in distributed program, including test data, etc.: 1 067 861 Distribution format: tar.gz Programming language: C++, OCaml ( http://caml.inria.fr/) Computer: PC, Workstation, Cluster Operating system: Linux Has the code been vectorised or parallelized?: Yes. Runs on multiple processors with shared memory using pthreads RAM: Depends linearly on size of physical system Classification: 3 External routines: uses the output of APBS [1] ( http://www.poissonboltzmann.org/apbs/) as input. APBS must be obtained and installed separately. Expat 2.0.1, CLAPACK, ocaml-expat, Mersenne Twister. These are included in the Browndye distribution. Nature of problem: Exploration and determination of rate constants of bimolecular interactions involving large biological molecules. Solution method: Brownian dynamics with electrostatic, excluded volume, van der Waals, and desolvation forces. Running time: Depends linearly on size of physical system and quadratically on precision of results. The included example executes in a few minutes.
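
    For orientation, the propagation step underlying this class of Brownian dynamics codes is the Ermak-McCammon scheme; we quote only the generic form here (Browndye's actual propagator adds the electrostatic, excluded volume, van der Waals, and desolvation forces listed above):

      \mathbf{r}(t+\Delta t) = \mathbf{r}(t) + \frac{D\,\Delta t}{k_B T}\,\mathbf{F}(t) + \mathbf{R},
      \qquad \langle R_i \rangle = 0, \quad \langle R_i R_j \rangle = 2 D\,\Delta t\,\delta_{ij},

    where D is the relative diffusion coefficient and F the systematic force; second-order rate constants then follow from the fraction of trajectories that satisfy the reaction criterion before diffusing away, in the style of the Northrup-Allison-McCammon method.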

  20. Compile-time estimation of communication costs in multicomputers

    NASA Technical Reports Server (NTRS)

    Gupta, Manish; Banerjee, Prithviraj

    1991-01-01

An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Any strategy for automatic data partitioning needs a mechanism for estimating the performance of a program under a given partitioning scheme, the most crucial part of which involves determining the communication costs incurred by the program. A methodology is described for estimating the communication costs at compile-time as functions of the numbers of processors over which various arrays are distributed. A strategy is described, along with its theoretical basis, for making program transformations that expose opportunities for combining messages, leading to considerable savings in the communication costs. For certain loops with regular dependences, the compiler can detect the possibility of pipelining, and thus estimate communication costs more accurately than it could otherwise. These results are of great significance to any parallelization system supporting numeric applications on multicomputers. In particular, they lay down a framework for effective synthesis of communication on multicomputers from sequential program references.

  1. Algorithms for Automatic Alignment of Arrays

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Oliker, Leonid; Schreiber, Robert; Sheffler, Thomas J.

    1996-01-01

Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs, and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.

  2. Mnemonic transmission, social contagion, and emergence of collective memory: Influence of emotional valence, group structure, and information distribution.

    PubMed

    Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

    2017-09-01

Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors (emotional salience of information, group structure, and information distribution) on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  3. Detecting Potentially Compromised Credentials in a Large-Scale Production Single-Signon System

    DTIC Science & Technology

    2014-06-01


  4. Evaluation of the Effectiveness of the Storage and Distribution Entry-Level Computer-Based Training (CBT) Program

    DTIC Science & Technology

    1990-09-01

learning occurs when this final link is made into long-term memory (13:79). Cognitive scientists realize the role of the trainee as a passive receiver of...of property on the computer, and when they did, this piece of paperwork printed out on their printer. Someone from the receiving section brought this

  5. Run-time scheduling and execution of loops on message passing machines

    NASA Technical Reports Server (NTRS)

    Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry

    1989-01-01

    Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.

  6. Run-time scheduling and execution of loops on message passing machines

    NASA Technical Reports Server (NTRS)

    Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry

    1990-01-01

    Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
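
    The inspector/executor structure described in these two records can be sketched in a few dozen lines of C. Everything below is illustrative (a block distribution, a pretend rank, a toy indirection array); the point is the single inspector pass that locally renumbers indices and records the communication schedule the executor will replay:

      #include <stdio.h>

      /* Irregular loop:  for i in my rows:  y[i] += x[col[i]]
         with x block-distributed. The inspector classifies each
         reference; the executor would first exchange the listed remote
         elements (e.g. with MPI) and then compute against a gathered
         buffer. Here the communication is only printed. */
      #define NGLOBAL 16
      #define NPROC 4
      #define BLOCK (NGLOBAL / NPROC)

      int main(void) {
          int me = 1;                            /* pretend rank */
          int col[BLOCK] = {0, 5, 6, 13};        /* indirection array */
          int fetch[BLOCK], nfetch = 0;          /* remote-fetch schedule */
          int local[BLOCK];                      /* translated index */

          /* inspector: classify each reference, build the schedule */
          for (int i = 0; i < BLOCK; i++) {
              int owner = col[i] / BLOCK;
              if (owner == me) {
                  local[i] = col[i] - me * BLOCK;  /* local numbering */
              } else {
                  fetch[nfetch] = col[i];          /* needs communication */
                  local[i] = BLOCK + nfetch;       /* slot in gather buffer */
                  nfetch++;
              }
          }
          printf("rank %d must fetch %d remote elements:", me, nfetch);
          for (int k = 0; k < nfetch; k++)
              printf(" x[%d] from rank %d", fetch[k], fetch[k] / BLOCK);
          printf("\n");

          /* executor (schematic):
               xbuf[0..BLOCK-1] = my block of x,
               xbuf[BLOCK..]    = fetched remote elements,
               for (i = 0; i < BLOCK; i++) y[i] += xbuf[local[i]];   */
          return 0;
      }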

  7. Three Types of Memory in Emergency Medical Services Communication

    ERIC Educational Resources Information Center

    Angeli, Elizabeth L.

    2015-01-01

    This article examines memory and distributed cognition involved in the writing practices of emergency medical services (EMS) professionals. Results from a 16-month study indicate that EMS professionals rely on distributed cognition and three kinds of memory: individual, collaborative, and professional. Distributed cognition and the three types of…

  8. SKIRT: Hybrid parallelization of radiative transfer simulations

    NASA Astrophysics Data System (ADS)

    Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

    2017-07-01

    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
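
    A minimal hybrid skeleton in C conveys the structure (SKIRT itself is C++, and its photon-packet loop, weights, and tallies are of course more elaborate; every name below is invented): one MPI process per memory domain, several threads inside it, a lock-free atomic update on the shared tally, and a process-level reduction at the end:

      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      #define NBINS 8

      int main(int argc, char **argv) {
          int provided, rank, size;
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double tally[NBINS] = {0};

          /* thread-level parallelism inside the process; the shared
             tally is updated without locks via an atomic add */
          #pragma omp parallel for
          for (int packet = rank; packet < 100000; packet += size) {
              int bin = packet % NBINS;             /* toy "detector bin" */
              double w = 1.0 / (1.0 + packet % 7);  /* toy "packet weight" */
              #pragma omp atomic
              tally[bin] += w;
          }

          /* distributed-memory layer: combine the per-process tallies */
          double total[NBINS];
          MPI_Reduce(tally, total, NBINS, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0) printf("bin 0 tally: %g\n", total[0]);
          MPI_Finalize();
          return 0;
      }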

  9. Lambert W function for applications in physics

    NASA Astrophysics Data System (ADS)

    Veberič, Darko

    2012-12-01

    The Lambert W(x) function and its possible applications in physics are presented. The actual numerical implementation in C++ consists of Halley's and Fritsch's iterations with initial approximations based on branch-point expansion, asymptotic series, rational fits, and continued-logarithm recursion. Program summaryProgram title: LambertW Catalogue identifier: AENC_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AENC_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License version 3 No. of lines in distributed program, including test data, etc.: 1335 No. of bytes in distributed program, including test data, etc.: 25 283 Distribution format: tar.gz Programming language: C++ (with suitable wrappers it can be called from C, Fortran etc.), the supplied command-line utility is suitable for other scripting languages like sh, csh, awk, perl etc. Computer: All systems with a C++ compiler. Operating system: All Unix flavors, Windows. It might work with others. RAM: Small memory footprint, less than 1 MB Classification: 1.1, 4.7, 11.3, 11.9. Nature of problem: Find fast and accurate numerical implementation for the Lambert W function. Solution method: Halley's and Fritsch's iterations with initial approximations based on branch-point expansion, asymptotic series, rational fits, and continued logarithm recursion. Additional comments: Distribution file contains the command-line utility lambert-w. Doxygen comments, included in the source files. Makefile. Running time: The tests provided take only a few seconds to run.
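
    The Halley step at the heart of such implementations is compact. The C sketch below is ours: it solves f(w) = w e^w - x = 0 for the principal branch with the simple starting guess w_0 = ln(1+x), adequate for x >= 0; the published package adds the branch-point and asymptotic starting approximations and the W_{-1} branch, none of which is reproduced here:

      #include <stdio.h>
      #include <math.h>

      /* Halley iteration: w <- w - f / (f' - f f''/(2 f')), with
         f' = e^w (w+1) and f'' = e^w (w+2); valid away from the
         branch point at x = -1/e, so safe on x >= 0. */
      double lambert_w0(double x) {
          double w = log1p(x);                 /* initial approximation */
          for (int i = 0; i < 50; i++) {
              double e = exp(w);
              double f = w * e - x;
              double fp = e * (w + 1.0);
              double step = f / (fp - f * (w + 2.0) / (2.0 * w + 2.0));
              w -= step;
              if (fabs(step) < 1e-15 * (1.0 + fabs(w))) break;
          }
          return w;
      }

      int main(void) {
          printf("W(1) = %.15f (Omega ~ 0.567143290409784)\n", lambert_w0(1.0));
          return 0;
      }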

  10. Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth

    NASA Astrophysics Data System (ADS)

    Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.

    2014-03-01

An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in 50% reduction of memory and computational time requirements. The presented solidification model shows very good scalability up to centimeter-size domains, including more than ten million dendrites. Catalogue identifier: AEQZ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, UK Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 29,767 No. of bytes in distributed program, including test data, etc.: 3,131,367 Distribution format: tar.gz Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes. Program is parallelized using MPI. Number of processors used: 1-50,000 RAM: Memory requirements depend on the grid size Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/) Nature of problem: Dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: The lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena. The cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: Heat transfer is calculated uncoupled from the fluid flow. Thermal diffusivity is constant. Unusual features: Novel technique, utilizing periodic duplication of a pre-grown "incubation" domain, is applied for the scaleup test. Running time: Running time varies from minutes to days depending on the domain size and number of computational cores.
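
    For orientation, the single-relaxation-time (BGK) update that LB solvers of this kind apply to each distribution function f_i reads, in its standard form (not transcribed from this program):

      f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t)
        = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right],

      f_i^{\mathrm{eq}} = w_i \rho \left[ 1 + \frac{\mathbf{e}_i \cdot \mathbf{u}}{c_s^2}
        + \frac{(\mathbf{e}_i \cdot \mathbf{u})^2}{2 c_s^4} - \frac{\mathbf{u}^2}{2 c_s^2} \right],

    with w_i the lattice weights, tau the relaxation time that sets the diffusivity or viscosity of the transported quantity, and c_s^2 = 1/3 in lattice units for the usual D2Q9 stencil. Separate distribution sets of this form handle the flow, solute, and heat fields, while the CA layer tracks the interface.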

  11. Multibit Polycrystalline Silicon-Oxide-Silicon Nitride-Oxide-Silicon Memory Cells with High Density Designed Utilizing a Separated Control Gate

    NASA Astrophysics Data System (ADS)

    Rok Kim, Kyeong; You, Joo Hyung; Dal Kwack, Kae; Kim, Tae Whan

    2010-10-01

Unique multibit NAND polycrystalline silicon-oxide-silicon nitride-oxide-silicon (SONOS) memory cells utilizing a separated control gate (SCG) were designed to increase memory density. The proposed NAND SONOS memory device based on a SCG structure was operated as two bits, resulting in an increase in the storage density of nonvolatile memory (NVM) devices in comparison with conventional single-bit memories. The electrical properties of the SONOS memory cells with a SCG were investigated to clarify the charging effects in the SONOS memory cells. When the program voltage was supplied to each gate of the NAND SONOS flash memory cells, the electrons were trapped in the nitride region of the oxide-nitride-oxide layer under the gate to which the program voltage was supplied. The electrons were accumulated without affecting the other gate during the programming operation, indicating the absence of cross-talk between the two trapped-charge regions. It is expected that the interference effect will be suppressed because the program voltage is lower than that of conventional NAND flash memory. The simulation results indicate that the proposed unique NAND SONOS memory cells with a SCG can be used to increase memory density.

  12. Distributed-Memory Fast Maximal Independent Set

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
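
    The round structure that such algorithms share is easy to exhibit. The toy C program below (ours; single address space, adjacency-matrix graph) uses the random-priority formulation often used to explain Luby's approach: each round every undecided vertex draws a random value, joins the MIS if its value beats those of all undecided neighbors, and knocks those neighbors out. A distributed-memory version exchanges the drawn values along cut edges instead of reading them directly:

      #include <stdio.h>
      #include <stdlib.h>

      enum { UNDECIDED, IN_MIS, OUT };
      #define NV 8
      /* small undirected graph as a symmetric adjacency matrix */
      static const int adj[NV][NV] = {
          {0,1,0,0,1,0,0,0}, {1,0,1,0,0,0,0,0}, {0,1,0,1,0,0,1,0},
          {0,0,1,0,0,0,0,1}, {1,0,0,0,0,1,0,0}, {0,0,0,0,1,0,1,0},
          {0,0,1,0,0,1,0,1}, {0,0,0,1,0,0,1,0}
      };

      int main(void) {
          int state[NV], snap[NV];
          double r[NV];
          for (int v = 0; v < NV; v++) state[v] = UNDECIDED;
          srand(7);
          int undecided = NV;
          while (undecided > 0) {
              /* snapshot of who is undecided at the start of the round */
              for (int v = 0; v < NV; v++) snap[v] = state[v];
              for (int v = 0; v < NV; v++)
                  if (snap[v] == UNDECIDED) r[v] = rand() / (RAND_MAX + 1.0);
              /* a vertex wins if it beats every undecided neighbor */
              for (int v = 0; v < NV; v++) {
                  if (snap[v] != UNDECIDED) continue;
                  int win = 1;
                  for (int u = 0; u < NV; u++)
                      if (u != v && adj[v][u] && snap[u] == UNDECIDED && r[u] <= r[v])
                          { win = 0; break; }
                  if (win) state[v] = IN_MIS;
              }
              /* neighbors of the new MIS vertices drop out */
              for (int v = 0; v < NV; v++)
                  if (state[v] == IN_MIS)
                      for (int u = 0; u < NV; u++)
                          if (adj[v][u] && state[u] == UNDECIDED) state[u] = OUT;
              undecided = 0;
              for (int v = 0; v < NV; v++) undecided += (state[v] == UNDECIDED);
          }
          printf("MIS:");
          for (int v = 0; v < NV; v++) if (state[v] == IN_MIS) printf(" %d", v);
          printf("\n");
          return 0;
      }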

  13. Human Memory Organization for Computer Programs.

    ERIC Educational Resources Information Center

    Norcio, A. F.; Kerst, Stephen M.

    1983-01-01

Results of a study investigating human memory organization in the processing of computer programming languages indicate that algorithmic logic segments form a cognitive organizational structure in memory for programs. Statement indentation and internal program documentation did not enhance the organizational process of recall of statements in five Fortran…

  14. Parallel implementation of an adaptive and parameter-free N-body integrator

    NASA Astrophysics Data System (ADS)

    Pruett, C. David; Ingham, William H.; Herman, Ralph D.

    2011-05-01

    Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.

  15. Effects of a Memory Training Program in Older People with Severe Memory Loss

    ERIC Educational Resources Information Center

    Mateos, Pedro M.; Valentin, Alberto; González-Tablas, Maria del Mar; Espadas, Verónica; Vera, Juan L.; Jorge, Inmaculada García

    2016-01-01

Strategy-based memory training programs are widely used to enhance the cognitive abilities of the elderly. Participants in these training programs are usually people whose mental abilities remain intact. Occasionally, people with cognitive impairment also participate. The aim of this study was to test if memory training designed specifically for…

  16. Continuous Time Random Walks with memory and financial distributions

    NASA Astrophysics Data System (ADS)

    Montero, Miquel; Masoliver, Jaume

    2017-11-01

    We study financial distributions from the perspective of Continuous Time Random Walks with memory. We review some of our previous developments and apply them to financial problems. We also present some new models with memory that can be useful in characterizing tendency effects which are inherent in most markets. We also briefly study the effect on return distributions of fractional behaviors in the distribution of pausing times between successive transactions.

  17. Proceedings: Sisal `93

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Feo, J.T.

    1993-10-01

This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  18. Total recall in distributive associative memories

    NASA Technical Reports Server (NTRS)

    Danforth, Douglas G.

    1991-01-01

    Iterative error correction of asymptotically large associative memories is equivalent to a one-step learning rule. This rule is the inverse of the activation function of the memory. Spectral representations of nonlinear activation functions are used to obtain the inverse in closed form for Sparse Distributed Memory, Selected-Coordinate Design, and Radial Basis Functions.

  19. Implementing Access to Data Distributed on Many Processors

    NASA Technical Reports Server (NTRS)

    James, Mark

    2006-01-01

A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets; the low-level implementation details are beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access, giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
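
    A C analogue of the core mapping makes the "single point of access" idea concrete (this sketch is ours; Chapel expresses the same thing declaratively through domain maps): a global index is translated to an owner fragment and a local offset behind one accessor function, so callers read and write the array as if it were collocated:

      #include <stdio.h>
      #include <stdlib.h>

      #define NGLOBAL 10
      #define NPROC 3

      typedef struct {
          int block;            /* ceil(NGLOBAL / NPROC) */
          double *frag[NPROC];  /* one fragment per "processor" */
      } BlockArray;

      static BlockArray *ba_create(void) {
          BlockArray *a = malloc(sizeof *a);
          a->block = (NGLOBAL + NPROC - 1) / NPROC;
          for (int p = 0; p < NPROC; p++)
              a->frag[p] = calloc(a->block, sizeof(double));
          return a;
      }

      /* single point of access: global index in, element out */
      static double *ba_at(BlockArray *a, int i) {
          int owner  = i / a->block;   /* which processor holds i      */
          int offset = i % a->block;   /* local index on that fragment */
          return &a->frag[owner][offset];
      }

      int main(void) {
          BlockArray *a = ba_create();
          for (int i = 0; i < NGLOBAL; i++) *ba_at(a, i) = 10.0 * i;
          printf("a[7] = %g (owned by fragment %d)\n", *ba_at(a, 7), 7 / a->block);
          return 0;
      }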

  20. How can individual differences in autobiographical memory distributions of older adults be explained?

    PubMed

    Wolf, Tabea; Zimprich, Daniel

    2016-10-01

    The reminiscence bump phenomenon has frequently been reported for the recall of autobiographical memories. The present study complements previous research by examining individual differences in the distribution of word-cued autobiographical memories. More importantly, we introduce predictor variables that might account for individual differences in the mean (location) and the standard deviation (scale) of individual memory distributions. All variables were derived from different theoretical accounts of the reminiscence bump phenomenon. We used a mixed location-scale logitnormal model to analyse the 4602 autobiographical memories reported by 118 older participants. Results show reliable individual differences in location and scale. After controlling for age and gender, individual proportions of first-time experiences and individual proportions of positive memories, as well as ratings on Openness to new Experiences and Self-Concept Clarity, accounted for 29% of individual differences in the location and 42% of individual differences in the scale of autobiographical memory distributions. The results dovetail with a life-story account of the reminiscence bump, which integrates central components of previous accounts.

  1. Automation of Data Traffic Control on DSM Architecture

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2001-01-01

    The design of distributed shared memory (DSM) computers liberates users from the duty of distributing data across processors and allows for the incremental development of parallel programs using, for example, OpenMP or Java threads. DSM architecture greatly simplifies the development of parallel programs having good performance on a few processors. However, achieving good program scalability on DSM computers requires that the user understand the data flow in the application and use various techniques to avoid data traffic congestion. In this paper we discuss a number of such techniques, including data blocking, data placement, data transposition, and page size control, and evaluate their efficiency on the NAS (NASA Advanced Supercomputing) Parallel Benchmarks. We also present a tool which automates the detection of constructs causing data congestion in Fortran array-oriented codes and advises the user on code transformations for improving data traffic in the application.
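
    As an illustration of one of the techniques named above, the following C sketch contrasts a naive array transposition with a blocked (tiled) version; the tile size is an assumed, tunable parameter. On a DSM machine the blocked loop keeps each tile resident in local cache and memory pages while it is in use, which is the kind of transformation such a tool would recommend.

    ```c
    #include <stddef.h>

    #define TILE 64  /* illustrative tile edge; tune to the page/cache size */

    /* Naive transpose: strides through b with stride n, touching a new
     * cache line (or remote page) on nearly every store for large n. */
    void transpose_naive(const double *a, double *b, size_t n) {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                b[j * n + i] = a[i * n + j];
    }

    /* Blocked transpose: both arrays are walked tile by tile, so each
     * TILE x TILE block stays resident while it is used, cutting remote
     * memory traffic on a DSM machine. */
    void transpose_blocked(const double *a, double *b, size_t n) {
        for (size_t ii = 0; ii < n; ii += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t j = jj; j < jj + TILE && j < n; j++)
                        b[j * n + i] = a[i * n + j];
    }
    ```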

  2. NAS Parallel Benchmark. Results 11-96: Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks. 1.0

    NASA Technical Reports Server (NTRS)

    Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming-model combinations (HPF and MPI (message passing interface)) will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.

  3. Fault-Tolerant Multiprocessor and VLSI-Based Systems.

    DTIC Science & Technology

    1987-03-15

    (The abstract for this record is garbled OCR in the source; the surviving fragments refer to Table 1, statistics for the benchmark programs, to pages being distributed amongst the groups of the reconfigured memory in proportion to an unspecified quantity, to distances proportional to only the logarithm of the system size, and to communication overhead resulting from faults communicating with all of the other elements in the system.)

  4. Revision of FMM-Yukawa: An adaptive fast multipole method for screened Coulomb interactions

    NASA Astrophysics Data System (ADS)

    Zhang, Bo; Huang, Jingfang; Pitsianis, Nikos P.; Sun, Xiaobai

    2010-12-01

    FMM-YUKAWA is a mathematical software package primarily for rapid evaluation of the screened Coulomb interactions of N particles in three dimensional space. Since its release, we have revised and re-organized the data structure, software architecture, and user interface, for the purpose of enabling more flexible, broader and easier use of the package. The package and its documentation are available at http://www.fastmultipole.org/, along with a few other closely related mathematical software packages. New version program summary. Program title: FMM-Yukawa Catalogue identifier: AEEQ_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEQ_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPL 2.0 No. of lines in distributed program, including test data, etc.: 78 704 No. of bytes in distributed program, including test data, etc.: 854 265 Distribution format: tar.gz Programming language: FORTRAN 77, FORTRAN 90, and C. Requires gcc and gfortran version 4.4.3 or later Computer: All Operating system: Any Classification: 4.8, 4.12 Catalogue identifier of previous version: AEEQ_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180 (2009) 2331 Does the new version supersede the previous version?: Yes Nature of problem: To evaluate the screened Coulomb potential and force field of N charged particles, and to evaluate a convolution type integral where the Green's function is the fundamental solution of the modified Helmholtz equation. Solution method: The new version of the fast multipole method (FMM) that diagonalizes the multipole-to-local translation operator is applied with the tree structure adaptive to sample particle locations. Reasons for new version: To handle much larger particle ensembles, to enable the iterative use of the subroutines in a solver, and to remove potential contention in assignments for parallelization. Summary of revisions: The software package FMM-Yukawa has been revised and re-organized in data structure, software architecture, programming methods, and user interface. The revision enables more flexible use of the package and economic use of memory resources. It consists of five stages. The initial stage (stage 1) determines, based on the accuracy requirement and FMM theory, the length of multipole expansions and the number of quadrature points for diagonalization, and loads the quadrature nodes and weights that are computed offline. Stage 2 constructs the oct-tree and interaction lists, with adaptation to the sparsity or density of particles and employing a dynamic memory allocation scheme at every tree level. Stage 3 executes the core FMM subroutine for numerical calculation of the particle interactions. The subroutine can now be used iteratively as in a solver, while the particle locations remain the same. Stage 4 releases the memory allocated in Stage 2 for the adaptive tree and interaction lists. The user can modify the iterative routine easily. When the particle locations change, such as in a molecular dynamics simulation, stages 2 to 4 can also be used together repeatedly. The final stage releases the memory space used for the quadrature and other remaining FMM parameters. Programs at the stage level and at the user interface are re-written in the C programming language, while most of the translation and interaction operations remain in FORTRAN.
As a result of the change in data structures and memory allocation, the revised package can accommodate much larger particle ensembles while maintaining the same accuracy-efficiency performance. The new version is also developed as an important precursor to its parallel counterpart on multi-core or many-core processors in a shared memory programming environment. In particular, in order to ensure mutual exclusion in concurrent updates without incurring extra latency, we have replaced all the assignment statements at a source box that put its data to multiple target boxes with assignments at every target box that gather data from source boxes. This amounts to replacing the column version of matrix-vector multiplication with the row version. The matrix here, however, is in compressive representation. Sufficient care is taken in the revision not to alter the algorithmic complexity or numerical behavior, as concurrent writing potentially takes place in the upward calculation of the multipole expansion coefficients, in the interactions at every level of the FMM tree, and in the downward calculation of the local expansion coefficients. The software modules and their compositions are also organized according to the stages in which they are used. Demonstration files and makefiles for merging the user routines and the library routines are provided. Restrictions: The accuracy requirement is described in terms of three or six digits. Higher multiples of three digits will be allowed in a later version. Finer decimation in digits for accuracy specification may or may not be necessary. Unusual features: Ready and friendly for customized use, and instrumental in the expression of concurrency and dependency for efficient parallelization. Running time: The running time depends linearly on the number N of particles, and varies with the characteristics of the particle distribution. It also depends on the accuracy requirement; a higher accuracy requirement takes relatively longer. The code outperforms the direct summation method when N⩾750.
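
    The scatter-to-gather rewrite described above can be pictured with a dense matrix-vector product; the package itself works on a compressive representation, so the fragment below is only a schematic C illustration of why the row (gather) version removes concurrent writes.

    ```c
    /* Column version ("scatter"): each source k pushes contributions into
     * many targets i. Two threads handling different k values may write
     * the same y[i] concurrently, so updates need mutual exclusion. */
    void matvec_scatter(int n, const double A[n][n], const double *x, double *y) {
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                y[i] += A[i][k] * x[k];   /* concurrent writers on y[i] */
    }

    /* Row version ("gather"): each target i pulls from all sources k.
     * Parallelizing over i gives each thread exclusive ownership of y[i],
     * so no locks or atomics are needed. */
    void matvec_gather(int n, const double A[n][n], const double *x, double *y) {
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i][k] * x[k];
            y[i] += s;                    /* single writer per y[i] */
        }
    }
    ```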

  5. Initial Feasibility and Validity of a Prospective Memory Training Program in a Substance Use Treatment Population

    PubMed Central

    Sweeney, Mary M.; Rass, Olga; Johnson, Patrick S.; Strain, Eric C.; Berry, Meredith S.; Vo, Hoa T.; Fishman, Marc J.; Munro, Cynthia A.; Rebok, George W.; Mintzer, Miriam Z.; Johnson, Matthew W.

    2016-01-01

    Individuals with substance use disorders have shown deficits in the ability to implement future intentions, called prospective memory. Deficits in prospective memory and working memory, a critical underlying component of prospective memory, likely contribute to substance use treatment failures. Thus, improvement of prospective memory and working memory in substance use patients is an innovative target for intervention. We sought to develop a feasible and valid prospective memory training program that incorporates working memory training and may serve as a useful adjunct to substance use disorder treatment. We administered a single session of the novel prospective memory and working memory training program to participants (n = 22; 13 male; 9 female) enrolled in outpatient substance use disorder treatment and correlated performance to existing measures of prospective memory and working memory. Generally accurate prospective memory performance in a single session suggests feasibility in a substance use treatment population. However, training difficulty should be increased to avoid ceiling effects across repeated sessions. Consistent with existing literature, we observed superior performance on event-based relative to time-based prospective memory tasks. Performance on the prospective memory and working memory training components correlated with validated assessments of prospective memory and working memory, respectively. Correlations between novel memory training program performance and established measures suggest that our training engages appropriate cognitive processes. Further, differential event- and time-based prospective memory task performance suggests internal validity of our training. These data support development of this intervention as an adjunctive therapy for substance use disorders. PMID:27690506

  6. Initial feasibility and validity of a prospective memory training program in a substance use treatment population.

    PubMed

    Sweeney, Mary M; Rass, Olga; Johnson, Patrick S; Strain, Eric C; Berry, Meredith S; Vo, Hoa T; Fishman, Marc J; Munro, Cynthia A; Rebok, George W; Mintzer, Miriam Z; Johnson, Matthew W

    2016-10-01

    Individuals with substance use disorders have shown deficits in the ability to implement future intentions, called prospective memory. Deficits in prospective memory and working memory, a critical underlying component of prospective memory, likely contribute to substance use treatment failures. Thus, improvement of prospective memory and working memory in substance use patients is an innovative target for intervention. We sought to develop a feasible and valid prospective memory training program that incorporates working memory training and may serve as a useful adjunct to substance use disorder treatment. We administered a single session of the novel prospective memory and working memory training program to participants (n = 22; 13 men, 9 women) enrolled in outpatient substance use disorder treatment and correlated performance to existing measures of prospective memory and working memory. Generally accurate prospective memory performance in a single session suggests feasibility in a substance use treatment population. However, training difficulty should be increased to avoid ceiling effects across repeated sessions. Consistent with existing literature, we observed superior performance on event-based relative to time-based prospective memory tasks. Performance on the prospective memory and working memory training components correlated with validated assessments of prospective memory and working memory, respectively. Correlations between novel memory training program performance and established measures suggest that our training engages appropriate cognitive processes. Further, differential event- and time-based prospective memory task performance suggests internal validity of our training. These data support the development of this intervention as an adjunctive therapy for substance use disorders. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  7. CUMBIN - CUMULATIVE BINOMIAL PROGRAMS

    NASA Technical Reports Server (NTRS)

    Bowerman, P. N.

    1994-01-01

    The cumulative binomial program, CUMBIN, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, CUMBIN, NEWTONP (NPO-17556), and CROSSER (NPO-17557), can be used independently of one another. CUMBIN can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. CUMBIN calculates the probability that a system of n components has at least k operating if the probability that any one is operating is p and the components are independent. Equivalently, this is the reliability of a k-out-of-n system having independent components with common reliability p. CUMBIN can evaluate the incomplete beta distribution for two positive integer arguments. CUMBIN can also evaluate the cumulative F distribution and the negative binomial distribution, and can determine the sample size in a test design. CUMBIN is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. The program is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. The CUMBIN program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CUMBIN was developed in 1988.
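
    The quantity CUMBIN computes is easy to state: the reliability of a k-out-of-n system of independent components with common reliability p. The C sketch below re-derives it with a recurrence between successive binomial terms; it is an illustration of the formula, not the distributed CUMBIN source, and it assumes 0 < p < 1.

    ```c
    #include <stdio.h>

    /* Reliability of a k-out-of-n system of independent components with
     * common reliability p: sum_{j=k}^{n} C(n,j) p^j (1-p)^(n-j).
     * Successive terms are built by a recurrence to avoid computing large
     * binomial coefficients directly. Illustrative re-derivation only;
     * assumes 0 < p < 1. */
    double k_out_of_n(int k, int n, double p) {
        double term = 1.0;
        for (int j = 0; j < n; j++) term *= (1.0 - p);  /* j = 0: (1-p)^n */
        double sum = (k == 0) ? term : 0.0;
        for (int j = 1; j <= n; j++) {
            /* C(n,j) p^j (1-p)^(n-j) = previous term * (n-j+1)/j * p/(1-p) */
            term *= ((double)(n - j + 1) / j) * (p / (1.0 - p));
            if (j >= k) sum += term;
        }
        return sum;
    }

    int main(void) {
        /* 2-out-of-3 system with component reliability 0.9 -> 0.972 */
        printf("%.6f\n", k_out_of_n(2, 3, 0.9));
        return 0;
    }
    ```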

  8. The Generalized Quantum Episodic Memory Model.

    PubMed

    Trueblood, Jennifer S; Hemmer, Pernille

    2017-11-01

    Recent evidence suggests that experienced events are often mapped to too many episodic states, including those that are logically or experimentally incompatible with one another. For example, episodic over-distribution patterns show that the probability of accepting an item under different mutually exclusive conditions violates the disjunction rule. A related example, called subadditivity, occurs when the probability of accepting an item under mutually exclusive and exhaustive instruction conditions sums to a number >1. Both the over-distribution effect and subadditivity have been widely observed in item and source-memory paradigms. These phenomena are difficult to explain using standard memory frameworks, such as signal-detection theory. A dual-trace model called the over-distribution (OD) model (Brainerd & Reyna, 2008) can explain the episodic over-distribution effect, but not subadditivity. Our goal is to develop a model that can explain both effects. In this paper, we propose the Generalized Quantum Episodic Memory (GQEM) model, which extends the Quantum Episodic Memory (QEM) model developed by Brainerd, Wang, and Reyna (2013). We test GQEM by comparing it to the OD model using data from a novel item-memory experiment and a previously published source-memory experiment (Kellen, Singmann, & Klauer, 2014) examining the over-distribution effect. Using the best-fit parameters from the over-distribution experiments, we conclude by showing that the GQEM model can also account for subadditivity. Overall these results add to a growing body of evidence suggesting that quantum probability theory is a valuable tool in modeling recognition memory. Copyright © 2016 Cognitive Science Society, Inc.

  9. Differentiation and Response Bias in Episodic Memory: Evidence from Reaction Time Distributions

    ERIC Educational Resources Information Center

    Criss, Amy H.

    2010-01-01

    In differentiation models, the processes of encoding and retrieval produce an increase in the distribution of memory strength for targets and a decrease in the distribution of memory strength for foils as the amount of encoding increases. This produces an increase in the hit rate and decrease in the false-alarm rate for a strongly encoded compared…

  10. A computer program for two-particle generalized coefficients of fractional parentage

    NASA Astrophysics Data System (ADS)

    Deveikis, A.; Juodagalvis, A.

    2008-10-01

    We present a FORTRAN90 program GCFP for the calculation of the generalized coefficients of fractional parentage (generalized CFPs or GCFP). The approach is based on the observation that the multi-shell CFPs can be expressed in terms of single-shell CFPs, while the latter can be readily calculated employing a simple enumeration scheme of antisymmetric A-particle states and an efficient method of construction of the idempotent matrix eigenvectors. The program provides fast calculation of GCFPs for a given particle number and produces results possessing numerical uncertainties below the desired tolerance. A single j-shell is defined by four quantum numbers, (e,l,j,t). A supplemental C++ program parGCFP allows calculations to be done in batches and/or in parallel. Program summary. Program title: GCFP, parGCFP Catalogue identifier: AEBI_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 17 199 No. of bytes in distributed program, including test data, etc.: 88 658 Distribution format: tar.gz Programming language: FORTRAN 77/90 (GCFP), C++ (parGCFP) Computer: Any computer with suitable compilers. The program GCFP requires a FORTRAN 77/90 compiler. The auxiliary program parGCFP requires a GNU-C++ compatible compiler, while its parallel version additionally requires MPI-1 standard libraries Operating system: Linux (Ubuntu, Scientific) (all programs), also checked on Windows XP (GCFP, serial version of parGCFP) RAM: The memory demand depends on the computation and output mode. If this mode is not 4, the program GCFP demands the following amounts of memory on a computer with a Linux operating system. It requires around 2 MB of RAM for the A=12 system at E⩽2. Computation of the A=50 particle system requires around 60 MB of RAM at E=0 and ~70 MB at E=2 (note, however, that the calculation of this system will take a very long time). If the computation and output mode is set to 4, the memory demands by GCFP are significantly larger. Calculation of GCFPs of the A=12 system at E=1 requires 145 MB. The program parGCFP requires an additional 2.5 and 4.5 MB of memory for the serial and parallel version, respectively. Classification: 17.18 Nature of problem: The program GCFP generates a list of two-particle coefficients of fractional parentage for several j-shells with isospin. Solution method: The method is based on the observation that multishell coefficients of fractional parentage can be expressed in terms of single-shell CFPs [1]. The latter are calculated using the algorithm [2,3] for a spectral decomposition of an antisymmetrization operator matrix Y. The coefficients of fractional parentage are those eigenvectors of the antisymmetrization operator matrix Y that correspond to unit eigenvalues. A computer code for these coefficients is available [4]. The program GCFP offers computation of two-particle multishell coefficients of fractional parentage. The program parGCFP allows a batch calculation using one input file. Sets of GCFPs are independent and can be calculated in parallel. Restrictions: A<86 when E=0 (due to the memory constraints); small numbers of particles allow significantly higher excitations, though the shell with j⩾11/2 cannot get full (an implementation constraint).
Unusual features: Using the program GCFP it is possible to determine allowed particle configurations without the GCFP computation. The GCFPs can be calculated either for all particle configurations at once or for a specified particle configuration. The values of GCFPs can be printed out with a complete specification in either one file or with the parent and daughter configurations printed in separate files. The latter output mode requires additional time and RAM. It is possible to restrict the (J,T) values of the considered particle configurations. (Here J is the total angular momentum and T is the total isospin of the system.) The program parGCFP produces several result files, the number of which equals the number of particle configurations. To work correctly, the program GCFP needs to be compiled to read parameters from the standard input (the default setting). Running time: It depends on the size of the problem. The minimum time is required if the computation and output mode (CompMode) is not 4, but the resulting file is larger. A system with A=12 particles at E=0 (all 9411 GCFPs) took around 1 sec on a Pentium 4 2.8 GHz processor with 1 MB L2 cache. The program required about 14 min to calculate all 1.3×10 GCFPs of E=1. The time for all 5.5×10 GCFPs of E=2 was about 53 hours. For this number of particles, the calculation time of both E=0 and E=1 with CompMode = 1 and 4 is nearly the same, when no other processes are running. The case of E=2 could not be calculated with CompMode = 4, because the RAM was insufficient. In general, the latter CompMode requires a longer computation time, although the resulting files are smaller in size. The program parGCFP adds virtually no time overhead. Its parallel version speeds up the calculation. However, the results need to be collected from several files created for each configuration. References: [1] J. Levinsonas, Works of Lithuanian SSR Academy of Sciences 4 (1957) 17. [2] A. Deveikis, A. Bončkus, R. Kalinauskas, Lithuanian Phys. J. 41 (2001) 3. [3] A. Deveikis, R.K. Kalinauskas, B.R. Barrett, Ann. Phys. 296 (2002) 287. [4] A. Deveikis, Comput. Phys. Comm. 173 (2005) 186. (CPC Catalogue ID. ADWI_v1_0)

  11. Research about Memory Detection Based on the Embedded Platform

    NASA Astrophysics Data System (ADS)

    Sun, Hao; Chu, Jian

    As is well known, resources for memory detection on embedded systems are very limited. Taking a Linux-based embedded ARM board as the platform, this article puts forward two efficient memory detection technologies suited to the characteristics of embedded software. In particular, for programs that need specific libraries, the article puts forward portable memory detection methods that help program designers reduce human error, improve programming quality, and therefore make better use of the valuable embedded memory resource.
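
    The article's exact methods are not spelled out in the abstract; a common portable technique of this kind is to wrap the allocator with macros that record the call site and the number of live bytes, which needs no MMU support or external tools. The C sketch below (names and header layout our own, alignment simplified for brevity) illustrates the idea.

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* One widely used embedded-friendly technique (not necessarily the
     * authors' exact method): wrap malloc/free with macros that record
     * the call site, so leaks can be reported without an MMU or external
     * tools such as valgrind. */
    static size_t live_bytes  = 0;
    static int    live_blocks = 0;

    static void *debug_malloc(size_t sz, const char *file, int line) {
        /* Stash the size in a small header ahead of the user block
         * (alignment is simplified for brevity). */
        size_t *blk = malloc(sz + sizeof(size_t));
        if (!blk) return NULL;
        *blk = sz;
        live_bytes += sz;
        live_blocks++;
        fprintf(stderr, "alloc %zu bytes at %s:%d\n", sz, file, line);
        return blk + 1;
    }

    static void debug_free(void *p) {
        if (!p) return;
        size_t *blk = (size_t *)p - 1;
        live_bytes -= *blk;
        live_blocks--;
        free(blk);
    }

    #define MALLOC(sz) debug_malloc((sz), __FILE__, __LINE__)
    #define FREE(p)    debug_free(p)

    int main(void) {
        char *buf = MALLOC(64);
        FREE(buf);
        fprintf(stderr, "leaked: %d blocks, %zu bytes\n", live_blocks, live_bytes);
        return 0;
    }
    ```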

  12. Whitmore, Henschke, and Hilaris: The reorientation of prostate brachytherapy (1970-1987).

    PubMed

    Aronowitz, Jesse N

    2012-01-01

    Urologists had performed prostate brachytherapy for decades before New York's Memorial Hospital retropubic program. This paper explores the contribution of Willet Whitmore, Ulrich Henschke, Basil Hilaris, and Memorial's physicists to the evolution of the procedure. Literature review and interviews with program participants. More than 1000 retropubic implants were performed at Memorial between 1970 and 1987. Unlike previous efforts, Memorial's program benefited from the participation of three disciplines in its conception and execution. Memorial's retropubic program was a collaboration of urologists, radiation therapists, and physicists. Their approach focused greater attention on dosimetry and radiation safety, and served as a template for subsequent prostate brachytherapy programs. Copyright © 2012 American Brachytherapy Society. Published by Elsevier Inc. All rights reserved.

  13. Flash Memory Reliability: Read, Program, and Erase Latency Versus Endurance Cycling

    NASA Technical Reports Server (NTRS)

    Heidecker, Jason

    2010-01-01

    This report documents the efforts and results of the fiscal year (FY) 2010 NASA Electronic Parts and Packaging Program (NEPP) task for nonvolatile memory (NVM) reliability. This year's focus was to measure latency (read, program, and erase) of NAND Flash memories and determine how these parameters drift with erase/program/read endurance cycling.

  14. Relation of Physical Activity to Memory Functioning in Older Adults: The Memory Workout Program.

    ERIC Educational Resources Information Center

    Rebok, George W.; Plude, Dana J.

    2001-01-01

    The Memory Workout, a CD-ROM program designed to help older adults increase changes in physical and cognitive activity influencing memory, was tested with 24 subjects. Results revealed a significant relationship between exercise time, exercise efficacy, and cognitive function, as well as interest in improving memory and physical activity.…

  15. XSUMMER - Transcendental functions and symbolic summation in FORM

    NASA Astrophysics Data System (ADS)

    Moch, S.; Uwer, P.

    2006-05-01

    Harmonic sums and their generalizations are extremely useful in the evaluation of higher-order perturbative corrections in quantum field theory. Of particular interest have been the so-called nested sums, where the harmonic sums and their generalizations appear as building blocks, originating, for example, from the expansion of generalized hypergeometric functions around integer values of the parameters. In this paper we discuss the implementation of several algorithms to solve these sums by algebraic means, using the computer algebra system FORM. Program summary. Title of program: XSUMMER Catalogue identifier: ADXQ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXQ_v1_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland License: GNU Public License and FORM License Computers: all Operating system: all Program language: FORM Memory required to execute: Depending on the complexity of the problem, recommended at least 64 MB RAM No. of lines in distributed program, including test data, etc.: 9854 No. of bytes in distributed program, including test data, etc.: 126 551 Distribution format: tar.gz Other programs called: none External files needed: none Nature of the physical problem: Systematic expansion of higher transcendental functions in a small parameter. The expansions arise in the calculation of loop integrals in perturbative quantum field theory. Method of solution: Algebraic manipulations of nested sums. Restrictions on complexity of the problem: Usually limited only by the available disk space. Typical running time: Dependent on the complexity of the problem.

  16. Statistical prediction with Kanerva's sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1989-01-01

    A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.
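
    For readers unfamiliar with the model, the following C sketch shows the standard SDM read/write mechanics that the statistical viewpoint reinterprets: hard locations activated within a Hamming radius, counters updated on write, and a pooled, thresholded sum on read. The dimensions and radius are illustrative, not Kanerva's recommended values.

    ```c
    #include <stdlib.h>

    #define DIM    256   /* address/word length in bits (illustrative) */
    #define LOCS   1024  /* number of hard locations (illustrative)    */
    #define RADIUS 104   /* activation radius in Hamming distance      */

    static unsigned char hard_addr[LOCS][DIM]; /* random bit addresses      */
    static int           counters[LOCS][DIM];  /* bit counters per location */

    static void sdm_init(void) {
        /* Hard locations are fixed random addresses. */
        for (int l = 0; l < LOCS; l++)
            for (int i = 0; i < DIM; i++)
                hard_addr[l][i] = (unsigned char)(rand() & 1);
    }

    static int hamming(const unsigned char *a, const unsigned char *b) {
        int d = 0;
        for (int i = 0; i < DIM; i++) d += (a[i] != b[i]);
        return d;
    }

    /* Write: every hard location within RADIUS of the address increments
     * or decrements its bit counters according to the data word. */
    static void sdm_write(const unsigned char *addr, const unsigned char *data) {
        for (int l = 0; l < LOCS; l++)
            if (hamming(addr, hard_addr[l]) <= RADIUS)
                for (int i = 0; i < DIM; i++)
                    counters[l][i] += data[i] ? 1 : -1;
    }

    /* Read: pool the counters of all activated locations and threshold.
     * Near or over capacity this pooled sum stops behaving like exact
     * recall and acts as a statistical predictor of each stored bit,
     * which is the viewpoint developed in the article. */
    static void sdm_read(const unsigned char *addr, unsigned char *out) {
        int sum[DIM] = {0};
        for (int l = 0; l < LOCS; l++)
            if (hamming(addr, hard_addr[l]) <= RADIUS)
                for (int i = 0; i < DIM; i++)
                    sum[i] += counters[l][i];
        for (int i = 0; i < DIM; i++)
            out[i] = (sum[i] > 0);
    }
    ```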

  17. Differential memory in the earth's magnetotail

    NASA Technical Reports Server (NTRS)

    Burkhart, G. R.; Chen, J.

    1991-01-01

    The process of 'differential memory' in the earth's magnetotail is studied in the framework of the modified Harris magnetotail geometry. It is verified that differential memory can generate non-Maxwellian features in the modified Harris field model. The time scales and the potentially observable distribution functions associated with the process of differential memory are investigated, and it is shown that non-Maxwellian distributions can evolve as a test particle response to distribution function boundary conditions in a Harris field magnetotail model. The non-Maxwellian features which arise from distribution function mapping have definite time scales associated with them, which are generally shorter than the earthward convection time scale but longer than the typical Alfven crossing time.

  18. Low Latency Messages on Distributed Memory Multiprocessors

    DOE PAGES

    Rosing, Matt; Saltz, Joel

    1995-01-01

    This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 μs on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involved in supporting low latency messages on distributed memory machines.

  19. Limits of the memory coefficient in measuring correlated bursts

    NASA Astrophysics Data System (ADS)

    Jo, Hang-Hyun; Hiraoka, Takayuki

    2018-03-01

    Temporal inhomogeneities in event sequences of natural and social phenomena have been characterized in terms of interevent times and correlations between interevent times. The inhomogeneities of interevent times have been extensively studied, while the correlations between interevent times, often called correlated bursts, are far from being fully understood. For measuring the correlated bursts, two relevant approaches were suggested, i.e., memory coefficient and burst size distribution. Here a burst size denotes the number of events in a bursty train detected for a given time window. Empirical analyses have revealed that the larger memory coefficient tends to be associated with the heavier tail of the burst size distribution. In particular, empirical findings in human activities appear inconsistent, such that the memory coefficient is close to 0, while burst size distributions follow a power law. In order to comprehend these observations, by assuming the conditional independence between consecutive interevent times, we derive the analytical form of the memory coefficient as a function of parameters describing interevent time and burst size distributions. Our analytical result can explain the general tendency of the larger memory coefficient being associated with the heavier tail of burst size distribution. We also find that the apparently inconsistent observations in human activities are compatible with each other, indicating that the memory coefficient has limits to measure the correlated bursts.
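
    The memory coefficient referred to above is, in its usual form, the Pearson correlation between consecutive interevent times. A minimal C implementation is sketched below for concreteness; the toy event sequence in main is illustrative only.

    ```c
    #include <math.h>
    #include <stdio.h>

    /* Memory coefficient M: the Pearson correlation between consecutive
     * interevent times (tau_1..tau_{n-1}) and (tau_2..tau_n). M near 0
     * with a heavy-tailed burst size distribution is exactly the apparent
     * inconsistency the paper analyzes. Assumes a non-constant series. */
    double memory_coefficient(const double *tau, int n) {
        double m1 = 0, m2 = 0, s1 = 0, s2 = 0, cov = 0;
        int m = n - 1;                    /* number of consecutive pairs */
        for (int i = 0; i < m; i++) { m1 += tau[i]; m2 += tau[i + 1]; }
        m1 /= m; m2 /= m;
        for (int i = 0; i < m; i++) {
            s1  += (tau[i] - m1) * (tau[i] - m1);
            s2  += (tau[i + 1] - m2) * (tau[i + 1] - m2);
            cov += (tau[i] - m1) * (tau[i + 1] - m2);
        }
        return cov / (sqrt(s1) * sqrt(s2));
    }

    int main(void) {
        /* Toy interevent-time sequence with a bursty stretch. */
        double tau[] = { 1, 2, 1, 2, 8, 9, 8, 1, 2, 1 };
        printf("M = %.3f\n", memory_coefficient(tau, 10));
        return 0;
    }
    ```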

  20. Notes on implementation of sparsely distributed memory

    NASA Technical Reports Server (NTRS)

    Keeler, J. D.; Denning, P. J.

    1986-01-01

    The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.

  1. Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and the increasing disparity between memory and processor speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing the potential high performance of these systems. While parallelization tools and compilers help users port their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP) employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.

  2. System for simultaneously loading program to master computer memory devices and corresponding slave computer memory devices

    NASA Technical Reports Server (NTRS)

    Hall, William A. (Inventor)

    1993-01-01

    A bus-programmable slave module card for use in a computer control system is disclosed which comprises a master computer and one or more slave computer modules interfacing by means of a bus. Each slave module includes its own microprocessor, memory, and control program for acting as a single loop controller. The slave card includes a plurality of memory means (S1, S2...) corresponding to a like plurality of memory devices (C1, C2...) in the master computer, for each slave memory means its own communication lines connectable through the bus with memory communication lines of an associated memory device in the master computer, and a one-way electronic door which is switchable to either a closed condition or a one-way open condition. With the door closed, communication lines between master computer memory (C1, C2...) and slave memory (S1, S2...) are blocked. In the one-way open condition, the memory communication lines of each slave memory means (S1, S2...) connect with the memory communication lines of its associated memory device (C1, C2...) in the master computer, and the memory devices (C1, C2...) of the master computer and slave card are electrically parallel such that information seen by the master's memory is also seen by the slave's memory. The slave card is also connectable to a switch for electronically removing the slave microprocessor from the system. With the master computer and the slave card in programming mode relationship, and the slave microprocessor electronically removed from the system, loading a program into the memory devices (C1, C2...) of the master accomplishes a parallel loading into the memory devices (S1, S2...) of the slave.

  3. An integrated tool for loop calculations: AITALC

    NASA Astrophysics Data System (ADS)

    Lorca, Alejandro; Riemann, Tord

    2006-01-01

    AITALC, a new tool for automating loop calculations in high energy physics, is described. The package creates Fortran code for two-fermion scattering processes automatically, starting from the generation and analysis of the Feynman graphs. We describe the modules of the tool, the intercommunication between them, and illustrate its use with three examples. Program summary. Title of the program: AITALC version 1.2.1 (9 August 2005) Catalogue identifier: ADWO Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWO Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer: PC i386 Operating system: GNU/Linux, tested on different distributions SuSE 8.2 to 9.3, Red Hat 7.2, Debian 3.0, Ubuntu 5.04. Also on SOLARIS Programming language used: GNU MAKE, DIANA, FORM, FORTRAN77 Additional programs/libraries used: DIANA 2.35 (QGRAF 2.0), FORM 3.1, LOOPTOOLS 2.1 (FF) Memory required to execute with typical data: Up to about 10 MB No. of processors used: 1 No. of lines in distributed program, including test data, etc.: 40 926 No. of bytes in distributed program, including test data, etc.: 371 424 Distribution format: tar gzip file High-speed storage required: from 1.5 to 30 MB, depending on modules present and unfolding of examples Nature of the physical problem: Calculation of differential cross sections for e+e- annihilation in one-loop approximation. Method of solution: Generation and perturbative analysis of Feynman diagrams, with later evaluation of matrix elements and form factors. Restriction on the complexity of the problem: The limit of application is, for the moment, the 2→2 particle reactions in the electro-weak standard model. Typical running time: A few minutes, depending strongly on the complexity of the process and the FORTRAN compiler.

  4. Memories as Useful Outcomes of Residential Outdoor Environmental Education

    ERIC Educational Resources Information Center

    Liddicoat, Kendra R.; Krasny, Marianne E.

    2014-01-01

    Residential outdoor environmental education (ROEE) programs for youth have been shown to yield lasting autobiographical episodic memories. This article explores how past program participants have used such memories, and draws on the memory psychology literature to offer a new perspective on the long-term impacts of environmental education.…

  5. Testing New Programming Paradigms with NAS Parallel Benchmarks

    NASA Technical Reports Server (NTRS)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage was applied to several benchmarks, notably BT and SP, resulting in better sequential performance. In order to overcome the lack of an HPF performance model and guide the development of the HPF codes, we employed an empirical performance model for several primitives found in the benchmarks. We encountered a few limitations of HPF, such as the lack of support for the "REDISTRIBUTION" directive and no easy way to handle irregular computation. The parallelization with OpenMP directives was done at the outer-most loop level to achieve the largest granularity. The performance of six HPF and OpenMP benchmarks is compared with their MPI counterparts for the Class-A problem size in the figure on the next page. These results were obtained on an SGI Origin2000 (195 MHz) with the MIPSpro-f77 compiler 7.2.1 for the OpenMP and MPI codes and the PGI pghpf-2.4.3 compiler with MPI interface for the HPF programs.
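
    The outer-most loop strategy mentioned above is easy to illustrate. The C sketch below (the benchmarks themselves are Fortran) parallelizes only the outermost spatial loop of a Jacobi-style sweep, so each thread receives one large contiguous slab of the grid and the parallel region's overhead is paid once per sweep.

    ```c
    #define NX 128
    #define NY 128
    #define NZ 128

    /* Parallelizing at the outermost loop gives each thread a large,
     * contiguous slab of work (coarse granularity). Separate in/out
     * arrays keep the sweep free of data races between neighboring
     * slabs. Illustrative C, not the NPB source. */
    void relax(double (*restrict out)[NY][NZ],
               const double (*restrict in)[NY][NZ]) {
        #pragma omp parallel for schedule(static)
        for (int i = 1; i < NX - 1; i++)          /* outermost: one slab per thread */
            for (int j = 1; j < NY - 1; j++)
                for (int k = 1; k < NZ - 1; k++)
                    out[i][j][k] = 0.25 * (in[i - 1][j][k] + in[i + 1][j][k]
                                         + in[i][j - 1][k] + in[i][j + 1][k]);
    }
    ```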

  6. Parallel, Distributed Scripting with Python

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, P J

    2002-05-24

    Parallel computers used to be, for the most part, one-of-a-kind systems which were extremely difficult to program portably. With SMP architectures, the advent of the POSIX thread API and OpenMP gave developers ways to portably exploit on-the-box shared memory parallelism. Since these architectures didn't scale cost-effectively, distributed memory clusters were developed. The associated MPI message passing libraries gave these systems a portable paradigm too. Having programmers effectively use this paradigm is a somewhat different question. Distributed data has to be explicitly transported via the messaging system in order for it to be useful. In high level languages, the MPI library gives access to data distribution routines in C, C++, and FORTRAN. But we need more than that. Many reasonable and common tasks are best done in (or as extensions to) scripting languages. Consider sysadmin tools such as password crackers, file purgers, etc. These are simple to write in a scripting language such as Python (an open source, portable, and freely available interpreter). But these tasks beg to be done in parallel. Consider a password checker that checks an encrypted password against a 25,000 word dictionary. This can take around 10 seconds in Python (6 seconds in C). It is trivial to parallelize if you can distribute the information and coordinate the work.

  7. Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

    NASA Technical Reports Server (NTRS)

    DiNucci, David C.; Balley, David H. (Technical Monitor)

    1997-01-01

    Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and have limited utility in a shared-memory environment, due primarily to the fact that message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared memory environments, or they must check each communication at run-time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated. CDS does not require special use of an MMU, which can add overhead to some DSM systems, and does not require an SPMD programming model. Unlike some message-passing interfaces, CDS allows the user to implement efficient demand-driven applications where processes must "fight" over data, and does not perform copying if processes share memory and do not attempt concurrent writes. CDS also supports heterogeneous computing, dynamic process creation, handlers, and a very simple thread-arbitration mechanism. Additional support for array subsections is currently being considered. The CDS1 API, which forms the kernel of CDS, is built primarily upon only two communication primitives, one process initiation primitive, and some data translation (and marshalling) routines, memory allocation routines, and priority control routines. The entire current collection of 28 routines provides enough functionality to implement most (or all) of MPI 1 and 2, which has a much larger interface consisting of hundreds of routines. Still, the API is small enough to consider integrating into standard OS interfaces for handling inter-process communication in a network-independent way. This approach would also help to solve many of the problems plaguing other higher-level standards such as MPI and PVM which must, in some cases, "play OS" to adequately address progress and process control issues. The CDS2 API, a higher level of interface roughly equivalent in functionality to MPI and to be built entirely upon CDS1, is still being designed.
It is intended to add support for the equivalent of communicators, reductions and other collective operations, process topologies, additional support for process creation, and some automatic memory management. CDS2 will not exactly match MPI, because the copy-free semantics of communication from CDS1 will be supported. CDS2 application programs will also be free to use CDS1 directly. CDS1 has been implemented on networks of workstations running unmodified Unix-based operating systems, using UDP/IP and vendor-supplied high-performance locks. Although its inter-node performance is currently unimpressive due to rudimentary implementation techniques, it already outperforms highly optimized MPI implementations on intra-node communication owing to its support for non-copy communication. The similarity of the CDS1 architecture to that of other projects such as UNET and TRAP suggests that the inter-node performance can be increased significantly to surpass MPI or PVM, and it may be possible to migrate some of its functionality to communication controllers.

  8. Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanford, M.

    1997-12-31

    Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach become prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today's multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it never assembles a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.
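
    The essence of the matrix-free strategy is that an iterative solver such as conjugate gradients needs only the action of the stiffness matrix on a vector, which can be computed element by element without ever storing the matrix. The C sketch below illustrates this with a 1-D Laplacian standing in for the element-level operator; it is a schematic of the idea, not JAS3D's algorithm.

    ```c
    #include <math.h>
    #include <string.h>

    #define N 1000

    /* Matrix-free operator: y = A*x computed on the fly. Here a 1-D
     * Laplacian stands in for the element-by-element stiffness action;
     * no global matrix is ever assembled or stored. */
    static void apply_A(const double *x, double *y) {
        for (int i = 0; i < N; i++) {
            double left  = (i > 0)     ? x[i - 1] : 0.0;
            double right = (i < N - 1) ? x[i + 1] : 0.0;
            y[i] = 2.0 * x[i] - left - right;
        }
    }

    static double dot(const double *a, const double *b) {
        double s = 0.0;
        for (int i = 0; i < N; i++) s += a[i] * b[i];
        return s;
    }

    /* Conjugate gradients needing only apply_A: the reason an iterative
     * strategy avoids the memory cost of direct solvers and distributes
     * like an explicit dynamics code. Returns iterations used. */
    int cg(const double *b, double *x, double tol, int maxit) {
        double r[N], p[N], Ap[N];
        memset(x, 0, sizeof(double) * N);
        memcpy(r, b, sizeof(double) * N);
        memcpy(p, b, sizeof(double) * N);
        double rr = dot(r, r);
        for (int it = 0; it < maxit; it++) {
            if (sqrt(rr) < tol) return it;
            apply_A(p, Ap);
            double alpha = rr / dot(p, Ap);
            for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
            double rr_new = dot(r, r);
            for (int i = 0; i < N; i++) p[i] = r[i] + (rr_new / rr) * p[i];
            rr = rr_new;
        }
        return maxit;
    }
    ```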

  9. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    NASA Technical Reports Server (NTRS)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
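
    A minimal C skeleton of the MPI+OpenMP variant studied here is shown below: MPI ranks partition the work across distributed memory while an OpenMP thread team splits each rank's share. The kernel and sizes are placeholders; the MPI+SMPI variant, which uses shared-memory MPI instead of threads, is not shown.

    ```c
    #include <mpi.h>
    #include <stdio.h>

    /* Skeleton of the hybrid MPI+OpenMP pattern: distributed memory
     * across ranks, shared memory within each rank. */
    int main(int argc, char **argv) {
        int provided, rank, nranks;
        /* Request thread support, since OpenMP regions run between MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const int n = 1 << 20;              /* global work items (placeholder) */
        int chunk = n / nranks;             /* slab owned by this rank         */
        double local = 0.0, global = 0.0;

        #pragma omp parallel for reduction(+ : local)
        for (int i = 0; i < chunk; i++) {
            double x = (rank * (long)chunk + i) * 1e-6;
            local += x * x;                 /* stand-in for the real kernel    */
        }

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }
    ```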

  10. Design, Implementation and Case Study of WISEMAN: WIreless Sensors Employing Mobile AgeNts

    NASA Astrophysics Data System (ADS)

    González-Valenzuela, Sergio; Chen, Min; Leung, Victor C. M.

    We describe the practical implementation of Wiseman: our proposed scheme for running mobile agents in Wireless Sensor Networks. Wiseman's architecture derives from a much earlier agent system originally conceived for distributed process coordination in wired networks. Given the memory constraints associated with small sensor devices, we revised the architecture of the original agent system to make it applicable to this type of network. Agents are programmed as compact text scripts that are interpreted at the sensor nodes. Wiseman is currently implemented in TinyOS ver. 1; its binary image occupies 19 Kbytes of ROM, and it requires 3 Kbytes of RAM to operate. We describe the rationale behind Wiseman's interpreter architecture and its unique programming features that can help reduce packet overhead in sensor networks. In addition, we gauge the proposed system's efficiency in terms of task duration with different network topologies through a case study that involves an early-fire-detection application in a fictitious forest setting.

  11. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculations have become feasible owing to the development of computer technology. However, this recent development is due to the emergence of multi-core high performance computers. Therefore, parallel computing becomes a key to achieving good performance of software programs. The Monte Carlo simulation code PHITS contains two parallel computing functions: distributed-memory parallelization using the message passing interface (MPI) protocol and shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions, with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.

  12. Compiler analysis for irregular problems in FORTRAN D

    NASA Technical Reports Server (NTRS)

    Vonhanxleden, Reinhard; Kennedy, Ken; Koelbel, Charles; Das, Raja; Saltz, Joel

    1992-01-01

    We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
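
    The runtime preprocessing referred to here follows the inspector/executor pattern. As a rough C sketch (hypothetical names, with the actual communication step omitted), an inspector scans the indirection array once and builds a schedule of off-processor references; every later loop with the same access pattern can then reuse the fetched copies, which is precisely the reuse the compiler analysis must prove safe.

    ```c
    #include <stdlib.h>

    /* Inspector/executor sketch with hypothetical names; the MPI gather
     * that fills the copy buffer is omitted. The inspector runs once,
     * and every loop sharing the indirection array reuses the schedule. */
    typedef struct {
        int    *offproc;    /* global indices that live off-processor */
        int     n_offproc;
        double *copy;       /* local buffer holding fetched values    */
    } Schedule;

    /* Inspector: scan the indirection array once and record off-processor
     * references. [lo, hi) is the block of global indices owned locally. */
    Schedule inspect(const int *ind, int n, int lo, int hi) {
        Schedule s = { malloc(n * sizeof(int)), 0, NULL };
        for (int i = 0; i < n; i++)
            if (ind[i] < lo || ind[i] >= hi)
                s.offproc[s.n_offproc++] = ind[i];
        s.copy = malloc(s.n_offproc * sizeof(double));
        /* ... one communication step fills s.copy from the owners ... */
        return s;
    }

    /* Executor-side access: local data is read directly; off-processor
     * data comes from the reused copy. A linear search keeps the sketch
     * short; a real schedule would use a hash or translation table. */
    double fetch(const Schedule *s, const double *local, int lo, int hi, int g) {
        if (g >= lo && g < hi) return local[g - lo];
        for (int k = 0; k < s->n_offproc; k++)
            if (s->offproc[k] == g) return s->copy[k];
        return 0.0; /* unreachable if the schedule covers the pattern */
    }
    ```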

  13. Optimal Jet Finder (v1.0 C++)

    NASA Astrophysics Data System (ADS)

    Chumakov, S.; Jankowski, E.; Tkachov, F. V.

    2006-10-01

    We describe a C++ implementation of the Optimal Jet Definition for the identification of jets in hadronic final states of particle collisions. We explain the interface subroutines and provide a usage example. The source code is available from http://www.inr.ac.ru/~ftkachov/projects/jets/. Program summary. Title of program: Optimal Jet Finder (v1.0 C++) Catalogue identifier: ADSB_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSB_v2_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer: any computer with a standard C++ compiler Tested with: GNU gcc 3.4.2, Linux Fedora Core 3, Intel i686; Forte Developer 7 C++ 5.4, SunOS 5.9, UltraSPARC III+; Microsoft Visual C++ Toolkit 2003 (compiler 13.10.3077, linker 7.10.30777, option /EHsc), Windows XP, Intel i686. Programming language used: C++ Memory required: ~1 MB (or more, depending on the settings) No. of lines in distributed program, including test data, etc.: 3047 No. of bytes in distributed program, including test data, etc.: 17 884 Distribution format: tar.gz Nature of physical problem: Analysis of hadronic final states in high energy particle collision experiments often involves identification of hadronic jets. A large number of hadrons detected in the calorimeter is reduced to a few jets by means of a jet finding algorithm. The jets are used in further analysis which would be difficult or impossible when applied directly to the hadrons. Grigoriev et al. [D.Yu. Grigoriev, E. Jankowski, F.V. Tkachov, Phys. Rev. Lett. 91 (2003) 061801] provide a brief introduction to the subject of jet finding algorithms, and a general review of the physics of jets can be found in [R. Barlow, Rep. Prog. Phys. 36 (1993) 1067]. Method of solution: The software we provide is an implementation of the so-called Optimal Jet Definition (OJD). The theory of OJD was developed in [F.V. Tkachov, Phys. Rev. Lett. 73 (1994) 2405; Erratum, Phys. Rev. Lett. 74 (1995) 2618; F.V. Tkachov, Int. J. Modern Phys. A 12 (1997) 5411; F.V. Tkachov, Int. J. Modern Phys. A 17 (2002) 2783]. The desired jet configuration is obtained as the one that minimizes Ω, a certain function of the input particles and the jet configuration. A FORTRAN 77 implementation of OJD is described in [D.Yu. Grigoriev, E. Jankowski, F.V. Tkachov, Comput. Phys. Comm. 155 (2003) 42]. Restrictions on the complexity of the program: Memory required by the program is proportional to the number of particles in the input × the number of jets in the output. For example, for 650 particles and 20 jets ~300 KB of memory is required. Typical running time: The running time (in the running mode with a fixed number of jets) is proportional to the number of particles in the input × the number of jets in the output × the number of different random initial configurations tried (ntries). For example, for 65 particles in the input and 4 jets in the output, the running time is ~4·10 s per try (Pentium 4, 2.8 GHz).

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular, and their parallelization has previously been achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed-memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the best optimized shared-memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
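
    As background for why such libraries are "based on matrix-matrix multiplication": any pairwise tensor contraction can be flattened into a single matrix product over the fused contracted indices, as the minimal C++ sketch below shows. Libtensor's block structure, symmetry handling, and data distribution are all omitted here.

      // Toy illustration: the contraction C[i,j] = sum_{k,l} A[i,k,l] * B[k,l,j]
      // becomes one matrix product once the contracted pair (k,l) is fused
      // into a single index of length K*L.
      #include <cstdio>
      #include <vector>

      int main() {
          const int I = 3, J = 4, K = 2, L = 5;
          // A[i][k][l] stored as A[i*(K*L) + k*L + l]; similarly for B and C.
          std::vector<double> A(I*K*L, 1.0), B(K*L*J, 2.0), C(I*J, 0.0);
          // Viewing A as an I x (K*L) matrix and B as a (K*L) x J matrix,
          // the contraction is exactly one matrix-matrix multiplication
          // (the DGEMM that dominates these calculations).
          for (int i = 0; i < I; ++i)
              for (int j = 0; j < J; ++j) {
                  double s = 0.0;
                  for (int kl = 0; kl < K*L; ++kl)
                      s += A[i*K*L + kl] * B[kl*J + j];
                  C[i*J + j] = s;
              }
          std::printf("C[0,0] = %g (expected %d)\n", C[0], 2*K*L);
          return 0;
      }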

  15. Simulation of n-qubit quantum systems. IV. Parametrizations of quantum states, matrices and probability distributions

    NASA Astrophysics Data System (ADS)

    Radtke, T.; Fritzsche, S.

    2008-11-01

    Entanglement is known today as a key resource in many protocols from quantum computation and quantum information theory. However, despite the successful demonstration of several protocols, such as teleportation or quantum key distribution, there are still many open questions of how entanglement affects the efficiency of quantum algorithms or how it can be protected against noisy environments. The investigation of these and related questions often requires a search or optimization over the set of quantum states and, hence, a parametrization of them and various other objects. To facilitate these kinds of studies in quantum information theory, here we present an extension of the FEYNMAN program that was developed during recent years as a toolbox for the simulation and analysis of quantum registers. In particular, we implement parametrizations of Hermitian and unitary matrices (of arbitrary order), pure and mixed quantum states, as well as separable states. In addition to being a prerequisite for the study of many optimization problems, these parametrizations also provide the necessary basis for heuristic studies which make use of random states, unitary matrices and other objects. Program summary: Program title: FEYNMAN. Catalogue identifier: ADWE_v4_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWE_v4_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 24 231. No. of bytes in distributed program, including test data, etc.: 1 416 085. Distribution format: tar.gz. Programming language: Maple 11. Computer: Any computer with Maple software installed. Operating system: Any system that supports Maple; the program has been tested under Microsoft Windows XP and Linux. Classification: 4.15. Does the new version supersede the previous version?: Yes. Nature of problem: During the last decades, quantum information science has contributed to our understanding of quantum mechanics and has also provided new and efficient protocols based on the use of entangled quantum states. To determine the behavior and entanglement of n-qubit quantum registers, symbolic and numerical simulations need to be applied in order to analyze how these quantum information protocols work and what role entanglement plays in them. Solution method: Using the computer algebra system Maple, we have developed a set of procedures that support the definition, manipulation and analysis of n-qubit quantum registers. These procedures also help to deal with (unitary) logic gates and (nonunitary) quantum operations that act upon the quantum registers. With the parametrization of various frequently applied objects that are implemented in the present version, the program now facilitates a wider range of symbolic and numerical studies. All commands can be used interactively in order to simulate and analyze the evolution of n-qubit quantum systems, both in ideal and noisy quantum circuits. Reasons for new version: In the first version of the FEYNMAN program [1], we implemented the data structures and tools that are necessary to create, manipulate and analyze the state of quantum registers. Later [2,3], support was added to deal with quantum operations (noisy channels), an ingredient which is essential for studying the effects of decoherence. With the present extension, we add a number of parametrizations of objects frequently utilized in decoherence and entanglement studies, such as Hermitian and unitary matrices, probability distributions, or various kinds of quantum states. This extension therefore provides the basis, for example, for the optimization of a given function over the set of pure states or the simple generation of random objects. Running time: Most commands that act upon quantum registers with five or fewer qubits take ⩽10 seconds of processor time on a Pentium 4 processor with ⩾2 GHz or newer, and about 5-20 MB of working memory (in addition to the memory for the Maple environment). Especially when working with symbolic expressions, however, the requirements on CPU time and memory depend critically on the size of the quantum registers, owing to the exponential growth of the dimension of the associated Hilbert space. For example, complex (symbolic) noise models, i.e. with several symbolic Kraus operators, often result for multi-qubit systems in very large expressions that dramatically slow down the evaluation of, e.g., distance measures or the final-state entropy. In these cases, Maple's assume facility sometimes helps to reduce the complexity of the symbolic expressions, but more often only a numerical evaluation is eventually possible. Since the complexity of the various commands of the FEYNMAN program and the possible usage scenarios can be very different, no general scaling law for CPU time or memory requirements can be given. References: [1] T. Radtke, S. Fritzsche, Comput. Phys. Comm. 173 (2005) 91. [2] T. Radtke, S. Fritzsche, Comput. Phys. Comm. 175 (2006) 145. [3] T. Radtke, S. Fritzsche, Comput. Phys. Comm. 176 (2007) 617.
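
    FEYNMAN itself is Maple code; as a language-neutral illustration of the simplest state parametrization of the kind the new version provides, the C++ sketch below builds a general single-qubit pure state from two angles and checks its normalization. The parameter values are arbitrary.

      // |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1> is the standard
      // two-angle parametrization of a single-qubit pure state.
      #include <cmath>
      #include <complex>
      #include <cstdio>

      int main() {
          const double theta = 0.7, phi = 1.9;   // arbitrary parameter values
          std::complex<double> a = std::cos(theta / 2);
          std::complex<double> b = std::polar(std::sin(theta / 2), phi);
          double norm = std::norm(a) + std::norm(b);  // |a|^2 + |b|^2
          std::printf("<psi|psi> = %.15f\n", norm);   // prints 1 for any angles
          return 0;
      }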

  16. Implementation of a Fully-Balanced Periodic Tridiagonal Solver on a Parallel Distributed Memory Architecture

    DTIC Science & Technology

    1994-05-01

    T.M. Eidson (High Technology Corporation, Hampton, VA 23665); G. Erlebacher (Institute for Computer Applications in Science and Engineering). Contract NAS1-19480, May 1994. ... developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular ...

  17. Examining procedural working memory processing in obsessive-compulsive disorder.

    PubMed

    Shahar, Nitzan; Teodorescu, Andrei R; Anholt, Gideon E; Karmon-Presser, Anat; Meiran, Nachshon

    2017-07-01

    Previous research has suggested that a deficit in working memory might underlie the difficulty of obsessive-compulsive disorder (OCD) patients in controlling their thoughts and actions. However, a recent meta-analysis found only small effect sizes for working memory deficits in OCD. Recently, a distinction has been made between declarative and procedural working memory. Working memory in OCD has been tested mostly using declarative measures, yet OCD symptoms typically concern actions, making procedural working memory more relevant. Here, we tested the operation of procedural working memory in OCD. Participants with OCD and healthy controls performed a battery of choice reaction tasks under high and low procedural working memory demands. Reaction times (RTs) were estimated using ex-Gaussian distribution fitting, revealing no group differences in the size of the RT distribution tail (i.e., the τ parameter), known to be sensitive to procedural working memory manipulations. Group differences, unrelated to the working memory manipulations, were found in the leading edge of the RT distribution and were analyzed using a two-stage evidence accumulation model. Modeling results suggested that perceptual difficulties might underlie these group differences. In conclusion, our results suggest that procedural working memory processing is most likely intact in OCD, and they raise a novel, as yet untested, hypothesis regarding perceptual deficits in OCD. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
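
    For readers unfamiliar with the ex-Gaussian model used here: an ex-Gaussian RT is simply a normal variate plus an independent exponential variate whose mean is the tail parameter τ, so τ inflates only the right tail of the distribution. A hedged C++ sketch with illustrative parameter values:

      // Sample from an ex-Gaussian: Normal(mu, sigma) + Exponential(mean tau).
      // Parameter values below are illustrative, not from the study.
      #include <cstdio>
      #include <random>

      int main() {
          std::mt19937 rng(1);
          std::normal_distribution<double> gauss(400.0, 40.0);     // mu, sigma (ms)
          std::exponential_distribution<double> expo(1.0 / 120.0); // tau = 120 ms
          double mean = 0;
          const int n = 100000;
          for (int i = 0; i < n; ++i) mean += gauss(rng) + expo(rng);
          std::printf("mean RT ~= %.1f ms (mu + tau = 520)\n", mean / n);
          return 0;
      }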

  18. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
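
    Bokhari's sum-bottleneck path algorithm is more general than this, but the flavor of the chain-partitioning problems it addresses can be conveyed by the classic special case sketched below: split a chain of module weights into contiguous blocks over p processors so that the heaviest block (the bottleneck) is as light as possible, via binary search over the answer plus a greedy feasibility check. This is a sketch of the problem, not the paper's algorithm.

      #include <algorithm>
      #include <cstdio>
      #include <numeric>
      #include <vector>

      // Can the chain be split into <= p contiguous blocks, each of weight <= cap?
      bool feasible(const std::vector<long>& w, int p, long cap) {
          int blocks = 1; long cur = 0;
          for (long x : w) {
              if (x > cap) return false;
              if (cur + x > cap) { ++blocks; cur = x; }
              else cur += x;
          }
          return blocks <= p;
      }

      long minBottleneck(const std::vector<long>& w, int p) {
          long lo = *std::max_element(w.begin(), w.end());
          long hi = std::accumulate(w.begin(), w.end(), 0L);
          while (lo < hi) {                    // binary search on the bottleneck
              long mid = lo + (hi - lo) / 2;
              if (feasible(w, p, mid)) hi = mid; else lo = mid + 1;
          }
          return lo;
      }

      int main() {
          std::vector<long> weights = {4, 8, 1, 3, 7, 2, 5};  // module costs
          std::printf("optimal bottleneck over 3 processors: %ld\n",
                      minBottleneck(weights, 3));             // prints 12
          return 0;
      }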

  19. Programming and memory dynamics of innate leukocytes during tissue homeostasis and inflammation.

    PubMed

    Lee, Christina; Geng, Shuo; Zhang, Yao; Rahtes, Allison; Li, Liwu

    2017-09-01

    The field of innate immunity is witnessing a paradigm shift regarding "memory" and "programming" dynamics. Past studies of innate leukocytes characterized them as first responders to danger signals with no memory. However, recent findings suggest that innate leukocytes, such as monocytes and neutrophils, are capable of "memorizing" not only the chemical nature but also the history and dosages of external stimulants. As a consequence, innate leukocytes can be dynamically programmed or reprogrammed into complex inflammatory memory states. Key examples of innate leukocyte memory dynamics include the development of primed and tolerant monocytes when "programmed" with a variety of inflammatory stimulants at varying signal strengths. The development of innate leukocyte memory may have far-reaching translational implications, as programmed innate leukocytes may affect the pathogenesis of both acute and chronic inflammatory diseases. This review intends to critically discuss some of the recent studies that address this emerging concept and its implication in the pathogenesis of inflammatory diseases. © Society for Leukocyte Biology.

  20. Vienna Fortran - A Language Specification. Version 1.1

    DTIC Science & Technology

    1992-03-01

    ... other computer architectures is the fact that the memory is physically distributed among the processors; the time required to access a non-local ... datum may be an order of magnitude higher than the time taken to access locally stored data. This has important consequences for program efficiency. In ... machine in many aspects. It is tedious, time-consuming and error prone. It has led to particularly slow software development cycles and, in consequence

  1. Turbo Pascal Implementation of a Distributed Processing Network of MS-DOS Microcomputers Connected in a Master-Slave Configuration

    DTIC Science & Technology

    1989-12-01

    Contents fragments: Interrupt Procedures; Support for a Larger Memory Model; Implementation. ... describe the programmer's model of the hardware utilized in the microcomputers and interrupt-driven serial communication considerations. ... The programming model of Table 2.1 is common to the Intel 8088, 8086 and 80x86 series of microprocessors used in the IBM PC/AT

  2. CUMPOIS- CUMULATIVE POISSON DISTRIBUTION PROGRAM

    NASA Technical Reports Server (NTRS)

    Bowerman, P. N.

    1994-01-01

    The Cumulative Poisson distribution program, CUMPOIS, is one of two programs which make calculations involving cumulative Poisson distributions. Both programs, CUMPOIS (NPO-17714) and NEWTPOIS (NPO-17715), can be used independently of one another. CUMPOIS determines the approximate cumulative binomial distribution, evaluates the cumulative distribution function (cdf) for gamma distributions with integer shape parameters, and evaluates the cdf for chi-square distributions with even degrees of freedom. It can be used by statisticians and others concerned with probabilities of independent events occurring over specific units of time, area, or volume. CUMPOIS calculates the probability that n or fewer events (i.e., cumulative) will occur within any unit when the expected number of events is given as lambda. Normally, this probability is calculated by a direct summation, from i=0 to n, of terms involving the exponential function, lambda, and inverse factorials. This approach, however, eventually fails due to underflow for sufficiently large values of n. Additionally, when the exponential term is moved outside of the summation for simplification purposes, there is a risk that the terms remaining within the summation, and the summation itself, will overflow for certain values of i and lambda. CUMPOIS eliminates these possibilities by multiplying an additional exponential factor into the summation terms and the partial sum whenever overflow/underflow situations threaten. The reciprocal of this term is then multiplied into the completed sum, giving the cumulative probability. The CUMPOIS program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly on most C compilers. The program format is interactive, accepting lambda and n as inputs. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CUMPOIS was developed in 1988.
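
    A log-space variant of the same overflow/underflow-avoiding idea — mirroring CUMPOIS's goal, though not its exact scaling scheme — accumulates the terms e^(-lambda) · lambda^i / i! without ever forming lambda^i or i! directly:

      #include <cmath>
      #include <cstdio>

      // P(X <= n) for X ~ Poisson(lambda), with each term built in log space.
      double cumulativePoisson(double lambda, int n) {
          double logTerm = -lambda;   // log of the i = 0 term, e^(-lambda)
          double sum = 0.0;
          for (int i = 0; i <= n; ++i) {
              sum += std::exp(logTerm);
              logTerm += std::log(lambda) - std::log(i + 1.0);  // next term's log
          }
          return sum;
      }

      int main() {
          // P(X <= 3) for lambda = 2: e^-2 (1 + 2 + 2 + 4/3) ~= 0.857123
          std::printf("%.6f\n", cumulativePoisson(2.0, 3));
          return 0;
      }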

  3. Honoring our donors: a survey of memorial ceremonies in United States anatomy programs.

    PubMed

    Jones, Trahern W; Lachman, Nirusha; Pawlina, Wojciech

    2014-01-01

    Many anatomy programs that incorporate dissection of donated human bodies hold memorial ceremonies of gratitude towards body donors. The content of these ceremonies may include learners' reflections on mortality, respect, altruism, and personal growth told through various humanities modalities. The task of planning is usually student- and faculty-led with participation from other health care students. Objective information on current memorial ceremonies for body donors in anatomy programs in the United States appears to be lacking. The number of programs in the United States that currently plan these memorial ceremonies and information on trends in programs undertaking such ceremonies remain unknown. Gross anatomy program directors throughout the United States were contacted and asked to respond to a voluntary questionnaire on memorial ceremonies held at their institution. The results (response rate 68.2%) indicated that a majority of human anatomy programs (95.5%) hold memorial ceremonies. These ceremonies are, for the most part, student-driven and nondenominational or secular in nature. Participants heavily rely upon speech, music, poetry, and written essays, with a small inclusion of other humanities modalities, such as dance or visual art, to explore a variety of themes during these ceremonies. © 2013 American Association of Anatomists.

  4. Episodic memory in aspects of large-scale brain networks

    PubMed Central

    Jeong, Woorim; Chung, Chun Kee; Kim, June Sic

    2015-01-01

    Understanding human episodic memory in aspects of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as mostly relying on medial temporal lobe (MTL) structures. However, recent studies have suggested involvement of more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies also have revealed that memory function is subserved by not only the MTL system but also a distributed network, particularly the default-mode network (DMN). Furthermore, researchers have begun to investigate memory networks throughout the entire brain not restricted to the specific resting-state network (RSN). Altered patterns of functional connectivity (FC) among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory through modulating functional networks, carrying future implications of a novel interventional therapy for memory impairment. PMID:26321939

  5. A program to compute the two-step excitation of mesospheric sodium atoms for the Polychromatic Laser Guide Star Project

    NASA Astrophysics Data System (ADS)

    Bellanger, Véronique; Courcelle, Arnaud; Petit, Alain

    2004-09-01

    A program to compute the two-step excitation of sodium atoms (3S→3P→4D) using the density-matrix formalism is presented. The BEACON program calculates population evolution and the number of photons emitted by fluorescence from the 3P, 4D, 4P and 4S levels. Program summary: Title of program: BEACON. Catalogue identifier: ADSX. Program Summary URL: http://cpc.cs.qub.ac.uk/cpc/summaries/ADSX. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Licensing provisions: none. Operating systems under which the program has been tested: Windows; Unix. Programming language used: FORTRAN 77. Memory required to execute with typical data: 1 Mw. Number of bits in a word: 32. Number of processors used: 1 (a parallel version of this code is also available and can be obtained on request). Number of lines in distributed program, including test data, etc.: 29 287. Number of bytes in distributed program, including test data, etc.: 830 331. Distribution format: tar.gz. CPC Program Library subprograms used: none. Nature of physical problem: Resolution of the Bloch equations in the case of two-step laser excitation of sodium atoms. Method of solution: The program BEACON calculates the evolution of level populations versus time using the density-matrix formalism. The number of photons emitted from the 3P, 4D and 4P levels is calculated using the branching ratios and the level lifetimes. Restriction on the complexity of the problem: Since the backscatter emission is calculated after the excitation process, excitation with a laser pulse duration longer than the 4D level lifetime cannot be rigorously treated. In particular, cw laser excitation cannot be calculated with this code. Typical running time: 12 h.
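
    BEACON integrates the density-matrix equations for a multi-level ladder with relaxation; as a much smaller hedged analogue of the same method, the sketch below integrates the resonant two-level problem without decay using fixed-step RK4 and checks it against the analytic Rabi solution rho_ee(t) = sin²(Ωt/2). BEACON's real three-level ladder and relaxation terms are omitted.

      #include <cmath>
      #include <complex>
      #include <cstdio>

      struct State {
          double ee;                 // excited-state population rho_ee
          std::complex<double> ge;   // coherence rho_ge
      };

      State deriv(const State& s, double omega) {
          // drho_ee/dt = Omega Im(rho_ge);  drho_ge/dt = -i (Omega/2)(2 rho_ee - 1)
          return {omega * s.ge.imag(),
                  std::complex<double>(0.0, -omega / 2) * (2.0 * s.ee - 1.0)};
      }

      State axpy(const State& s, const State& d, double h) {
          return {s.ee + h * d.ee, s.ge + h * d.ge};
      }

      int main() {
          const double omega = 1.0, T = 10.0;
          const int steps = 10000;
          const double h = T / steps;
          State s = {0.0, {0.0, 0.0}};          // atom starts in the ground state
          for (int i = 0; i < steps; ++i) {     // classic fixed-step RK4
              State k1 = deriv(s, omega);
              State k2 = deriv(axpy(s, k1, h / 2), omega);
              State k3 = deriv(axpy(s, k2, h / 2), omega);
              State k4 = deriv(axpy(s, k3, h), omega);
              s.ee += h / 6 * (k1.ee + 2 * k2.ee + 2 * k3.ee + k4.ee);
              s.ge += h / 6.0 * (k1.ge + 2.0 * k2.ge + 2.0 * k3.ge + k4.ge);
          }
          std::printf("rho_ee(T) = %.6f  analytic = %.6f\n",
                      s.ee, std::pow(std::sin(omega * T / 2), 2));
          return 0;
      }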

  6. The iconic memory skills of brain injury survivors and non-brain injured controls after visual scanning training.

    PubMed

    McClure, J T; Browning, R T; Vantrease, C M; Bittle, S T

    1994-01-01

    Previous research suggests that traumatic brain injury (TBI) results in impairment of iconic memory abilities, which raises serious implications for brain injury rehabilitation. Most cognitive rehabilitation programs do not include iconic memory training; instead, it is common for such programs to focus on attention and concentration skills, memory skills, and visual scanning skills. This study compared the iconic memory skills of brain-injury survivors and control subjects who all reached criterion levels of visual scanning skill. This involved prior training for the brain-injury survivors, using popular visual scanning programs, until they could visually scan with response time and accuracy within normal limits; control subjects required only minimal training to reach the same criteria. This comparison allows for the dissociation of visual scanning skills and iconic memory skills. The results are discussed in terms of their implications for cognitive rehabilitation and the relationship between visual scanning training and iconic memory skills. (The authors acknowledge the contribution of Jeffrey D. Vantrease, who wrote the software for the iconic memory procedure and measurement.)

  7. [Cortical potentials evoked to response to a signal to make a memory-guided saccade].

    PubMed

    Slavutskaia, M V; Moiseeva, V V; Shul'govskiĭ, V V

    2010-01-01

    Differences in the parameters of visually guided and memory-guided saccades were demonstrated. The increased latency of memory-guided saccades compared with visually guided saccades may indicate slower saccadic programming when spatial information must be extracted from memory. Comparison of the parameters and topography of components N1 and P1 of the evoked potential on the signal to make a memory- or visually guided saccade suggests that the early stage of saccade programming, associated with spatial information processing, is performed predominantly by a top-down attention mechanism before a memory-guided saccade and by a bottom-up mechanism before a visually guided saccade. The findings show that the increased latency of memory-guided saccades is connected with decision making at the central stage of saccade programming. We propose that wave N2, which develops in the middle of the latent period of memory-guided saccades, correlates with this process. The topography and spatial dynamics of components N1, P1 and N2 indicate that memory-guided saccade programming is controlled by the frontal mediothalamic system of selective attention and by left-hemispheric brain mechanisms of motor attention.

  8. Modeling Confidence and Response Time in Recognition Memory

    ERIC Educational Resources Information Center

    Ratcliff, Roger; Starns, Jeffrey J.

    2009-01-01

    A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the…

  9. CLOCS (Computer with Low Context-Switching Time) Architecture Reference Documents

    DTIC Science & Technology

    1988-05-06

    Peculiarities: The only state inside the central processing unit (CPU) is a program status word. All data operations are memory to memory. One result of this ... to the challenge "if I were to design RISC, this is how I would do it." The architecture was designed by Mark Davis and Bill Gallmeister. ... are memory to memory. Any special devices added should be memory mapped. The program counter is even memory mapped. Working storage: There is no

  10. Chemically programmed ink-jet printed resistive WORM memory array and readout circuit

    NASA Astrophysics Data System (ADS)

    Andersson, H.; Manuilskiy, A.; Sidén, J.; Gao, J.; Hummelgård, M.; Kunninmel, G. V.; Nilsson, H.-E.

    2014-09-01

    In this paper an ink-jet printed write once read many (WORM) resistive memory fabricated on a paper substrate is presented. The memory elements are programmed to different resistance states by printing triethylene glycol monoethyl ether on the substrate before the actual memory element is printed using silver nanoparticle ink. The resistance can thus be set to a broad range of values without changing the geometry of the elements. A memory card consisting of 16 elements is manufactured, with each element programmed to one of four defined logic levels, providing a total of 4^16 = 4 294 967 296 unique possible combinations. Using a readout circuit originally developed for resistive sensors to avoid crosstalk between elements, a memory card reader is manufactured that is able to read the values of the memory card and transfer the data to a PC. Such printed memory cards can be used in various applications.

  11. Distributed simulation using a real-time shared memory network

    NASA Technical Reports Server (NTRS)

    Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.

    1993-01-01

    The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation were measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop the communication between the processors, and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.

  12. Satellite Image Mosaic Engine

    NASA Technical Reports Server (NTRS)

    Plesea, Lucian

    2006-01-01

    A computer program automatically builds large, full-resolution mosaics of multispectral images of Earth landmasses from images acquired by Landsat 7, complete with matching of colors and blending between adjacent scenes. While the code has been used extensively for Landsat, it could also be used for other data sources. The largest mosaic produced in this work, comprising as many as 8,000 scenes represented by more than 5 terabytes of data, demonstrated the code's ability to provide global coverage. The program first statistically analyzes input images to determine areas of coverage and data-value distributions. It then transforms the input images from their original universal transverse Mercator coordinates to other geographical coordinates, with scaling. It applies a first-order polynomial brightness correction to each band in each scene. It uses a data-mask image for selecting data and blending input scenes. Under control by a user, the program can be made to operate on small parts of the output image space, with checkpoint and restart capabilities. The program runs on SGI IRIX computers. It is capable of parallel processing using shared-memory code, large memories, and tens of central processing units. It can retrieve input data and store output data at locations remote from the processors on which it is executed.
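
    A first-order polynomial brightness correction of the kind mentioned can be read as fitting a gain and offset per band that map a scene's pixel statistics onto reference statistics; the toy C++ sketch below shows this interpretation. The program's actual fit and blending logic are more involved, and the matching-to-reference-statistics scheme here is an assumption for illustration.

      #include <cmath>
      #include <cstdio>
      #include <vector>

      int main() {
          std::vector<double> scene = {10, 20, 30, 40};   // toy pixel values
          const double refMean = 100, refStd = 20;        // target statistics
          double m = 0, s2 = 0;
          for (double p : scene) m += p;
          m /= scene.size();
          for (double p : scene) s2 += (p - m) * (p - m);
          double sd = std::sqrt(s2 / scene.size());
          double a = refStd / sd, b = refMean - a * m;    // gain and offset
          for (double& p : scene) p = a * p + b;          // first-order correction
          std::printf("corrected: %g %g %g %g\n",
                      scene[0], scene[1], scene[2], scene[3]);
          return 0;
      }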

  13. A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoemmen, Mark

    2010-11-01

    Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and less data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
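
    The heart of TSQR is that stacking the small R factors of the row blocks and factoring the stack reproduces the R of the whole matrix. The sketch below demonstrates this on two blocks, with a modified Gram-Schmidt local QR standing in for the Householder QR used in practice; communication, the general reduction tree, and recovery of Q are all omitted.

      #include <cmath>
      #include <cstdio>
      #include <vector>

      using Mat = std::vector<std::vector<double>>;  // row-major dense matrix

      // Modified Gram-Schmidt QR; returns the n x n upper-triangular R factor
      // (with positive diagonal, which makes R unique for full-rank input).
      Mat qrR(Mat a) {
          size_t m = a.size(), n = a[0].size();
          Mat r(n, std::vector<double>(n, 0.0));
          for (size_t j = 0; j < n; ++j) {
              double nrm = 0;
              for (size_t i = 0; i < m; ++i) nrm += a[i][j] * a[i][j];
              r[j][j] = std::sqrt(nrm);
              for (size_t i = 0; i < m; ++i) a[i][j] /= r[j][j];
              for (size_t k = j + 1; k < n; ++k) {
                  for (size_t i = 0; i < m; ++i) r[j][k] += a[i][j] * a[i][k];
                  for (size_t i = 0; i < m; ++i) a[i][k] -= a[i][j] * r[j][k];
              }
          }
          return r;
      }

      Mat stack(const Mat& top, const Mat& bot) {
          Mat s = top;
          s.insert(s.end(), bot.begin(), bot.end());
          return s;
      }

      int main() {
          Mat block1 = {{1, 2}, {3, 4}, {5, 6}};   // rows owned by "processor" 0
          Mat block2 = {{7, 8}, {9, 1}, {2, 3}};   // rows owned by "processor" 1
          Mat r = qrR(stack(qrR(block1), qrR(block2)));  // TSQR combine step
          Mat ref = qrR(stack(block1, block2));          // flat QR for comparison
          for (size_t i = 0; i < r.size(); ++i)
              for (size_t j = 0; j < r[i].size(); ++j)
                  std::printf("R[%zu][%zu] = %9.5f  (flat: %9.5f)\n",
                              i, j, r[i][j], ref[i][j]);
          return 0;
      }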

  14. Cognitive stimulation in healthy older adults: a cognitive stimulation program using leisure activities compared to a conventional cognitive stimulation program.

    PubMed

    Grimaud, Élisabeth; Taconnat, Laurence; Clarys, David

    2017-06-01

    The aim of this study was to compare two methods of cognitive stimulation for cognitive functions. The first method used a conventional approach; the second used leisure activities, in order to assess their benefits on cognitive functions (speed of processing, working memory capacity and executive functions) and psychoaffective measures (memory span and self-esteem). Sixty-seven participants over 60 years old took part in the experiment. They were divided into three groups: one group followed a program of conventional cognitive stimulation, one group followed a program of cognitive stimulation using leisure activities, and one was a control group. The different measures were evaluated before and after the training program. Results show that the cognitive stimulation program using leisure activities is as effective on memory span, updating and memory self-perception as the conventional cognitive stimulation program, and more effective on self-esteem than the conventional program. There is no difference between the two stimulated groups and the control group on speed of processing. Neither of the two cognitive stimulation programs provides a benefit for shifting or inhibition. These results indicate that it seems possible to enhance working memory and to observe far-transfer benefits on self-perception (self-esteem and memory self-perception) when using leisure activities as a tool for cognitive stimulation.

  15. Distributional effects of Oportunidades on early child development.

    PubMed

    Figueroa, José Luis

    2014-07-01

    The Mexican Oportunidades program is designed to increase human capital through investments in education, health, and nutrition for children in extreme poverty. Although the program is not expressly designed to promote a child's cognitive and non-cognitive development, the set of actions carried out by the program could eventually facilitate improvements in these domains. Previous studies on the Oportunidades program have found little impact on children's cognition but have found positive effects on their non-cognitive development. However, the majority of these studies use the average outcome to measure the impact of the program and thus overlook other "non-average" effects. This paper uses stochastic dominance methods to investigate results beyond the mean by comparing cumulative distributions for both children who are and children who are not aided by the program. Four indicators of cognitive development and one indicator of non-cognitive development are analyzed using a sample of 2595 children aged two to six years. The sample was collected in rural communities in Mexico in 2003 as part of the program evaluation. Similar to previous studies, the program is found to positively influence children's non-cognitive abilities: children enrolled in the program manifest fewer behavioral problems compared with children who are not enrolled. In addition, different program effects are found for girls and boys and for indigenous and non-indigenous children. The ranges where the effect is measured cover a large part of the outcome's distribution, and these ranges include a large proportion of the children in the sample. With regard to cognitive development, only one indicator (short-term memory) shows positive effects. Nevertheless, the results for this indicator demonstrate that children with low values of cognitive development benefit from the program, whereas children with high values do not. Overall, the program has positive effects on child development, especially for the most vulnerable, who are at the bottom of the distribution. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Dynamic modulation of innate immunity programming and memory.

    PubMed

    Yuan, Ruoxi; Li, Liwu

    2016-01-01

    Recent progress harkens back to the old theme of immune memory, except this time in the area of innate immunity, to which the traditional paradigm ascribes only a rudimentary first-line defense function with no memory. However, both in vitro and in vivo studies reveal that innate leukocytes may adopt distinct activation states such as priming, tolerance, and exhaustion, depending upon the history of prior challenges. The dynamic programming and potential memory of innate leukocytes may have far-reaching consequences in health and disease. This review aims to present some salient features of innate programming and memory, their patho-physiological consequences, underlying mechanisms, and current pressing issues.

  17. An empirical investigation of sparse distributed memory using discrete speech recognition

    NASA Technical Reports Server (NTRS)

    Danforth, Douglas G.

    1990-01-01

    Presented here is a step-by-step analysis of how the basic Sparse Distributed Memory (SDM) model can be modified to enhance its generalization capabilities for classification tasks. Data are taken from speech generated by a single talker. Experiments are used to investigate the theory of associative memories and the question of generalization from specific instances.
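
    For orientation, the basic SDM model after Kanerva — random binary hard locations, activation of every location within a Hamming radius of the cue, counter updates on write, majority vote on read — can be captured in a toy C++ sketch. Dimensions here are far smaller than in the speech experiments, and the sketch stores a single pattern autoassociatively.

      #include <bitset>
      #include <cstdio>
      #include <random>
      #include <vector>

      const int DIM = 64, LOCS = 500, RADIUS = 24;
      using Word = std::bitset<DIM>;

      int main() {
          std::mt19937 rng(7);
          std::uniform_int_distribution<unsigned long long> bits;
          std::vector<Word> addr(LOCS);                       // hard locations
          std::vector<std::vector<int>> ctr(LOCS, std::vector<int>(DIM, 0));
          for (auto& a : addr) a = Word(bits(rng));

          Word pattern = Word(bits(rng));   // store one pattern autoassociatively
          for (int l = 0; l < LOCS; ++l)
              if ((addr[l] ^ pattern).count() <= RADIUS)      // location activated
                  for (int b = 0; b < DIM; ++b)
                      ctr[l][b] += pattern[b] ? 1 : -1;       // counter update

          Word cue = pattern; cue.flip(0); cue.flip(1);       // noisy 2-bit cue
          std::vector<int> sum(DIM, 0);
          for (int l = 0; l < LOCS; ++l)
              if ((addr[l] ^ cue).count() <= RADIUS)
                  for (int b = 0; b < DIM; ++b) sum[b] += ctr[l][b];
          Word out;
          for (int b = 0; b < DIM; ++b) out[b] = sum[b] > 0;  // majority vote

          std::printf("bits recovered: %zu / %d\n",
                      DIM - (out ^ pattern).count(), DIM);
          return 0;
      }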

  18. Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

    1992-01-01

    An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.

  19. Efficacy of a perceptual and visual-motor skill intervention program for students with dyslexia.

    PubMed

    Fusco, Natália; Germano, Giseli Donadon; Capellini, Simone Aparecida

    2015-01-01

    To verify the efficacy of a perceptual and visual-motor skill intervention program for students with dyslexia. The participants were 20 students from the third to fifth grades of a public elementary school in Marília, São Paulo, aged 8 years to 11 years and 11 months, distributed into the following groups: Group I (GI; 10 students with developmental dyslexia) and Group II (GII; 10 students with good academic performance). A perceptual and visual-motor intervention program was applied, which comprised exercises for visual-motor coordination, visual discrimination, visual memory, visual-spatial relationship, shape constancy, sequential memory, visual figure-ground coordination, and visual closure. In pre- and post-testing situations, both groups were submitted to the Test of Visual-Perceptual Skills (TVPS-3), and the quality of handwriting was analyzed using the Dysgraphia Scale. The statistical results showed that both groups of students had dysgraphia in the pretesting situation. In visual-perceptual skills, GI presented lower performance than GII, as well as in the quality of writing. After undergoing the intervention program, GI increased its average of correct answers on the TVPS-3 and improved the quality of handwriting. The intervention program proved appropriate for students with dyslexia and showed positive effects, improving visual perception skills and quality of writing for students with developmental dyslexia.

  20. Switching behavior of resistive change memory using oxide nanowires

    NASA Astrophysics Data System (ADS)

    Aono, Takashige; Sugawa, Kosuke; Shimizu, Tomohiro; Shingubara, Shoso; Takase, Kouichi

    2018-06-01

    Resistive change random access memory (ReRAM), which is expected to be the next-generation nonvolatile memory, often has wide switching voltage distributions due to many kinds of conductive filaments. In this study, we have tried to suppress the distribution through the structural restriction of the filament-forming area using NiO nanowires. The capacitor with Ni metal nanowires whose surface is oxidized showed good switching behaviors with narrow distributions. The knowledge gained from our study will be very helpful in producing practical ReRAM devices.

  1. Cultural scripts guide recall of intensely positive life events.

    PubMed

    Collins, Katherine A; Pillemer, David B; Ivcevic, Zorana; Gooze, Rachel A

    2007-06-01

    In four studies, we examined the temporal distribution of positive and negative memories of momentous life events. College students and middle-aged adults reported events occurring from the ages of 8 to 18 years in which they had felt especially good or especially bad about themselves. Distributions of positive memories showed a marked peak at ages 17 and 18. In contrast, distributions of negative memories were relatively flat. These patterns were consistent for males and females and for younger and older adults. Content analyses indicated that a substantial proportion of positive memories from late adolescence described culturally prescribed landmark events surrounding the major life transition from high school to college. When the participants were asked for recollections from life periods that lack obvious age-linked milestone events, age distributions of positive and negative memories were similar. The results support and extend Berntsen and Rubin's (2004) conclusion that cultural expectations, or life scripts, organize recall of positive, but not negative, events.

  2. Photofragment image analysis using the Onion-Peeling Algorithm

    NASA Astrophysics Data System (ADS)

    Manzhos, Sergei; Loock, Hans-Peter

    2003-07-01

    With the growing popularity of the velocity map imaging technique, a need for the analysis of photoion and photoelectron images arose. Here, a computer program is presented that allows for the analysis of cylindrically symmetric images. It permits the inversion of the projection of the 3D charged particle distribution using the Onion Peeling Algorithm. Further analysis includes the determination of radial and angular distributions, from which velocity distributions and spatial anisotropy parameters are obtained. Identification and quantification of the different photolysis channels is therefore straightforward. In addition, the program features geometry correction, centering, and multi-Gaussian fitting routines, as well as a user-friendly graphical interface and the possibility of generating synthetic images using either the fitted or user-defined parameters. Program summary: Title of program: Glass Onion. Catalogue identifier: ADRY. Program Summary URL: http://cpc.cs.qub.ac.uk/summaries/ADRY. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Licensing provisions: none. Computer: IBM PC. Operating system under which the program has been tested: Windows 98, Windows 2000, Windows NT. Programming language used: Delphi 4.0. Memory required to execute with typical data: 18 Mwords. No. of bits in a word: 32. No. of bytes in distributed program, including test data, etc.: 9 911 434. Distribution format: zip file. Keywords: photofragment image, onion peeling, anisotropy parameters. Nature of physical problem: Information about velocity and angular distributions of photofragments is the basis on which the analysis of the photolysis process rests. Reconstructing the three-dimensional distribution from the photofragment image is the first step; further processing involves angular and radial integration of the inverted image to obtain velocity and angular distributions. Provisions have to be made to correct for slight distortions of the image and to verify the accuracy of the analysis process. Method of solution: The "Onion Peeling" algorithm described by Helm [Rev. Sci. Instrum. 67 (6) (1996)] is used to perform the image reconstruction. Angular integration with a subsequent multi-Gaussian fit supplies information about the velocity distribution of the photofragments, whereas radial integration with subsequent expansion of the angular distributions over Legendre polynomials gives the spatial anisotropy parameters. Fitting algorithms have been developed to centre the image and to correct for image distortion. Restrictions on the complexity of the problem: The maximum image size (1280×1280) and resolution (16 bit) are restricted by available memory and can be changed in the source code. Initial centre coordinates within 5 pixels may be required for the correction and centering algorithms to converge. Peaks on the velocity profile separated by less than the peak width may not be deconvolved. In the charged particle image reconstruction, it is assumed that the kinetic energy released in the dissociation process is small compared to the energy acquired in the electric field. For the fitting parameters to be physically meaningful, cylindrical symmetry of the image has to be assumed, but the actual inversion algorithm is stable to distortions of such symmetry in experimental images. Typical running time: The analysis procedure can be divided into three parts: inversion, fitting, and geometry correction. The inversion time grows approximately as R³, where R is the radius of the region of interest: for R=200 pixels it is less than a minute, for R=400 pixels less than 6 min on a 400 MHz IBM personal computer. The time for the velocity fitting procedure to converge depends strongly on the number of peaks in the velocity profile and the convergence criterion. It ranges between less than a second for simple curves and a few minutes for profiles with up to twenty peaks. The time taken for the image correction scales as R² and depends on the curve profile. It is on the order of a few minutes for images with R=500 pixels. Unusual features of the program: Our centering and image correction algorithm is based on Fourier analysis of the radial distribution to ensure the sharpest velocity profile and is insensitive to an uneven intensity distribution. An angular averaging option exists to stabilize the inversion algorithm without losing resolution.

  3. The Aging Well through Interaction and Scientific Education (AgeWISE) Program.

    PubMed

    O'Connor, Maureen K; Kraft, Malissa L; Daley, Ryan; Sugarman, Michael A; Clark, Erika L; Scoglio, Arielle A J; Shirk, Steven D

    2017-12-08

    We conducted a randomized controlled trial of the Aging Well through Interaction and Scientific Education (AgeWISE) program, a 12-week manualized cognitive rehabilitation program designed to provide psychoeducation to older adults about the aging brain, lifestyle factors associated with successful brain aging, and strategies to compensate for age related cognitive decline. Forty-nine cognitively intact participants ≥ 60 years old were randomly assigned to the AgeWISE program (n = 25) or a no-treatment control group (n = 24). Questionnaire data were collected prior to group assignment and post intervention. Two-factor repeated-measures analyses of covariance (ANCOVAs) were used to compare group outcomes. Upon completion, participants in the AgeWISE program reported increases in memory contentment and their sense of control in improving memory; no significant changes were observed in the control group. Surprisingly, participation in the group was not associated with significant changes in knowledge of memory aging, perception of memory ability, or greater use of strategies. The AgeWISE program was successfully implemented and increased participants' memory contentment and their sense of control in improving memory in advancing age. This study supports the use of AgeWISE to improve perspectives on healthy cognitive aging.

  4. Distributed learning enhances relational memory consolidation.

    PubMed

    Litman, Leib; Davachi, Lila

    2008-09-01

    It has long been known that distributed learning (DL) provides a mnemonic advantage over massed learning (ML). However, the underlying mechanisms that drive this robust mnemonic effect remain largely unknown. In two experiments, we show that DL across a 24 hr interval does not enhance immediate memory performance but instead slows the rate of forgetting relative to ML. Furthermore, we demonstrate that this savings in forgetting is specific to relational, but not item, memory. In the context of extant theories and knowledge of memory consolidation, these results suggest that an important mechanism underlying the mnemonic benefit of DL is enhanced memory consolidation. We speculate that synaptic strengthening mechanisms supporting long-term memory consolidation may be differentially mediated by the spacing of memory reactivation. These findings have broad implications for the scientific study of episodic memory consolidation and, more generally, for educational curriculum development and policy.

  5. Effects of a Memory and Visual-Motor Integration Program for Older Adults Based on Self-Efficacy Theory.

    PubMed

    Kim, Eun Hwi; Suh, Soon Rim

    2017-06-01

    This study was conducted to verify the effects of a memory and visual-motor integration program for older adults based on self-efficacy theory. A non-equivalent control group pretest-posttest design was implemented in this quasi-experimental study. The participants were 62 older adults from senior centers and older adult welfare facilities in D and G city (Experimental group=30, Control group=32). The experimental group took part in a 12-session memory and visual-motor integration program over 6 weeks. Data regarding memory self-efficacy, memory, visual-motor integration, and depression were collected from July to October of 2014 and analyzed with independent t-test and Mann-Whitney U test using PASW Statistics (SPSS) 18.0 to determine the effects of the interventions. Memory self-efficacy (t=2.20, p=.031), memory (Z=-2.92, p=.004), and visual-motor integration (Z=-2.49, p=.013) increased significantly in the experimental group as compared to the control group. However, depression (Z=-0.90, p=.367) did not decrease significantly. This program is effective for increasing memory, visual-motor integration, and memory self-efficacy in older adults. Therefore, it can be used to improve cognition and prevent dementia in older adults. © 2017 Korean Society of Nursing Science

  6. Virtual reality-based prospective memory training program for people with acquired brain injury.

    PubMed

    Yip, Ben C B; Man, David W K

    2013-01-01

    People with acquired brain injury (ABI) may display cognitive impairments and long-term disabilities, including prospective memory (PM) failure. Prospective memory is the ability to remember to execute an intended action in the future. PM problems are a challenge to an ABI patient's successful community reintegration. While retrospective memory (RM) has been extensively studied, treatment programs for prospective memory are rarely reported. The development of a treatment program for PM is therefore timely; such a program can be cost-effective and appropriate to the patient's environment. A 12-session virtual reality (VR)-based cognitive rehabilitation program was developed using everyday PM activities as training content. Thirty-seven subjects were recruited to participate in a pretest-posttest control experimental study to evaluate its treatment effectiveness. Results showed significantly better changes in both VR-based and real-life PM outcome measures and in related cognitive attributes such as frontal lobe functions and semantic fluency. VR-based training may be well accepted by ABI patients, as encouraging improvement has been shown. Large-scale studies of the virtual reality-based prospective memory (VRPM) training program are indicated.

  7. The Memory Fitness Program: Cognitive Effects of a Healthy Aging Intervention

    PubMed Central

    Miller, Karen J.; Siddarth, Prabha; Gaines, Jean M.; Parrish, John M.; Ercoli, Linda M.; Marx, Katherine; Ronch, Judah; Pilgram, Barbara; Burke, Kasey; Barczak, Nancy; Babcock, Bridget; Small, Gary W.

    2014-01-01

    Context Age-related memory decline affects a large proportion of older adults. Cognitive training, physical exercise, and other lifestyle habits may help to minimize self-perception of memory loss and a decline in objective memory performance. Objective The purpose of this study was to determine whether a 6-week educational program on memory training, physical activity, stress reduction, and healthy diet led to improved memory performance in older adults. Design A convenience sample of 115 participants (mean age: 80.9 [SD: 6.0 years]) was recruited from two continuing care retirement communities. The intervention consisted of 60-minute classes held twice weekly with 15–20 participants per class. Testing of both objective and subjective cognitive performance occurred at baseline, preintervention, and postintervention. Objective cognitive measures evaluated changes in five domains: immediate verbal memory, delayed verbal memory, retention of verbal information, memory recognition, and verbal fluency. A standardized metamemory instrument assessed four domains of memory self-awareness: frequency and severity of forgetting, retrospective functioning, and mnemonics use. Results The intervention program resulted in significant improvements on objective measures of memory, including recognition of word pairs (t[114] = 3.62, p < 0.001) and retention of verbal information from list learning (t[114] = 2.98, p < 0.01). No improvement was found for verbal fluency. Regarding subjective memory measures, the retrospective functioning score increased significantly following the intervention (t[114] = 4.54, p < 0.0001), indicating perception of a better memory. Conclusions These findings indicate that a 6-week healthy lifestyle program can improve both encoding and recalling of new verbal information, as well as self-perception of memory ability in older adults residing in continuing care retirement communities. PMID:21765343

  8. Working Memory Training for Children with Cochlear Implants: A Pilot Study

    ERIC Educational Resources Information Center

    Kronenberger, William G.; Pisoni, David B.; Henning, Shirley C.; Colson, Bethany G.; Hazzard, Lindsey M.

    2011-01-01

    Purpose: This study investigated the feasibility and efficacy of a working memory training program for improving memory and language skills in a sample of 9 children who are deaf (age 7-15 years) with cochlear implants (CIs). Method: All children completed the Cogmed Working Memory Training program on a home computer over a 5-week period.…

  9. Experiences using OpenMP based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland

    2003-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.
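
    A representative OpenMP fragment of the kind such parallelized benchmarks contain is shown below; on a software DSM cluster the identical program runs unchanged, with the DSM layer rather than hardware keeping the shared arrays coherent across nodes. This is a toy stand-in, not NAS benchmark code.

      #include <cstdio>
      #include <vector>

      int main() {
          const int n = 1 << 20;
          std::vector<double> a(n), b(n);
          for (int i = 0; i < n; ++i) b[i] = i * 0.5;

          double sum = 0.0;
          // Shared-memory parallel loop with a reduction; under a software DSM
          // system, "shared" spans the cluster's physically distributed memories.
          #pragma omp parallel for reduction(+ : sum)
          for (int i = 0; i < n; ++i) {
              a[i] = 2.0 * b[i];
              sum += a[i];
          }
          std::printf("sum = %.1f\n", sum);  // equals n*(n-1)/2
          return 0;
      }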

  10. Parallelization and automatic data distribution for nuclear reactor simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liebrock, L.M.

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high-performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed-of-light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel, with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  11. Bermuda Triangle: a subsystem of the 168/E interfacing scheme used by Group B at SLAC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oxoby, G.J.; Levinson, L.J.; Trang, Q.H.

    1979-12-01

    The Bermuda Triangle system is a method of interfacing several 168/E microprocessors to a central system for control of the processors and overlaying their memories. The system is a three-way interface with I/O ports to a large buffer memory, a PDP11 Unibus and a bus to the 168/E processors. Data may be transferred bidirectionally between any two ports. Two Bermuda Triangles are used, one for the program memory and one for the data memory. The program buffer memory stores the overlay programs for the 168/E, and the data buffer memory stores the incoming raw data, the data portion of the overlays, and the outgoing processed events. This buffering is necessary since the memories of the 168/E microprocessors are small compared to the main program and the amount of data being processed. The link to the computer facility is via a Unibus-to-IBM-channel interface. A PDP11/04 controls the data flow. 7 figures, 4 tables. (RWR)

  12. Guidance system operations plan for manned CSM earth orbital and lunar missions using program COLOSSUS 3. Section 7: Erasable memory programs

    NASA Technical Reports Server (NTRS)

    Hamilton, M. H.

    1972-01-01

    Erasable-memory programs designed for guidance computers used in command and lunar modules are presented. The purpose, functional description, assumptions, restrictions, and limitations are given for each program.

  13. Remembering forward: Neural correlates of memory and prediction in human motor adaptation

    PubMed Central

    Scheidt, Robert A; Zimbelman, Janice L; Salowitz, Nicole M G; Suminski, Aaron J; Mosier, Kristine M; Houk, James; Simo, Lucia

    2011-01-01

    We used functional MR imaging (FMRI), a robotic manipulandum and systems identification techniques to examine neural correlates of predictive compensation for spring-like loads during goal-directed wrist movements in neurologically-intact humans. Although load changed unpredictably from one trial to the next, subjects nevertheless used sensorimotor memories from recent movements to predict and compensate upcoming loads. Prediction enabled subjects to adapt performance so that the task was accomplished with minimum effort. Population analyses of functional images revealed a distributed, bilateral network of cortical and subcortical activity supporting predictive load compensation during visual target capture. Cortical regions - including prefrontal, parietal and hippocampal cortices - exhibited trial-by-trial fluctuations in BOLD signal consistent with the storage and recall of sensorimotor memories or “states” important for spatial working memory. Bilateral activations in associative regions of the striatum demonstrated temporal correlation with the magnitude of kinematic performance error (a signal that could drive reward-optimizing reinforcement learning and the prospective scaling of previously learned motor programs). BOLD signal correlations with load prediction were observed in the cerebellar cortex and red nuclei (consistent with the idea that these structures generate adaptive fusimotor signals facilitating cancellation of expected proprioceptive feedback, as required for conditional feedback adjustments to ongoing motor commands and feedback error learning). Analysis of single subject images revealed that predictive activity was at least as likely to be observed in more than one of these neural systems as in just one. We conclude therefore that motor adaptation is mediated by predictive compensations supported by multiple, distributed, cortical and subcortical structures. PMID:21840405

  14. Efficient generation of sum-of-products representations of high-dimensional potential energy surfaces based on multimode expansions

    NASA Astrophysics Data System (ADS)

    Ziegler, Benjamin; Rauhut, Guntram

    2016-03-01

    The transformation of multi-dimensional potential energy surfaces (PESs) from a grid-based multimode representation to an analytical one is a standard procedure in quantum chemical programs. Within the framework of linear least squares fitting, a simple and highly efficient algorithm is presented, which relies on a direct product representation of the PES and a repeated use of Kronecker products. It shows the same scalings in computational cost and memory requirements as the potfit approach. In comparison to customary linear least squares fitting algorithms, this corresponds to a speed-up and memory saving by several orders of magnitude. Different fitting bases are tested, namely, polynomials, B-splines, and distributed Gaussians. Benchmark calculations are provided for the PESs of a set of small molecules.

  15. Efficient generation of sum-of-products representations of high-dimensional potential energy surfaces based on multimode expansions.

    PubMed

    Ziegler, Benjamin; Rauhut, Guntram

    2016-03-21

    The transformation of multi-dimensional potential energy surfaces (PESs) from a grid-based multimode representation to an analytical one is a standard procedure in quantum chemical programs. Within the framework of linear least squares fitting, a simple and highly efficient algorithm is presented, which relies on a direct product representation of the PES and a repeated use of Kronecker products. It shows the same scalings in computational cost and memory requirements as the potfit approach. In comparison to customary linear least squares fitting algorithms, this corresponds to a speed-up and memory saving by several orders of magnitude. Different fitting bases are tested, namely, polynomials, B-splines, and distributed Gaussians. Benchmark calculations are provided for the PESs of a set of small molecules.

  16. A message passing kernel for the hypercluster parallel processing test bed

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Quealy, Angela; Cole, Gary L.

    1989-01-01

    A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.

  17. Design of a QoS-controlled ATM-based communications system in chorus

    NASA Astrophysics Data System (ADS)

    Coulson, Geoff; Campbell, Andrew; Robin, Philippe; Blair, Gordon; Papathomas, Michael; Shepherd, Doug

    1995-05-01

    We describe the design of an application platform able to run distributed real-time and multimedia applications alongside conventional UNIX programs. The platform is embedded in a microkernel/PC environment and supported by an ATM-based, QoS-driven communications stack. In particular, we focus on resource-management aspects of the design and deal with CPU scheduling, network resource-management and memory-management issues. An architecture is presented that guarantees QoS levels of both communications and processing with varying degrees of commitment as specified by user-level QoS parameters. The architecture uses admission tests to determine whether or not new activities can be accepted and includes modules to translate user-level QoS parameters into representations usable by the scheduling, network, and memory-management subsystems.

  18. Apollo guidance, navigation and control: Guidance system operations plans for manned LM earth orbital and lunar missions using Program COLOSSUS 3. Section 7: Erasable memory programs

    NASA Technical Reports Server (NTRS)

    Hamilton, M. H.

    1972-01-01

    Erasable-memory programs (EMPs) designed for the guidance computers used in the command (CMC) and lunar modules (LGC) are described. CMC programs are designated COLOSSUS 3, and the associated EMPs are identified by a three-digit number beginning with 5. LGC programs are designated LUMINARY 1E, and the associated EMPs are identified, with one exception, by a three-digit number beginning with 1. The exception is EMP 99. The EMPs vary in complexity from a simple flagbit setting to a long and intricate logical structure. They all, however, cause the computer to behave in a way not intended in the original design of the programs; they accomplish this off-nominal behavior by some alteration of erasable memory to interface with existing fixed-memory programs to effect a desired result.

  19. Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As hardware and software technologies have matured, the performance of parallel programs written with compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
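
    To illustrate the directive-based model that such tools target, a serial loop nest can often be parallelized with a single OpenMP work-sharing directive. The C fragment below is a generic sketch, not actual CAPTools output:

        #include <omp.h>
        #include <stdio.h>

        #define N 1000000

        int main(void) {
            static double a[N], b[N];
            double sum = 0.0;
            for (int i = 0; i < N; i++) b[i] = i * 0.5;
            /* work-sharing directive: loop iterations are divided among
               threads; the reduction clause avoids a race on sum */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; i++) {
                a[i] = 2.0 * b[i];
                sum += a[i];
            }
            printf("sum = %f (threads available: %d)\n",
                   sum, omp_get_max_threads());
            return 0;
        }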

  20. The efficacy of a multifactorial memory training in older adults living in residential care settings.

    PubMed

    Vranić, Andrea; Španić, Ana Marija; Carretti, Barbara; Borella, Erika

    2013-11-01

    Several studies have shown an increase in memory performance after teaching mnemonic techniques to older participants. However, transfer effects to non-trained tasks are generally either very small or not found. The present study investigates the efficacy of a multifactorial memory training program for older adults living in a residential care center. The program combines teaching of memory strategies with activities based on metacognitive (metamemory) and motivational aspects. Specific training-related gains in the Immediate list recall task (criterion task), as well as transfer effects on measures of short-term memory, long-term memory, working memory, motivational (need for cognition), and metacognitive aspects (subjective measure of one's memory) were examined. Maintenance of training benefits was assessed after seven months. Fifty-one older adults living in a residential care center, with no cognitive impairments, participated in the study. Participants were randomly assigned to two programs: the experimental group attended the training program, while the active control group was involved in a program in which different psychological issues were discussed. A benefit in the criterion task and substantial general transfer effects were found for the trained group, but not for the active control, and they were maintained at the seven-month follow-up. Our results suggest that training procedures, which combine teaching of strategies with metacognitive-motivational aspects, can improve cognitive functioning and attitude toward cognitive activities in older adults.

  1. Fault Tolerant Frequent Pattern Mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shohdy, Sameh; Vishnu, Abhinav; Agrawal, Gagan

    FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large scale datasets. While several researchers have designed distributed memory FP-Growth algorithms, it is pivotal to consider fault tolerant FP-Growth, which can address the increasing fault rates in large scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and MPI advanced features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access and incurs no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
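
    The authors' O(1)-space checkpointing is specific to FP-Growth, but the core idea, mirroring state into a partner rank's memory so that recovery needs no disk access, can be sketched in C with MPI. Buffer layout and partner pairing below are hypothetical, not the paper's scheme:

        #include <mpi.h>
        #include <string.h>

        #define CKPT_WORDS 1024

        /* Each rank mirrors its working buffer into a partner rank's memory,
           so a failed rank's state can be rebuilt without touching disk. */
        void checkpoint_to_partner(const long *work, long *remote_copy,
                                   MPI_Comm comm) {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            int partner = rank ^ 1;          /* pair ranks 0-1, 2-3, ... */
            if (partner >= size) partner = MPI_PROC_NULL;
            /* send my buffer to my partner while receiving theirs */
            MPI_Sendrecv(work,        CKPT_WORDS, MPI_LONG, partner, 42,
                         remote_copy, CKPT_WORDS, MPI_LONG, partner, 42,
                         comm, MPI_STATUS_IGNORE);
        }

        int main(int argc, char **argv) {
            long work[CKPT_WORDS], copy[CKPT_WORDS];
            MPI_Init(&argc, &argv);
            memset(work, 0, sizeof work);    /* stand-in for real state */
            checkpoint_to_partner(work, copy, MPI_COMM_WORLD);
            MPI_Finalize();
            return 0;
        }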

  2. An Investigation of Unified Memory Access Performance in CUDA

    PubMed Central

    Landaverde, Raphael; Zhang, Tiansheng; Coskun, Ayse K.; Herbordt, Martin

    2015-01-01

    Managing memory between the CPU and GPU is a major challenge in GPU computing. A programming model, Unified Memory Access (UMA), has been recently introduced by Nvidia to simplify the complexities of memory management while claiming good overall performance. In this paper, we investigate this programming model and evaluate its performance and programming model simplifications based on our experimental results. We find that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand. This feature allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes. We also find, however, that for the majority of applications and memory access patterns, the performance overheads associated with UMA are significant, while the simplifications to the programming model restrict flexibility for adding future optimizations. PMID:26594668
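
    The central convenience of UMA is that a single pointer returned by cudaMallocManaged is valid on both host and device, so explicit cudaMemcpy calls disappear and pages migrate on demand. A minimal host-side sketch in C (kernel launches omitted; cudaMemset stands in for device-side work; compile with nvcc):

        #include <cuda_runtime.h>
        #include <stdio.h>

        int main(void) {
            int *data = NULL;
            size_t n = 1 << 20;
            /* one allocation, visible to both CPU and GPU */
            cudaError_t err = cudaMallocManaged((void **)&data,
                                                n * sizeof(int),
                                                cudaMemAttachGlobal);
            if (err != cudaSuccess) {
                fprintf(stderr, "%s\n", cudaGetErrorString(err));
                return 1;
            }
            for (size_t i = 0; i < n; i++) data[i] = 1; /* touch on host  */
            cudaMemset(data, 0, n * sizeof(int));       /* touch on device */
            cudaDeviceSynchronize();    /* pages migrate back on demand    */
            printf("data[0] = %d\n", data[0]);          /* host read again */
            cudaFree(data);
            return 0;
        }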

  3. Residual stresses in injection molded shape memory polymer parts

    NASA Astrophysics Data System (ADS)

    Katmer, Sukran; Esen, Huseyin; Karatas, Cetin

    2016-03-01

    Shape memory polymers (SMPs) are materials which have shape memory effect (SME). SME is a property which has the ability to change shape when induced by a stimulator such as temperature, moisture, pH, electric current, magnetic field, light, etc. A process, known as programming, is applied to SMP parts in order to alter them from their permanent shape to their temporary shape. In this study we experimentally investigated the effects of injection molding and programming processes on residual stresses in molded thermoplastic polyurethane shape memory polymer. The residual stresses were measured by the layer removal method. The study shows that injection molding and programming process conditions have a significant influence on residual stresses in molded shape memory polyurethane parts.

  4. Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames; the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
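
    For reference, the MPI-2 one-sided extensions mentioned above expose a region of local memory through a window object, after which any rank may deposit data directly into another rank's address space without a matching receive. A generic MPI_Put sketch in C, unrelated to the SMPlib internals (run with at least two ranks):

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, size;
            double local = 0.0;
            MPI_Win win;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            if (size < 2) { MPI_Finalize(); return 1; } /* needs 2 ranks */
            /* expose one double of local memory to all other ranks */
            MPI_Win_create(&local, sizeof(double), sizeof(double),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);
            MPI_Win_fence(0, win);            /* open access epoch */
            if (rank == 0) {
                double msg = 3.14;
                /* write into rank 1's window; rank 1 posts no receive */
                MPI_Put(&msg, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
            }
            MPI_Win_fence(0, win);            /* close epoch: transfer done */
            if (rank == 1) printf("received %f via RMA\n", local);
            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }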

  5. NEWTONP - CUMULATIVE BINOMIAL PROGRAMS

    NASA Technical Reports Server (NTRS)

    Bowerman, P. N.

    1994-01-01

    The cumulative binomial program, NEWTONP, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, NEWTONP, CUMBIN (NPO-17555), and CROSSER (NPO-17557), can be used independently of one another. NEWTONP can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. NEWTONP calculates the probability p required to yield a given system reliability V for a k-out-of-n system. It can also be used to determine the Clopper-Pearson confidence limits (either one-sided or two-sided) for the parameter p of a Bernoulli distribution. NEWTONP can determine Bayesian probability limits for a proportion (if the beta prior has positive integer parameters). It can determine the percentiles of incomplete beta distributions with positive integer parameters. It can also determine the percentiles of F distributions and the median plotting positions in probability plotting. NEWTONP is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. NEWTONP is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. It also lists the number of iterations of Newton's method required to calculate the answer within the given error. The NEWTONP program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. NEWTONP was developed in 1988.
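
    The core computation, finding the component probability p at which a k-out-of-n system attains a target reliability V, lends itself to Newton's method because the derivative of the cumulative binomial tail has the closed form dR/dp = n C(n-1, k-1) p^(k-1) (1-p)^(n-k). The C sketch below illustrates the idea only; it is not the NEWTONP source:

        #include <math.h>
        #include <stdio.h>

        /* ln C(n,i) via lgamma for numerical stability */
        static double lchoose(int n, int i) {
            return lgamma(n + 1.0) - lgamma(i + 1.0) - lgamma(n - i + 1.0);
        }

        /* R(p) = P(at least k of n components work), each with prob. p */
        static double reliability(int n, int k, double p) {
            double r = 0.0;
            for (int i = k; i <= n; i++)
                r += exp(lchoose(n, i) + i * log(p) + (n - i) * log1p(-p));
            return r;
        }

        /* closed-form dR/dp = n*C(n-1,k-1)*p^(k-1)*(1-p)^(n-k) */
        static double reliability_deriv(int n, int k, double p) {
            return exp(log((double)n) + lchoose(n - 1, k - 1)
                       + (k - 1) * log(p) + (n - k) * log1p(-p));
        }

        int main(void) {
            int n = 10, k = 7;     /* 7-out-of-10 system (example values) */
            double V = 0.999;      /* required system reliability */
            double p = 0.9;        /* initial guess */
            for (int it = 0; it < 50; it++) {
                double f = reliability(n, k, p) - V;
                if (fabs(f) < 1e-12) break;
                p -= f / reliability_deriv(n, k, p);
                if (p < 1e-12) p = 1e-12;          /* keep p in (0,1) */
                if (p > 1.0 - 1e-12) p = 1.0 - 1e-12;
            }
            printf("component reliability p = %.10f\n", p);
            return 0;
        }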

  6. Expressing Parallelism with ROOT

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  7. An integrated runtime and compile-time approach for parallelizing structured and block structured applications

    NASA Technical Reports Server (NTRS)

    Agrawal, Gagan; Sussman, Alan; Saltz, Joel

    1993-01-01

    Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.

  8. Expressing Parallelism with ROOT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piparo, D.; Tejedor, E.; Guiraud, E.

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  9. Feasibility study of current pulse induced 2-bit/4-state multilevel programming in phase-change memory

    NASA Astrophysics Data System (ADS)

    Liu, Yan; Fan, Xi; Chen, Houpeng; Wang, Yueqing; Liu, Bo; Song, Zhitang; Feng, Songlin

    2017-08-01

    Multilevel data storage for phase-change memory (PCM) has attracted increasing attention in the memory market as a way to implement high-capacity memory systems and reduce cost-per-bit. In this work, we present a universal programming method based on SET staircase current pulses in PCM cells, which can exploit the optimum programming scheme to achieve 2-bit/4-state resistance levels with equal logarithmic intervals. The SET staircase waveform can be optimized by real-time TCAD simulation to realize multilevel data storage efficiently in an arbitrary phase change material. Experimental results from a 1-kbit PCM test chip have validated the proposed multilevel programming scheme. The scheme improves information storage density, resistance-level robustness, and energy efficiency while avoiding additional process complexity.

  10. The Effect of the Underlying Distribution in Hurst Exponent Estimation

    PubMed Central

    Sánchez, Miguel Ángel; Trinidad, Juan E.; García, José; Fernández, Manuel

    2015-01-01

    In this paper, a heavy-tailed distribution approach is considered in order to explore the behavior of actual financial time series. We show that this kind of distribution allows the empirical distribution of stocks from the S&P500 index to be fitted properly. In addition, we explain in detail why the underlying distribution of the random process under study should be taken into account before using its self-similarity exponent as a reliable tool to state whether that financial series displays long-range dependence or not. Finally, we show that, under this model, no stocks from the S&P500 index show persistent memory, whereas some of them do present anti-persistent memory and most of them present no memory at all.
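
    For orientation, the classical rescaled-range (R/S) procedure behind Hurst exponent estimation regresses log(R/S) against log(window size), the slope being the estimate of H. The C sketch below implements only this textbook baseline, not the heavy-tailed refinement argued for in the paper:

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* R/S statistic for one window of length n starting at x */
        static double rescaled_range(const double *x, int n) {
            double mean = 0.0, s2 = 0.0, cum = 0.0, lo = 0.0, hi = 0.0;
            for (int i = 0; i < n; i++) mean += x[i];
            mean /= n;
            for (int i = 0; i < n; i++) {
                double d = x[i] - mean;
                s2 += d * d;
                cum += d;               /* mean-adjusted cumulative sum */
                if (cum < lo) lo = cum;
                if (cum > hi) hi = cum;
            }
            double s = sqrt(s2 / n);
            return (s > 0.0) ? (hi - lo) / s : 0.0;
        }

        /* estimate H as the least-squares slope of log(R/S) vs log(n) */
        double hurst_rs(const double *x, int len) {
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            int npts = 0;
            for (int n = 16; n <= len / 2; n *= 2) {
                double rs = 0.0;
                int blocks = len / n;
                for (int b = 0; b < blocks; b++)
                    rs += rescaled_range(x + b * n, n);
                rs /= blocks;
                double lx = log((double)n), ly = log(rs);
                sx += lx; sy += ly; sxx += lx * lx; sxy += lx * ly;
                npts++;
            }
            return (npts * sxy - sx * sy) / (npts * sxx - sx * sx);
        }

        int main(void) {
            enum { LEN = 4096 };
            double x[LEN];
            srand(1);
            for (int i = 0; i < LEN; i++)  /* i.i.d. noise: expect H near 0.5 */
                x[i] = rand() / (double)RAND_MAX - 0.5;
            printf("estimated H = %.3f\n", hurst_rs(x, LEN));
            return 0;
        }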

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    An, Ho-Myoung; Kim, Hee-Dong; Kim, Tae Geun, E-mail: tgkim1@korea.ac.kr

    The energy distribution and density of interface traps (D_it) are directly investigated in bulk-type and thin-film transistor (TFT)-type charge trap flash memory cells with tunnel oxide degradation, under program/erase (P/E) cycling, using a charge pumping (CP) technique, in view of application in a three-dimensionally stackable NAND flash memory cell. After P/E cycling in bulk-type devices, the interface trap density gradually increased from 1.55 × 10^12 cm^−2 eV^−1 to 3.66 × 10^13 cm^−2 eV^−1 due to tunnel oxide damage, which was consistent with the subthreshold swing and transconductance degradation after P/E cycling. Its distribution moved toward shallow energy levels with increasing cycling numbers, which coincided with the decay rate degradation with short-term retention time. The degradation tendency extracted with the CP technique for D_it of the TFT-type cells was similar to that of the bulk-type cells.

  12. Modeling Distributions of Immediate Memory Effects: No Strategies Needed?

    ERIC Educational Resources Information Center

    Beaman, C. Philip; Neath, Ian; Surprenant, Aimee M.

    2008-01-01

    Many models of immediate memory predict the presence or absence of various effects, but none have been tested to see whether they predict an appropriate distribution of effect sizes. The authors show that the feature model (J. S. Nairne, 1990) produces appropriate distributions of effect sizes for both the phonological confusion effect and the…

  13. Line-by-line spectroscopic simulations on graphics processing units

    NASA Astrophysics Data System (ADS)

    Collange, Sylvain; Daumas, Marc; Defour, David

    2008-01-01

    We report here on software that performs line-by-line spectroscopic simulations on gases. Elaborate models (such as narrow band and correlated-k) are accurate and efficient for bands where various components are not simultaneously and significantly active. Line-by-line is probably the most accurate model in the infrared for blends of gases that contain high proportions of H2O and CO2 as this was the case for our prototype simulation. Our implementation on graphics processing units sustains a speedup close to 330 on computation-intensive tasks and 12 on memory-intensive tasks compared to implementations on one core of high-end processors. This speedup is due to data parallelism, efficient memory access for specific patterns and some dedicated hardware operators only available in graphics processing units. It is obtained leaving most of processor resources available and it would scale linearly with the number of graphics processing units in parallel machines. Line-by-line simulation coupled with simulation of fluid dynamics was long believed to be economically intractable but our work shows that it could be done with some affordable additional resources compared to what is necessary to perform simulations on fluid dynamics alone. Program summary. Program title: GPU4RE. Catalogue identifier: ADZY_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADZY_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 62 776. No. of bytes in distributed program, including test data, etc.: 1 513 247. Distribution format: tar.gz. Programming language: C++. Computer: x86 PC. Operating system: Linux, Microsoft Windows. Compilation requires either gcc/g++ under Linux or Visual C++ 2003/2005 and Cygwin under Windows. It has been tested using gcc 4.1.2 under Ubuntu Linux 7.04 and using Visual C++ 2005 with Cygwin 1.5.24 under Windows XP. RAM: 1 gigabyte. Classification: 21.2. External routines: OpenGL (http://www.opengl.org). Nature of problem: Simulating radiative transfer on high-temperature high-pressure gases. Solution method: Line-by-line Monte-Carlo ray-tracing. Unusual features: Parallel computations are moved to the GPU. Additional comments: nVidia GeForce 7000 or ATI Radeon X1000 series graphics processing unit is required. Running time: A few minutes.

  14. CoMD Implementation Suite in Emerging Programming Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haque, Riyaz; Reeve, Sam; Juallmes, Luc

    CoMD-Em is a software implementation suite of the CoMD [4] proxy app using different emerging programming models. It is intended to analyze the features and capabilities of novel programming models that could help ensure code and performance portability and scalability across heterogeneous platforms while improving programmer productivity. Another goal is to provide the authors and vendors with some meaningful feedback regarding the capabilities and limitations of their models. The actual application is a classical molecular dynamics (MD) simulation using either the Lennard-Jones method (LJ) or the embedded atom method (EAM) for primary particle interaction. The code can be extended to support alternate interaction models. The code is expected to run on a wide class of heterogeneous hardware configurations like shared/distributed/hybrid memory, GPUs, and any other platform supported by the underlying programming model.

  15. Method for programming a flash memory

    DOEpatents

    Brosky, Alexander R.; Locke, William N.; Maher, Conrado M.

    2016-08-23

    A method of programming a flash memory is described. The method includes partitioning a flash memory into a first group having a first level of write-protection, a second group having a second level of write-protection, and a third group having a third level of write-protection. The write-protection of the second and third groups is disabled using an installation adapter. The third group is programmed using a Software Installation Device.

  16. Sparse distributed memory prototype: Principles of operation

    NASA Technical Reports Server (NTRS)

    Flynn, Michael J.; Kanerva, Pentti; Ahanin, Bahram; Bhadkamkar, Neal; Flaherty, Paul; Hickey, Philip

    1988-01-01

    Sparse distributed memory is a generalized random access memory (RAM) for long binary words. Such words can be written into and read from the memory, and they can be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original right address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. The research is aimed at resolving major design issues that have to be faced in building the memories. The design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words is described. A key aspect of the design is extensive use of dynamic RAM and other standard components.
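
    The read and write mechanics described above reduce to a few lines per operation. The toy C sketch below scales the design down (32-bit addresses and 2048 hard locations, parameters chosen arbitrarily rather than taken from the prototype) but keeps the essential behavior: a write adjusts per-bit counters at every location within the Hamming radius, and a read pools those counters and thresholds at zero, so the word can, with high probability, be recovered even from a noisy address:

        #include <stdint.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define BITS   32    /* address/word length (prototype: 256)    */
        #define LOCS   2048  /* hard locations (prototype: 8K to 128K)  */
        #define RADIUS 11    /* Hamming-distance activation threshold   */

        static uint32_t hard_addr[LOCS];
        static int counters[LOCS][BITS];   /* one up/down counter per bit */

        static int hamming(uint32_t a, uint32_t b) {
            return __builtin_popcount(a ^ b);  /* GCC/Clang builtin */
        }

        /* write: every location within RADIUS of addr absorbs the word */
        static void sdm_write(uint32_t addr, uint32_t word) {
            for (int l = 0; l < LOCS; l++)
                if (hamming(hard_addr[l], addr) <= RADIUS)
                    for (int b = 0; b < BITS; b++)
                        counters[l][b] += (word >> b & 1) ? 1 : -1;
        }

        /* read: pool counters of activated locations, threshold at zero */
        static uint32_t sdm_read(uint32_t addr) {
            int sum[BITS] = {0};
            uint32_t out = 0;
            for (int l = 0; l < LOCS; l++)
                if (hamming(hard_addr[l], addr) <= RADIUS)
                    for (int b = 0; b < BITS; b++)
                        sum[b] += counters[l][b];
            for (int b = 0; b < BITS; b++)
                if (sum[b] > 0) out |= 1u << b;
            return out;
        }

        int main(void) {
            srand(7);
            for (int l = 0; l < LOCS; l++)   /* random hard addresses */
                hard_addr[l] = (uint32_t)rand() << 16 ^ (uint32_t)rand();
            uint32_t addr = 0xCAFEBABE, word = 0x12345678;
            sdm_write(addr, word);
            /* read back from a noisy address: two bits flipped */
            printf("read: %08x (stored %08x)\n",
                   (unsigned)sdm_read(addr ^ 0x00000101), (unsigned)word);
            return 0;
        }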

  17. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each group is approximately balanced. A proper number of threads are initially allocated to each group, and in subsequent iterations during the run-time, the number of threads is adjusted to achieve load balancing across the processes. Each process exploits the multitasking directives already established in Overflow.

  18. PREDICTING TURBINE STAGE PERFORMANCE

    NASA Technical Reports Server (NTRS)

    Boyle, R. J.

    1994-01-01

    This program was developed to predict turbine stage performance taking into account the effects of complex passage geometries. The method uses a quasi-3D inviscid-flow analysis iteratively coupled to calculated losses so that changes in losses result in changes in the flow distribution. In this manner the effects of both the geometry on the flow distribution and the flow distribution on losses are accounted for. The flow may be subsonic or shock-free transonic. The blade row may be fixed or rotating, and the blades may be twisted and leaned. This program has been applied to axial and radial turbines, and is helpful in the analysis of mixed flow machines. This program is a combination of the flow analysis programs MERIDL and TSONIC coupled to the boundary layer program BLAYER. The subsonic flow solution is obtained by a finite difference, stream function analysis. Transonic blade-to-blade solutions are obtained using information from the finite difference, stream function solution with a reduced flow factor. Upstream and downstream flow variables may vary from hub to shroud and provision is made to correct for loss of stagnation pressure. Boundary layer analyses are made to determine profile and end-wall friction losses. Empirical loss models are used to account for incidence, secondary flow, disc windage, and clearance losses. The total losses are then used to calculate stator, rotor, and stage efficiency. This program is written in FORTRAN IV for batch execution and has been implemented on an IBM 370/3033 under TSS with a central memory requirement of approximately 4.5 Megs of 8 bit bytes. This program was developed in 1985.

  19. Multithreaded transactions in scientific computing: New versions of a computer program for kinematical calculations of RHEED intensity oscillations

    NASA Astrophysics Data System (ADS)

    Brzuszek, Marcin; Daniluk, Andrzej

    2006-11-01

    Writing a concurrent program can be more difficult than writing a sequential program. The programmer needs to think about synchronisation, race conditions and shared variables. Transactions help reduce the inconvenience of using threads. A transaction is an abstraction, which allows programmers to group a sequence of actions on the program into a logical, higher-level computation unit. This paper presents multithreaded versions of the GROWTH program, which allow the calculation of the layer coverages during the growth of thin epitaxial films and the corresponding RHEED intensities according to the kinematical approximation. The presented programs also contain graphical user interfaces, which enable displaying program data at run-time. New version program summary. Titles of programs: GROWTHGr, GROWTH06. Catalogue identifier: ADVL_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVL_v2_0. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Catalogue identifier of previous version: ADVL. Does the new version supersede the original program: No. Computer for which the new version is designed and others on which it has been tested: Pentium-based PC. Operating systems or monitors under which the new version has been tested: Windows 9x, XP, NT. Programming language used: Object Pascal. Memory required to execute with typical data: More than 1 MB. Number of bits in a word: 64 bits. Number of processors used: 1. No. of lines in distributed program, including test data, etc.: 20 931. Number of bytes in distributed program, including test data, etc.: 1 311 268. Distribution format: tar.gz. Nature of physical problem: The programs compute the RHEED intensities during the growth of thin epitaxial structures prepared using molecular beam epitaxy (MBE). The computations are based on the use of kinematical diffraction theory [1] (P.I. Cohen, G.S. Petrich, P.R. Pukite, G.J. Whaley, A.S. Arrott, Surf. Sci. 216 (1989) 222).

  20. Portable parallel stochastic optimization for the design of aeropropulsion components

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Rhodes, G. S.

    1994-01-01

    This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as a review of portable, parallel programming environments. The second effort was to implement the MSO methodology for a problem using the portable parallel programming environment, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be applied effectively to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.

  1. Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

    NASA Technical Reports Server (NTRS)

    Hess, Matthias; Jost, Gabriele; Mueller, Matthias; Ruehle, Roland; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs (personal computers) running a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS (NASA Advanced Supercomputing) Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss performance differences.

  2. lsjk—a C++ library for arbitrary-precision numeric evaluation of the generalized log-sine functions

    NASA Astrophysics Data System (ADS)

    Kalmykov, M. Yu.; Sheplyakov, A.

    2005-10-01

    Generalized log-sine functions Ls_j^(k)(θ) appear in higher order ɛ-expansion of different Feynman diagrams. We present an algorithm for the numerical evaluation of these functions for real arguments. This algorithm is implemented as a C++ library with arbitrary-precision arithmetics for integer 0⩽k⩽9 and j⩾2. Some new relations and representations of the generalized log-sine functions are given. Program summary. Title of program: lsjk. Catalogue number: ADVS. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVS. Program obtained from: CPC Program Library, Queen's University of Belfast, N. Ireland. Licensing terms: GNU General Public License. Computers: all. Operating systems: POSIX. Programming language: C++. Memory required to execute: Depending on the complexity of the problem, at least 32 MB RAM recommended. No. of lines in distributed program, including testing data, etc.: 41 975. No. of bytes in distributed program, including testing data, etc.: 309 156. Distribution format: tar.gz. Other programs called: The CLN library for arbitrary-precision arithmetics is required at version 1.1.5 or greater. External files needed: none. Nature of the physical problem: Numerical evaluation of the generalized log-sine functions for real argument in the region 0<θ<π. These functions appear in Feynman integrals. Method of solution: Series representation for the real argument in the region 0<θ<π. Restriction on the complexity of the problem: Limited up to Ls_j^(9)(θ), and j is an arbitrary integer number. Thus, all functions up to weight 12 in the region 0<θ<π can be evaluated. The algorithm can be extended up to higher values of k (k > 9) without modification. Typical running time: Depending on the complexity of problem. See text below.

  3. Memory-assisted quantum key distribution resilient against multiple-excitation effects

    NASA Astrophysics Data System (ADS)

    Lo Piparo, Nicolò; Sinclair, Neil; Razavi, Mohsen

    2018-01-01

    Memory-assisted measurement-device-independent quantum key distribution (MA-MDI-QKD) has recently been proposed as a technique to improve the rate-versus-distance behavior of QKD systems by using existing, or nearly-achievable, quantum technologies. The promise is that MA-MDI-QKD would require less demanding quantum memories than the ones needed for probabilistic quantum repeaters. Nevertheless, early investigations suggest that, in order to beat the conventional memory-less QKD schemes, the quantum memories used in the MA-MDI-QKD protocols must have high bandwidth-storage products and short interaction times. Among different types of quantum memories, ensemble-based memories offer some of the required specifications, but they typically suffer from multiple excitation effects. To avoid the latter issue, in this paper, we propose two new variants of MA-MDI-QKD both relying on single-photon sources for entangling purposes. One is based on known techniques for entanglement distribution in quantum repeaters. This scheme turns out to offer no advantage even if one uses ideal single-photon sources. By finding the root cause of the problem, we then propose another setup, which can outperform single memory-less setups even if we allow for some imperfections in our single-photon sources. For such a scheme, we compare the key rate for different types of ensemble-based memories and show that certain classes of atomic ensembles can improve the rate-versus-distance behavior.

  4. C++QEDv2 Milestone 10: A C++/Python application-programming framework for simulating open quantum dynamics

    NASA Astrophysics Data System (ADS)

    Sandner, Raimar; Vukics, András

    2014-09-01

    The v2 Milestone 10 release of C++QED is primarily a feature release, which also corrects some problems of the previous release, especially as regards the build system. The adoption of C++11 features has led to many simplifications in the codebase. A full doxygen-based API manual [1] is now provided together with updated user guides. A largely automated, versatile new testsuite directed both towards computational and physics features allows for quickly spotting arising errors. The states of trajectories are now savable and recoverable with full binary precision, allowing for trajectory continuation regardless of evolution method (single/ensemble Monte Carlo wave-function or Master equation trajectory). As the main new feature, the framework now presents Python bindings to the highest-level programming interface, so that actual simulations for given composite quantum systems can now be performed from Python. Catalogue identifier: AELU_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELU_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: yes No. of lines in distributed program, including test data, etc.: 492422 No. of bytes in distributed program, including test data, etc.: 8070987 Distribution format: tar.gz Programming language: C++/Python. Computer: i386-i686, x86_64. Operating system: In principle cross-platform, as yet tested only on UNIX-like systems (including Mac OS X). RAM: The framework itself takes about 60MB, which is fully shared. The additional memory taken by the program which defines the actual physical system (script) is typically less than 1MB. The memory storing the actual data scales with the system dimension for state-vector manipulations, and the square of the dimension for density-operator manipulations. This might easily be GBs, and often the memory of the machine limits the size of the simulated system. Classification: 4.3, 4.13, 6.2. External routines: Boost C++ libraries, GNU Scientific Library, Blitz++, FLENS, NumPy, SciPy Catalogue identifier of previous version: AELU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 183 (2012) 1381 Does the new version supersede the previous version?: Yes Nature of problem: Definition of (open) composite quantum systems out of elementary building blocks [2,3]. Manipulation of such systems, with emphasis on dynamical simulations such as Master-equation evolution [4] and Monte Carlo wave-function simulation [5]. Solution method: Master equation, Monte Carlo wave-function method Reasons for new version: The new version is mainly a feature release, but it does correct some problems of the previous version, especially as regards the build system. Summary of revisions: We give an example for a typical Python script implementing the ring-cavity system presented in Sec. 3.3 of Ref. [2]: Restrictions: Total dimensionality of the system. Master equation: few thousands. Monte Carlo wave-function trajectory: several millions. Unusual features: Because of the heavy use of compile-time algorithms, compilation of programs written in the framework may take a long time and much memory (up to several GBs). Additional comments: The framework is not a program, but provides and implements an application-programming interface for developing simulations in the indicated problem domain. 
We use several C++11 features which limits the range of supported compilers (g++ 4.7, clang++ 3.1) Documentation, http://cppqed.sourceforge.net/ Running time: Depending on the magnitude of the problem, can vary from a few seconds to weeks. References: [1] Entry point: http://cppqed.sf.net [2] A. Vukics, C++QEDv2: The multi-array concept and compile-time algorithms in the definition of composite quantum systems, Comp. Phys. Comm. 183(2012)1381. [3] A. Vukics, H. Ritsch, C++QED: an object-oriented framework for wave-function simulations of cavity QED systems, Eur. Phys. J. D 44 (2007) 585. [4] H. J. Carmichael, An Open Systems Approach to Quantum Optics, Springer, 1993. [5] J. Dalibard, Y. Castin, K. Molmer, Wave-function approach to dissipative processes in quantum optics, Phys. Rev. Lett. 68 (1992) 580.

  5. Performance and scalability evaluation of "Big Memory" on Blue Gene Linux.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoshii, K.; Iskra, K.; Naik, H.

    2011-05-01

    We address memory performance issues observed in Blue Gene Linux and discuss the design and implementation of 'Big Memory' - an alternative, transparent memory space introduced to eliminate the memory performance issues. We evaluate the performance of Big Memory using custom memory benchmarks, NAS Parallel Benchmarks, and the Parallel Ocean Program, at a scale of up to 4,096 nodes. We find that Big Memory successfully resolves the performance issues normally encountered in Blue Gene Linux. For the ocean simulation program, we even find that Linux with Big Memory provides better scalability than does the lightweight compute node kernel designed solely for high-performance applications. Originally intended exclusively for compute node tasks, our new memory subsystem dramatically improves the performance of certain I/O node applications as well. We demonstrate this performance using the central processor of the LOw Frequency ARray radio telescope as an example.

  6. Low latency and persistent data storage

    DOEpatents

    Fitch, Blake G; Franceschini, Michele M; Jagmohan, Ashish; Takken, Todd

    2014-11-04

    Persistent data storage is provided by a computer program product that includes computer program code configured for receiving a low latency store command that includes write data. The write data is written to a first memory device that is implemented by a nonvolatile solid-state memory technology characterized by a first access speed. It is acknowledged that the write data has been successfully written to the first memory device. The write data is written to a second memory device that is implemented by a volatile memory technology. At least a portion of the data in the first memory device is written to a third memory device when a predetermined amount of data has been accumulated in the first memory device. The third memory device is implemented by a nonvolatile solid-state memory technology characterized by a second access speed that is slower than the first access speed.
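
    The claimed write path can be paraphrased in a short C sketch, with ordinary arrays standing in for the three devices and all names hypothetical: acknowledge once the data sits in the fast nonvolatile buffer, mirror it into volatile memory, and destage to the slower nonvolatile store only once enough data has accumulated:

        #include <stdio.h>
        #include <string.h>

        #define FAST_NVM_SIZE   4096   /* small, fast nonvolatile buffer  */
        #define SLOW_NVM_SIZE   65536  /* large, slower nonvolatile store */
        #define FLUSH_THRESHOLD 3072   /* destage once this much accrues  */

        static unsigned char fast_nvm[FAST_NVM_SIZE]; /* 1st: fast NVM  */
        static unsigned char dram[FAST_NVM_SIZE];     /* 2nd: volatile  */
        static unsigned char slow_nvm[SLOW_NVM_SIZE]; /* 3rd: slow NVM  */
        static size_t fast_used, slow_used;

        /* low-latency store: durable after the fast-NVM write, then
           mirrored to DRAM; destaged to slow NVM only in batches */
        int low_latency_store(const void *data, size_t len) {
            if (fast_used + len > FAST_NVM_SIZE) return -1; /* no wrap */
            memcpy(fast_nvm + fast_used, data, len); /* durable now    */
            /* --- acknowledge to the caller at this point --- */
            memcpy(dram + fast_used, data, len);     /* volatile copy  */
            fast_used += len;
            if (fast_used >= FLUSH_THRESHOLD) {      /* batched destage */
                memcpy(slow_nvm + slow_used, fast_nvm, fast_used);
                slow_used += fast_used;
                fast_used = 0;
            }
            return 0;
        }

        int main(void) {
            for (int i = 0; i < 8; i++)
                low_latency_store("0123456789abcdef", 16);
            printf("fast buffer: %zu bytes, destaged: %zu bytes\n",
                   fast_used, slow_used);
            return 0;
        }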

  7. Execution time supports for adaptive scientific algorithms on distributed memory machines

    NASA Technical Reports Server (NTRS)

    Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey

    1990-01-01

    Optimizations are considered that are required for efficient execution of code segments that consist of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communication patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
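
    PARTI itself predates MPI, but the effect of its gather primitive, fetching elements of a distributed array by global index as though memory were shared, can be suggested with modern MPI one-sided operations. In the C sketch below (not the PARTI API), an "inspector" step translates global indices into (owner, offset) pairs and an "executor" step fetches the values:

        #include <mpi.h>
        #include <stdio.h>

        #define BLOCK 4  /* elements owned by each rank (block distribution) */

        int main(int argc, char **argv) {
            int rank, size;
            double local[BLOCK];
            MPI_Win win;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            for (int i = 0; i < BLOCK; i++)
                local[i] = rank * BLOCK + i;  /* value equals global index */
            MPI_Win_create(local, BLOCK * sizeof(double), sizeof(double),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            /* inspector: map global indices to (owner, offset) pairs;
               the modulo wrap lets the sketch run on any rank count
               (values match global indices only with 3 or more ranks) */
            int wanted[3] = {1, BLOCK, 2 * BLOCK + 1};
            double fetched[3];
            MPI_Win_fence(0, win);
            for (int j = 0; j < 3; j++) {
                int owner  = (wanted[j] / BLOCK) % size;
                int offset =  wanted[j] % BLOCK;
                /* executor: gather each element from its owner's memory */
                MPI_Get(&fetched[j], 1, MPI_DOUBLE,
                        owner, offset, 1, MPI_DOUBLE, win);
            }
            MPI_Win_fence(0, win);  /* transfers complete after the fence */
            if (rank == 0)
                for (int j = 0; j < 3; j++)
                    printf("global[%d] = %g\n", wanted[j], fetched[j]);
            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }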

  8. Neuro-Cognitive Intervention for Working Memory: Preliminary Results and Future Directions.

    PubMed

    Bree, Kathleen D; Beljan, Paul

    2016-01-01

    Definitions of working memory identify it as a function of the executive function system in which an individual maintains two or more pieces of information in mind and uses that information simultaneously for some purpose. In academics, working memory is necessary for a variety of functions, including attending to the information one's teacher presents and then using that information simultaneously for problem solving. Research indicates difficulties with working memory are observed in children with mathematics learning disorder (MLD) and reading disorders (RD). To address working memory and other executive function difficulties, and as an alternative to medication treatments for attention and executive function disorders, the Motor Cognition²® (MC²®) program was developed. Preliminary research on this program indicates statistically significant improvements in working memory, mathematics, and nonsense word decoding for reading. Further research on the MC²® program and its impact on working memory, as well as other areas of executive functioning, is warranted.

  9. Genome-wide Functional Analysis of CREB/Long-Term Memory-Dependent Transcription Reveals Distinct Basal and Memory Gene Expression Programs

    PubMed Central

    Lakhina, Vanisha; Arey, Rachel N.; Kaletsky, Rachel; Kauffman, Amanda; Stein, Geneva; Keyes, William; Xu, Daniel; Murphy, Coleen T.

    2014-01-01

    Induced CREB activity is a hallmark of long-term memory, but the full repertoire of CREB transcriptional targets required specifically for memory is not known in any system. To obtain a more complete picture of the mechanisms involved in memory, we combined memory training with genome-wide transcriptional analysis of C. elegans CREB mutants. This approach identified 757 significant CREB/memory-induced targets and confirmed the involvement of known memory genes from other organisms, but also suggested new mechanisms and novel components that may be conserved through mammals. CREB mediates distinct basal and memory transcriptional programs at least partially through spatial restriction of CREB activity: basal targets are regulated primarily in nonneuronal tissues, while memory targets are enriched for neuronal expression, emanating from CREB activity in AIM neurons. This suite of novel memory-associated genes will provide a platform for the discovery of orthologous mammalian long-term memory components. PMID:25611510

  10. Memory for radio advertisements: the effect of program and typicality.

    PubMed

    Martín-Luengo, Beatriz; Luna, Karlos; Migueles, Malen

    2013-01-01

    We examined the influence of the type of radio program on the memory for radio advertisements. We also investigated the role in memory of the typicality (high or low) of the elements of the products advertised. Participants listened to three types of programs (interesting, boring, enjoyable) with two advertisements embedded in each. After completing a filler task, the participants performed a true/false recognition test. Hits and false alarm rates were higher for the interesting and enjoyable programs than for the boring one. There were also more hits and false alarms for the high-typicality elements. The response criterion for the advertisements embedded in the boring program was stricter than for the advertisements in other types of programs. We conclude that the type of program in which an advertisement is inserted and the nature of the elements of the advertisement affect both the number of hits and false alarms and the response criterion, but not the accuracy of the memory.

  11. Array processor architecture

    NASA Technical Reports Server (NTRS)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  12. Electrically programmable-erasable In-Ga-Zn-O thin-film transistor memory with atomic-layer-deposited Al₂O₃/Pt nanocrystals/Al₂O₃ gate stack

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qian, Shi-Bing; Zhang, Wen-Peng; Liu, Wen-Jun

    Amorphous indium-gallium-zinc oxide (a-IGZO) thin-film transistor (TFT) memory is very promising for transparent and flexible system-on-panel displays; however, electrical erasability has always been a severe challenge for this memory. In this article, we successfully demonstrated an electrically programmable-erasable memory with an atomic-layer-deposited Al₂O₃/Pt nanocrystals/Al₂O₃ gate stack under a maximal processing temperature of 300 °C. As the programming voltage was increased from 14 to 19 V for a constant pulse of 0.2 ms, the threshold voltage shift increased significantly from 0.89 to 4.67 V. When the programmed device was subjected to an appropriate pulse under negative gate bias, it could return to the original state with a superior erasing efficiency. The above phenomena could be attributed to Fowler-Nordheim tunnelling of electrons from the IGZO channel to the Pt nanocrystals during programming, and inverse tunnelling of the trapped electrons during erasing. With 0.2-ms programming at 16 V and 350-ms erasing at −17 V, a large memory window of 3.03 V was achieved. Furthermore, the memory exhibited stable repeated programming/erasing (P/E) characteristics and good data retention, i.e., for 2-ms programming at 14 V and 250-ms erasing at −14 V, a memory window of 2.08 V was still maintained after 10³ P/E cycles, and a memory window of 1.1 V was retained after 10⁵ s retention time.
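
    The programming mechanism invoked here is Fowler-Nordheim tunnelling, whose current density grows roughly as J = A·E²·exp(−B/E) in the oxide field E. The sketch below only illustrates that trend; the prefactor A, the barrier constant B, and the oxide thickness are assumed values for illustration, not parameters of this device.

        import math

        A = 1.0e-6     # A/V^2, assumed prefactor
        B = 2.0e9      # V/m, assumed barrier-dependent constant
        T_OX = 2.0e-8  # m, assumed effective oxide thickness

        def fn_current_density(E):
            """Fowler-Nordheim tunnelling current density for field E (V/m)."""
            return A * E ** 2 * math.exp(-B / E)

        # Higher programming voltage -> higher oxide field -> exponentially more
        # charge injected per pulse, consistent with the growing threshold shift.
        for volts in (14, 16, 19):
            E = volts / T_OX
            print(f"{volts} V: {fn_current_density(E):.3e} A/m^2")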

  13. Simulation of n-qubit quantum systems. V. Quantum measurements

    NASA Astrophysics Data System (ADS)

    Radtke, T.; Fritzsche, S.

    2010-02-01

    The FEYNMAN program has been developed during the last years to support case studies on the dynamics and entanglement of n-qubit quantum registers. Apart from basic transformations and (gate) operations, it currently supports a good number of separability criteria and entanglement measures, quantum channels, as well as the parametrizations of various frequently applied objects in quantum information theory, such as (pure and mixed) quantum states, hermitian and unitary matrices, or classical probability distributions. With the present update of the FEYNMAN program, we provide simple access to (the simulation of) quantum measurements. This includes not only the widely applied projective measurements upon the eigenspaces of some given operator but also single-qubit measurements in various pre- and user-defined bases as well as support for two-qubit Bell measurements. In addition, we provide support for generalized and POVM measurements. Knowing the importance of measurements for many quantum information protocols, e.g., one-way computing, we hope that this update makes the FEYNMAN code an attractive and versatile tool for both research and education. New version program summary Program title: FEYNMAN Catalogue identifier: ADWE_v5_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWE_v5_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 27 210 No. of bytes in distributed program, including test data, etc.: 1 960 471 Distribution format: tar.gz Programming language: Maple 12 Computer: Any computer with Maple software installed Operating system: Any system that supports Maple; the program has been tested under Microsoft Windows XP and Linux Classification: 4.15 Catalogue identifier of previous version: ADWE_v4_0 Journal reference of previous version: Comput. Phys. Commun. 179 (2008) 647 Does the new version supersede the previous version?: Yes Nature of problem: During the last decade, the field of quantum information science has largely contributed to our understanding of quantum mechanics, and has also provided new and efficient protocols that build on quantum entanglement. To further analyze the amount and transfer of entanglement in n-qubit quantum protocols, symbolic and numerical simulations need to be handled efficiently. Solution method: Using the computer algebra system Maple, we developed a set of procedures to support the definition, manipulation, and analysis of n-qubit quantum registers. These procedures also help to deal with (unitary) logic gates and (nonunitary) quantum operations and measurements that act upon the quantum registers. All commands are organized in a hierarchical order and can be used interactively in order to simulate and analyze the evolution of n-qubit quantum systems, both in ideal and noisy quantum circuits. Reasons for new version: Until the present version, the FEYNMAN program supported the basic data structures and operations of n-qubit quantum registers [1], a good number of separability and entanglement measures [2], quantum operations (noisy channels) [3], as well as the parametrizations of various frequently applied objects, such as (pure and mixed) quantum states, hermitian and unitary matrices, or classical probability distributions [4].
    With the current extension, we add all features necessary to simulate quantum measurements, including projective measurements in various single-qubit bases and the two-qubit Bell basis, as well as POVM measurements. Together with the previously implemented functionality, this greatly enhances the possibilities for analyzing quantum information protocols in which measurements play a central role, e.g., one-way computation. Running time: Most commands require ⩽10 seconds of processor time on a Pentium 4 processor (⩾2 GHz) or newer when they work with quantum registers of five or fewer qubits. Moreover, about 5-20 MB of working memory is typically needed (in addition to the memory for the Maple environment itself). However, especially when working with symbolic expressions, the requirements on CPU time and memory depend critically on the size of the quantum registers, owing to the exponential growth of the dimension of the associated Hilbert space. For example, complex (symbolic) noise models, i.e. those with several Kraus operators, may result in very large expressions that dramatically slow down the evaluation of, e.g., distance measures or the final-state entropy. In these cases, Maple's assume facility sometimes helps to reduce the complexity of the symbolic expressions, but more often than not only a numerical evaluation is feasible. Since the various commands can be applied to quite different scenarios, no general scaling rule can be given for the CPU time or memory requirements. References: [1] T. Radtke, S. Fritzsche, Comput. Phys. Commun. 173 (2005) 91. [2] T. Radtke, S. Fritzsche, Comput. Phys. Commun. 175 (2006) 145. [3] T. Radtke, S. Fritzsche, Comput. Phys. Commun. 176 (2007) 617. [4] T. Radtke, S. Fritzsche, Comput. Phys. Commun. 179 (2008) 647.
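
    FEYNMAN itself is a Maple package, but the core of a projective measurement is compact enough to sketch in NumPy. The helper below is an illustrative analogue, not FEYNMAN's interface: it measures one qubit of an n-qubit register in the computational basis and collapses the state accordingly.

        import numpy as np

        def measure_qubit(state, qubit, n):
            """Projectively measure `qubit` (0-indexed) of an n-qubit state
            vector in the computational basis; returns (outcome, new state)."""
            psi = state.reshape([2] * n)
            # probability of reading 1 on the chosen qubit
            branch1 = np.take(psi, 1, axis=qubit)
            p1 = np.vdot(branch1, branch1).real
            outcome = int(np.random.rand() < p1)
            proj = np.zeros_like(psi)
            sl = [slice(None)] * n
            sl[qubit] = outcome
            proj[tuple(sl)] = np.take(psi, outcome, axis=qubit)
            p = p1 if outcome else 1.0 - p1
            return outcome, (proj / np.sqrt(p)).reshape(-1)

        # Example: measuring either qubit of a Bell state collapses both.
        bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
        print(measure_qubit(bell, qubit=0, n=2))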

  14. Degree of vertical integration between the undergraduate program and clinical internship with respect to lumbopelvic diagnostic and therapeutic procedures taught at the Canadian Memorial Chiropractic College.

    PubMed

    Vermet, Shannon; McGinnis, Karen; Boodham, Melissa; Gleberzon, Brian J

    2010-01-01

    The objective of this study was to determine to what extent the diagnostic and therapeutic procedures taught in the undergraduate program used for patients with lumbopelvic conditions are expected to be utilized by students during their clinical internship program at Canadian Memorial Chiropractic College or are being used by the clinical faculty. A confidential survey was distributed to clinical faculty at the college. It consisted of a list of diagnostic and therapeutic procedures used for lumbopelvic conditions taught at that college. Clinicians were asked to indicate the frequency with which they performed or they required students to perform each item. Seventeen of 23 clinicians responded. The following procedures were most likely required to be performed by clinicians: posture; ranges of motion; lower limb sensory, motor, and reflex testing; and core orthopedic tests. The following were less likely to be required to be performed: Waddell testing, Schober's test, Gillet tests, and abdominal palpation. Students were expected to perform (or clinicians performed) most of the mobilization (in particular, iliocostal, iliotransverse, and iliofemoral) and spinal manipulative therapies (in particular, the procedures referred to as the lumbar roll, lumbar pull/hook, and upper sacroiliac) taught at the college. This study suggests that there was considerable, but not complete, vertical integration between the undergraduate and clinical education program at this college.

  15. Human memory CD8 T cell effector potential is epigenetically preserved during in vivo homeostasis.

    PubMed

    Abdelsamed, Hossam A; Moustaki, Ardiana; Fan, Yiping; Dogra, Pranay; Ghoneim, Hazem E; Zebley, Caitlin C; Triplett, Brandon M; Sekaly, Rafick-Pierre; Youngblood, Ben

    2017-06-05

    Antigen-independent homeostasis of memory CD8 T cells is vital for sustaining long-lived T cell-mediated immunity. In this study, we report that maintenance of human memory CD8 T cell effector potential during in vitro and in vivo homeostatic proliferation is coupled to preservation of acquired DNA methylation programs. Whole-genome bisulfite sequencing of primary human naive, short-lived effector memory (TEM), and longer-lived central memory (TCM) and stem cell memory (TSCM) CD8 T cells identified effector molecules with demethylated promoters and poised for expression. Effector-loci demethylation was heritably preserved during IL-7- and IL-15-mediated in vitro cell proliferation. Conversely, cytokine-driven proliferation of TCM and TSCM memory cells resulted in phenotypic conversion into TEM cells and was coupled to increased methylation of the CCR7 and Tcf7 loci. Furthermore, haploidentical donor memory CD8 T cells undergoing in vivo proliferation in lymphodepleted recipients also maintained their effector-associated demethylated status but acquired TEM-associated programs. These data demonstrate that effector-associated epigenetic programs are preserved during cytokine-driven subset interconversion of human memory CD8 T cells. © 2017 Abdelsamed et al.

  16. Working Memory Training Does Not Improve Performance on Measures of Intelligence or Other Measures of “Far Transfer”

    PubMed Central

    Melby-Lervåg, Monica; Redick, Thomas S.; Hulme, Charles

    2016-01-01

    It has been claimed that working memory training programs produce diverse beneficial effects. This article presents a meta-analysis of working memory training studies (with a pretest-posttest design and a control group) that have examined transfer to other measures (nonverbal ability, verbal ability, word decoding, reading comprehension, or arithmetic; 87 publications with 145 experimental comparisons). Immediately following training there were reliable improvements on measures of intermediate transfer (verbal and visuospatial working memory). For measures of far transfer (nonverbal ability, verbal ability, word decoding, reading comprehension, arithmetic) there was no convincing evidence of any reliable improvements when working memory training was compared with a treated control condition. Furthermore, mediation analyses indicated that across studies, the degree of improvement on working memory measures was not related to the magnitude of far-transfer effects found. Finally, analysis of publication bias shows that there is no evidential value from the studies of working memory training using treated controls. The authors conclude that working memory training programs appear to produce short-term, specific training effects that do not generalize to measures of “real-world” cognitive skills. These results seriously question the practical and theoretical importance of current computerized working memory programs as methods of training working memory skills. PMID:27474138

  17. A portable MPI-based parallel vector template library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
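
    The library described here is C++, but the data-parallel pattern it standardizes (a distributed collection, a generic algorithm over its elements, and an algebraic combining function) is easy to sketch with Python's mpi4py bindings. The fragment below is an illustrative analogue under that assumption, not the paper's API.

        from mpi4py import MPI   # requires an MPI installation

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # "collection": each rank owns a cyclic slice of the global elements
        local_elements = range(rank, 100, size)

        # generic algorithm (map) + algebraic combining function (sum)
        local_result = sum(x * x for x in local_elements)
        total = comm.allreduce(local_result, op=MPI.SUM)

        if rank == 0:
            print("sum of squares over 0..99 =", total)

    Run with, e.g., mpiexec -n 4 python sketch.py. The combining function must be associative so the runtime is free to reduce in a tree across the machine, which is the same requirement the paper places on its combining functions.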

  18. A Portable MPI-Based Parallel Vector Template Library

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.

    1995-01-01

    This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.

  19. Parallel computation and the Basis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, G.R.

    1992-12-16

    A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.

  20. Parallel computation and the basis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, G.R.

    1993-05-01

    A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.
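
    The master-and-slaves decomposition described here can be caricatured in a few lines of message passing. The fragment below uses Python's mpi4py purely for illustration (the actual package sits on Basis and PVM): rank 0 partitions the domain, each slave computes on its subdomain, and the master combines the replies.

        from mpi4py import MPI   # run with at least two ranks, e.g. mpiexec -n 4

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        if rank == 0:                                  # master
            domain = list(range(1000))
            chunks = [domain[w - 1::size - 1] for w in range(1, size)]
            for worker, chunk in zip(range(1, size), chunks):
                comm.send(chunk, dest=worker, tag=1)   # partition work
            results = [comm.recv(source=w, tag=2) for w in range(1, size)]
            print("global result:", sum(results))
        else:                                          # slave
            subdomain = comm.recv(source=0, tag=1)
            comm.send(sum(subdomain), dest=0, tag=2)   # local result back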

  1. A Wideband Fast Multipole Method for the two-dimensional complex Helmholtz equation

    NASA Astrophysics Data System (ADS)

    Cho, Min Hyung; Cai, Wei

    2010-12-01

    A Wideband Fast Multipole Method (FMM) for the 2D Helmholtz equation is presented. It can evaluate the interactions between N particles governed by the fundamental solution of the 2D complex Helmholtz equation in a fast manner for a wide range of complex wave numbers k, which was not easy with the original FMM due to the instability of the diagonalized conversion operator. This paper includes a description of the theoretical background, the FMM algorithm, the software structure, and some test runs. Program summary Program title: 2D-WFMM Catalogue identifier: AEHI_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHI_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 4636 No. of bytes in distributed program, including test data, etc.: 82 582 Distribution format: tar.gz Programming language: C Computer: Any Operating system: Any operating system with gcc version 4.2 or newer Has the code been vectorized or parallelized?: Multi-core processors with shared memory RAM: Depends on the number of particles N and the wave number k Classification: 4.8, 4.12 External routines: OpenMP (http://openmp.org/wp/) Nature of problem: Evaluate the interaction between N particles governed by the fundamental solution of the 2D Helmholtz equation with complex k. Solution method: Multilevel Fast Multipole Algorithm in a hierarchical quad-tree structure with a cutoff level that combines the low-frequency and high-frequency methods. Running time: Depends on the number of particles N, the wave number k, and the number of cores in the CPU; CPU time increases as N log N.
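
    For context, the quantity the FMM accelerates is the O(N²) direct sum below: the 2D Helmholtz fundamental solution is G = (i/4)·H₀⁽¹⁾(k|x−y|). This NumPy/SciPy sketch is written for illustration rather than taken from 2D-WFMM (which is in C); it evaluates the interactions by brute force, so an FMM result could be checked against it.

        import numpy as np
        from scipy.special import hankel1

        def direct_sum(points, charges, k):
            """Brute-force field at every particle from all other particles."""
            N = len(points)
            out = np.zeros(N, dtype=complex)
            for i in range(N):
                mask = np.arange(N) != i                 # skip self-interaction
                r = np.linalg.norm(points[mask] - points[i], axis=1)
                out[i] = np.sum(charges[mask] * 0.25j * hankel1(0, k * r))
            return out

        rng = np.random.default_rng(0)
        pts = rng.random((200, 2))
        q = rng.random(200)
        print(direct_sum(pts, q, k=5.0 + 0.1j)[:3])      # complex wave number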

  2. LORES: Low resolution shape program for the calculation of small angle scattering profiles for biological macromolecules in solution

    NASA Astrophysics Data System (ADS)

    Zhou, J.; Deyhim, A.; Krueger, S.; Gregurick, S. K.

    2005-08-01

    A program for determining the low resolution shape of biological macromolecules, based on the optimization of a small angle neutron scattering profile to experimental data, is presented. This program, termed LORES, relies on a Monte Carlo optimization procedure and will allow for multiple scattering length densities of complex structures. It is therefore more versatile than utilizing a form factor approach to produce low resolution structural models. LORES is easy to compile and use, and allows for structural modeling of biological samples in real time. To illustrate the effectiveness and versatility of the program, we present four specific biological examples, Apoferritin (shell model), Ribonuclease S (ellipsoidal model), a 10-mer dsDNA (duplex helix) and a construct of a 10-mer DNA/PNA duplex helix (heterogeneous structure). These examples are taken from protein and nucleic acid SANS studies, of both large and small scale structures. We find, in general, that our program will accurately reproduce the geometric shape of a given macromolecule, when compared with the known crystallographic structures. We also present results to illustrate the lower limit of the experimental resolution which the LORES program is capable of modeling. Program summary Title of program: LORES Catalogue identifier: ADVC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer: SGI Origin200, SGI Octane, SGI Linux, Intel Pentium PC Operating systems: UNIX64 6.5 and LINUX 2.4.7 Programming language used: C Memory required to execute with typical data: 8 MB No. of lines in distributed program, including test data, etc.: 2270 No. of bytes in distributed program, including test data, etc.: 13 302 Distribution format: tar.gz External subprograms used: The entire code must be linked with the MATH library
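
    A Monte Carlo shape program of this kind needs a fast forward model mapping a bead arrangement onto a scattering profile. One standard choice, shown below as a generic illustration (not the LORES source, which is written in C), is the Debye formula I(q) = Σᵢⱼ bᵢbⱼ sin(q·rᵢⱼ)/(q·rᵢⱼ); the bead geometry and scattering lengths here are invented for the example.

        import numpy as np

        def debye_profile(coords, b, q_values):
            """Scattering intensity I(q) of a bead model via the Debye formula."""
            d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
            bb = np.outer(b, b)
            intensity = []
            for q in q_values:
                # np.sinc(x) = sin(pi x)/(pi x), so sinc(q d / pi) = sin(q d)/(q d)
                intensity.append(np.sum(bb * np.sinc(q * d / np.pi)))
            return np.array(intensity)

        # Example: a crude spherical shell of beads, radius 30 (arbitrary units).
        theta = np.linspace(0, np.pi, 20)
        phi = np.linspace(0, 2 * np.pi, 20)
        T, P = np.meshgrid(theta, phi)
        beads = np.c_[np.sin(T.ravel()) * np.cos(P.ravel()),
                      np.sin(T.ravel()) * np.sin(P.ravel()),
                      np.cos(T.ravel())] * 30.0
        q = np.linspace(0.01, 0.3, 50)
        print(debye_profile(beads, np.ones(len(beads)), q)[:5])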

  3. 45 CFR 2490.149 - Program accessibility: Discrimination prohibited.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    .... 2490.149 Section 2490.149 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.149 Program...

  4. 45 CFR 2490.149 - Program accessibility: Discrimination prohibited.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    .... 2490.149 Section 2490.149 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.149 Program...

  5. 45 CFR 2490.149 - Program accessibility: Discrimination prohibited.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    .... 2490.149 Section 2490.149 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.149 Program...

  6. 45 CFR 2490.149 - Program accessibility: Discrimination prohibited.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    .... 2490.149 Section 2490.149 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.149 Program...

  7. 45 CFR 2490.149 - Program accessibility: Discrimination prohibited.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    .... 2490.149 Section 2490.149 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.149 Program...

  8. Bim controls IL-15 availability and limits engagement of multiple BH3-only proteins

    PubMed Central

    Kurtulus, S; Sholl, A; Toe, J; Tripathi, P; Raynor, J; Li, K-P; Pellegrini, M; Hildeman, D A

    2015-01-01

    During the effector CD8+ T-cell response, transcriptional differentiation programs are engaged that promote effector T cells with varying memory potential. Although these differentiation programs have been used to explain which cells die as effectors and which cells survive and become memory cells, it is unclear if the lack of cell death enhances memory. Here, we investigated effector CD8+ T-cell fate in mice whose death program has been largely disabled because of the loss of Bim. Interestingly, the absence of Bim resulted in a significant enhancement of effector CD8+ T cells with more memory potential. Bim-driven control of memory T-cell development required T-cell-specific, but not dendritic cell-specific, expression of Bim. Both total and T-cell-specific loss of Bim promoted skewing toward memory precursors, by enhancing the survival of memory precursors, and limiting the availability of IL-15. Decreased IL-15 availability in Bim-deficient mice facilitated the elimination of cells with less memory potential via the additional pro-apoptotic molecules Noxa and Puma. Combined, these data show that Bim controls memory development by limiting the survival of pre-memory effector cells. Further, by preventing the consumption of IL-15, Bim limits the role of Noxa and Puma in causing the death of effector cells with less memory potential. PMID:25124553

  9. Bim controls IL-15 availability and limits engagement of multiple BH3-only proteins.

    PubMed

    Kurtulus, S; Sholl, A; Toe, J; Tripathi, P; Raynor, J; Li, K-P; Pellegrini, M; Hildeman, D A

    2015-01-01

    During the effector CD8+ T-cell response, transcriptional differentiation programs are engaged that promote effector T cells with varying memory potential. Although these differentiation programs have been used to explain which cells die as effectors and which cells survive and become memory cells, it is unclear if the lack of cell death enhances memory. Here, we investigated effector CD8+ T-cell fate in mice whose death program has been largely disabled because of the loss of Bim. Interestingly, the absence of Bim resulted in a significant enhancement of effector CD8+ T cells with more memory potential. Bim-driven control of memory T-cell development required T-cell-specific, but not dendritic cell-specific, expression of Bim. Both total and T-cell-specific loss of Bim promoted skewing toward memory precursors, by enhancing the survival of memory precursors, and limiting the availability of IL-15. Decreased IL-15 availability in Bim-deficient mice facilitated the elimination of cells with less memory potential via the additional pro-apoptotic molecules Noxa and Puma. Combined, these data show that Bim controls memory development by limiting the survival of pre-memory effector cells. Further, by preventing the consumption of IL-15, Bim limits the role of Noxa and Puma in causing the death of effector cells with less memory potential.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Juhee; Lee, Sungpyo; Lee, Moo Hyung

    Quasi-unipolar non-volatile organic transistor memory (NOTM) can combine the best characteristics of conventional unipolar and ambipolar NOTMs and, as a result, exhibit improved device performance. Unipolar NOTMs typically exhibit a large signal ratio between the programmed and erased current signals but also require a large voltage to program and erase the memory cells. Meanwhile, an ambipolar NOTM can be programmed and erased at lower voltages, but the resulting signal ratio is small. By embedding a discontinuous n-type fullerene layer within a p-type pentacene film, quasi-unipolar NOTMs are fabricated whose signal storage utilizes both electrons and holes while the electrical signal relies only on hole conduction. These devices exhibit superior memory performance relative to both pristine unipolar pentacene devices and ambipolar fullerene/pentacene bilayer devices. The quasi-unipolar NOTM exhibited a larger signal ratio between the programmed and erased states while also reducing the voltage required to program and erase a memory cell. This simple approach should be readily applicable for various combinations of advanced organic semiconductors that have been recently developed and thereby should make a significant impact on organic memory research.

  11. Two alternate proofs of Wang's lune formula for sparse distributed memory and an integral approximation

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1988-01-01

    In Kanerva's Sparse Distributed Memory, writing to and reading from the memory are done in relation to spheres in an n-dimensional binary vector space. Thus it is important to know how many points are in the intersection of two spheres in this space. Two proofs are given of Wang's formula for spheres of unequal radii, and an integral approximation for the intersection in this case.
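
    The count in question also has a simple exact form as a double combinatorial sum, which is handy for checking both Wang's closed formula and the integral approximation. The function below is a direct enumeration written for this note, not Wang's formula itself.

        from math import comb

        def sphere_intersection(n, d, r1, r2):
            """Number of points of {0,1}^n within Hamming distance r1 of x and
            r2 of y, where x and y are centers at Hamming distance d apart."""
            total = 0
            # a = disagreeing coordinates (of the d) where z sides with y
            # b = agreeing coordinates (of the n-d) where z differs from both
            for a in range(d + 1):
                for b in range(n - d + 1):
                    if a + b <= r1 and (d - a) + b <= r2:
                        total += comb(d, a) * comb(n - d, b)
            return total

        # Sanity check: with d = 0 and r1 = r2 = r this is one sphere's volume.
        print(sphere_intersection(16, 0, 3, 3),
              sum(comb(16, i) for i in range(4)))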

  12. Injecting Artificial Memory Errors Into a Running Computer Program

    NASA Technical Reports Server (NTRS)

    Bornstein, Benjamin J.; Granat, Robert A.; Wagstaff, Kiri L.

    2008-01-01

    Single-event upsets (SEUs) or bitflips are computer memory errors caused by radiation. BITFLIPS (Basic Instrumentation Tool for Fault Localized Injection of Probabilistic SEUs) is a computer program that deliberately injects SEUs into another computer program, while the latter is running, for the purpose of evaluating the fault tolerance of that program. BITFLIPS was written as a plug-in extension of the open-source Valgrind debugging and profiling software. BITFLIPS can inject SEUs into any program that can be run on the Linux operating system, without needing to modify the program's source code. Further, if access to the original program source code is available, BITFLIPS offers fine-grained control over exactly when and which areas of memory (as specified via program variables) will be subjected to SEUs. The rate of injection of SEUs is controlled by specifying either a fault probability or a fault rate based on memory size and radiation exposure time, in units of SEUs per byte per second. BITFLIPS can also log each SEU that it injects and, if program source code is available, report the magnitude of effect of the SEU on a floating-point value or other program variable.
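
    A toy version of the injection step is easy to write. The sketch below is a stand-in for illustration only (the real BITFLIPS operates on a live process through Valgrind, not on a Python buffer): it flips each bit of a byte buffer with a probability derived from a fault rate given, as in the abstract, in SEUs per byte per second, and logs every injected upset.

        import random

        def inject_seus(memory: bytearray, rate: float, seconds: float, seed=None):
            """Flip bits in-place; rate is in SEUs per byte per second."""
            rng = random.Random(seed)
            p = rate * seconds / 8          # per-bit flip probability
            log = []
            for i in range(len(memory)):
                for bit in range(8):
                    if rng.random() < p:
                        memory[i] ^= 1 << bit
                        log.append((i, bit))   # record each injected SEU
            return log

        buf = bytearray(b"sensor data under radiation")
        print(inject_seus(buf, rate=1e-2, seconds=30, seed=42), buf)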

  13. Sparse distributed memory: Principles and operation

    NASA Technical Reports Server (NTRS)

    Flynn, M. J.; Kanerva, P.; Bhadkamkar, N.

    1989-01-01

    Sparse distributed memory is a generalized random access memory (RAM) for long (1000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech recognition and scene analysis, in signal detection and verification, and in adaptive control of automated equipment; in general, in dealing with real-world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. Major design issues faced in building such memories were resolved. The design of a prototype memory with 256-bit addresses and from 8 to 128 K locations for 256-bit words is described. A key aspect of the design is extensive use of dynamic RAM and other standard components.
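
    The read/write mechanics described above fit in a few lines of NumPy. The sketch below is a scaled-down illustration, not the prototype's design: random hard locations stand in for the physical memory, a Hamming-radius test selects which locations participate, and reading takes a per-bit majority vote over counters. The radius of 112 is an assumed value chosen so only a small fraction of locations fire.

        import numpy as np

        rng = np.random.default_rng(0)
        N, M, RADIUS = 256, 2000, 112   # word length, hard locations, radius

        hard_addresses = rng.integers(0, 2, size=(M, N), dtype=np.int8)
        counters = np.zeros((M, N), dtype=np.int32)

        def activated(address):
            # all hard locations within RADIUS Hamming distance of the address
            return np.count_nonzero(hard_addresses != address, axis=1) <= RADIUS

        def write(address, word):
            counters[activated(address)] += np.where(word == 1, 1, -1)

        def read(address):
            s = counters[activated(address)].sum(axis=0)
            return (s > 0).astype(np.int8)        # per-bit majority vote

        word = rng.integers(0, 2, size=N, dtype=np.int8)
        write(word, word)                          # autoassociative store
        noisy = word.copy()
        noisy[:20] ^= 1                            # corrupt 20 of 256 bits
        print("recovered exactly:", np.array_equal(read(noisy), word))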

  14. Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture

    NASA Technical Reports Server (NTRS)

    Eidson, T. M.; Erlebacher, G.

    1994-01-01

    While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand sides (RHS) of the system of equations. Each RHS solution is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation, and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy passes the data across processor boundaries in a chained manner as the algorithm proceeds, which usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm is shown that directly transforms a sequential Gaussian elimination type algorithm into the parallel, chained, load-balanced algorithm.
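
    The independence of the right-hand sides is what makes the transpose strategy work: once a processor owns a subset of the RHS columns, it can run a plain Gaussian elimination (Thomas) sweep over all of them at once. The sketch below shows the non-periodic case for many RHS columns; the periodic solver in the paper adds a correction that is omitted here.

        import numpy as np

        def thomas_multi_rhs(a, b, c, B):
            """Solve T x = B where T has sub/main/super diagonals a, b, c.
            a[0] and c[-1] are ignored; B has shape (n, nrhs)."""
            n, _ = B.shape
            cp = np.empty(n)
            Bp = B.astype(float).copy()
            cp[0] = c[0] / b[0]
            Bp[0] /= b[0]
            for i in range(1, n):                       # forward elimination
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m if i < n - 1 else 0.0
                Bp[i] = (Bp[i] - a[i] * Bp[i - 1]) / m
            for i in range(n - 2, -1, -1):              # back substitution
                Bp[i] -= cp[i] * Bp[i + 1]
            return Bp

        n, nrhs = 8, 5
        a = np.full(n, 1.0); b = np.full(n, 4.0); c = np.full(n, 1.0)
        B = np.arange(n * nrhs, dtype=float).reshape(n, nrhs)
        x = thomas_multi_rhs(a, b, c, B)
        T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
        print(np.allclose(T @ x, B))                    # True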

  15. Benefits of a Classroom Based Instrumental Music Program on Verbal Memory of Primary School Children: A Longitudinal Study

    ERIC Educational Resources Information Center

    Rickard, Nikki S.; Vasquez, Jorge T.; Murphy, Fintan; Gill, Anneliese; Toukhsati, Samia R.

    2010-01-01

    Previous research has demonstrated a benefit of music training on a number of cognitive functions including verbal memory performance. The impact of school-based music programs on memory processes is however relatively unknown. The current study explored the effect of increasing frequency and intensity of classroom-based instrumental training…

  16. 45 CFR 2490.151 - Program accessibility: New construction and alterations.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... alterations. 2490.151 Section 2490.151 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.151 Program...

  17. 45 CFR 2490.151 - Program accessibility: New construction and alterations.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... alterations. 2490.151 Section 2490.151 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.151 Program...

  18. 45 CFR 2490.151 - Program accessibility: New construction and alterations.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... alterations. 2490.151 Section 2490.151 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.151 Program...

  19. 45 CFR 2490.151 - Program accessibility: New construction and alterations.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... alterations. 2490.151 Section 2490.151 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.151 Program...

  20. 45 CFR 2490.151 - Program accessibility: New construction and alterations.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... alterations. 2490.151 Section 2490.151 Public Welfare Regulations Relating to Public Welfare (Continued) JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION ENFORCEMENT OF NONDISCRIMINATION ON THE BASIS OF HANDICAP IN PROGRAMS OR ACTIVITIES CONDUCTED BY THE JAMES MADISON MEMORIAL FELLOWSHIP FOUNDATION § 2490.151 Program...

  1. Are memory traces localized or distributed?

    PubMed

    Thompson, R F

    1991-01-01

    Evidence supports the view that "memory traces" are formed in the hippocampus and in the cerebellum in classical conditioning of discrete behavioral responses (e.g. eyeblink conditioning). In the hippocampus, learning results in long-lasting increases in excitability of pyramidal neurons that appear to be localized to these neurons (i.e. changes in membrane properties and receptor function). However, these learning-altered pyramidal neurons are distributed widely throughout CA3 and CA1. Although it plays a key role in certain aspects of classical conditioning, the hippocampus is not necessary for learning and memory of the basic conditioned responses. The cerebellum and its associated brain stem circuitry, on the other hand, do appear to be essential (necessary and sufficient) for learning and memory of the conditioned response. Evidence to date is most consistent with a localized trace in the interpositus nucleus and multiple localized traces in cerebellar cortex, each involving relatively large ensembles of neurons. Perhaps "procedural" memory traces are relatively localized and "declarative" traces more widely distributed.

  2. Distributed Saturation

    NASA Technical Reports Server (NTRS)

    Chung, Ming-Ying; Ciardo, Gianfranco; Siminiceanu, Radu I.

    2007-01-01

    The Saturation algorithm for symbolic state-space generation has been a recent breakthrough in the exhaustive verification of complex systems, in particular globally-asynchronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and the fastest symbolic exploration algorithm to date. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during the highly irregular exploration. A crucial factor in limiting the memory consumption during the symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical while analyzing large-scale systems using the symbolic approach. In this technical report, we develop a garbage collection scheme and several operation cache policies to help solve extremely complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of time and memory efficiency.
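
    The two mechanisms being tuned are easy to caricature. In the sketch below, written for illustration rather than taken from SmArTNow, decision-diagram nodes live in a reference-counted table; garbage collection reclaims dead nodes and must also evict any operation-cache entries that mention them, which is exactly why GC over a network of workstations incurs communication overhead.

        # node_id -> [refcount, payload];  (op, arg ids...) -> result node id
        nodes = {1: [1, "a"], 2: [0, "b"], 3: [1, "c"]}
        op_cache = {("union", 1, 2): 3, ("union", 1, 3): 1}

        def release(node_id):
            nodes[node_id][0] -= 1          # drop a reference; node may go dead

        def collect_garbage():
            dead = {nid for nid, (rc, _) in nodes.items() if rc == 0}
            for nid in dead:
                del nodes[nid]              # reclaim dead nodes
            stale = [k for k, v in op_cache.items()
                     if v in dead or any(arg in dead for arg in k[1:])]
            for key in stale:
                del op_cache[key]           # keep the cache consistent

        collect_garbage()
        print(nodes, op_cache)   # node 2 and the cache entry citing it are gone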

  3. What we remember affects how we see: spatial working memory steers saccade programming.

    PubMed

    Wong, Jason H; Peterson, Matthew S

    2013-02-01

    Relationships between visual attention, saccade programming, and visual working memory have been hypothesized for over a decade. Awh, Jonides, and Reuter-Lorenz (Journal of Experimental Psychology: Human Perception and Performance 24(3):780-90, 1998) and Awh et al. (Psychological Science 10(5):433-437, 1999) proposed that rehearsing a location in memory also leads to enhanced attentional processing at that location. In regard to eye movements, Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) found that holding a location in working memory affects saccade programming, albeit negatively. In three experiments, we attempted to replicate the findings of Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009) and determine whether the spatial memory effect can occur in other saccade-cuing paradigms, including endogenous central arrow cues and exogenous irrelevant singletons. In the first experiment, our results were the opposite of those in Belopolsky and Theeuwes (Attention, Perception & Psychophysics 71(3):620-631, 2009), in that we found facilitation (shorter saccade latencies) instead of inhibition when the saccade target matched the region in spatial working memory. In Experiment 2, we sought to determine whether the spatial working memory effect would generalize to other endogenous cuing tasks, such as a central arrow that pointed to one of six possible peripheral locations. As in Experiment 1, we found that saccade programming was facilitated when the cued location coincided with the saccade target. In Experiment 3, we explored how spatial memory interacts with other types of cues, such as a peripheral color singleton target or irrelevant onset. In both cases, the eyes were more likely to go to either singleton when it coincided with the location held in spatial working memory. On the basis of these results, we conclude that spatial working memory and saccade programming are likely to share common overlapping circuitry.

  4. What’s working in working memory training? An educational perspective

    PubMed Central

    Redick, Thomas S.; Shipstead, Zach; Wiemers, Elizabeth A.; Melby-Lervåg, Monica; Hulme, Charles

    2015-01-01

    Working memory training programs have generated great interest, with claims that the training interventions can have profound beneficial effects on children’s academic and intellectual attainment. We describe the criteria by which to evaluate evidence for or against the benefit of working memory training. Despite the promising results of initial research studies, the current review of all of the available evidence of working memory training efficacy is less optimistic. Our conclusion is that working memory training produces limited benefits in terms of specific gains on short-term and working memory tasks that are very similar to the training programs, but no advantage for academic and achievement-based reading and arithmetic outcomes. PMID:26640352

  5. Sparse distributed memory: understanding the speed and robustness of expert memory

    PubMed Central

    Brogliato, Marcelo S.; Chada, Daniel M.; Linhares, Alexandre

    2014-01-01

    How can experts, sometimes in exacting detail, almost immediately and very precisely recall memory items from a vast repertoire? The problem in which we will be interested concerns models of theoretical neuroscience that could explain the speed and robustness of an expert's recollection. The approach is based on Sparse Distributed Memory, which has been shown to be plausible, both in a neuroscientific and in a psychological manner, in a number of ways. A crucial characteristic concerns the limits of human recollection, the “tip-of-tongue” memory event—which is found at a non-linearity in the model. We expand the theoretical framework, deriving an optimization formula to solve this non-linearity. Numerical results demonstrate how the higher frequency of rehearsal, through work or study, immediately increases the robustness and speed associated with expert memory. PMID:24808842

  6. Regular Latin Dancing and Health Education may Improve Cognition of Late Middle-Aged and Older Latinos

    PubMed Central

    Marquez, David X.; Wilson, Robert; Aguiñaga, Susan; Vásquez, Priscilla; Fogg, Louis; Yang, Zhi; Wilbur, JoEllen; Hughes, Susan; Spanbauer, Charles

    2017-01-01

    Disparities exist between Latinos and non-Latino whites in cognitive function. Dance is culturally appropriate and challenges individuals physically and cognitively, yet the impact of regular dancing on cognitive function in older Latinos has not been examined. A two-group pilot trial was employed among inactive, older Latinos. Participants (N = 57) participated in the BAILAMOS© dance program or a health education program. Cognitive test scores were converted to z-scores and measures of global cognition and specific domains (executive function, episodic memory, working memory) were derived. Results revealed a group × time interaction for episodic memory (p<0.05), such that the dance group showed greater improvement in episodic memory than the health education group. A main effect for time for global cognition (p<0.05) was also demonstrated, with participants in both groups improving. Structured Latin dance programs can positively influence episodic memory; and participation in structured programs may improve overall cognition among older Latinos. PMID:28095105

  7. Regular Latin Dancing and Health Education May Improve Cognition of Late Middle-Aged and Older Latinos.

    PubMed

    Marquez, David X; Wilson, Robert; Aguiñaga, Susan; Vásquez, Priscilla; Fogg, Louis; Yang, Zhi; Wilbur, JoEllen; Hughes, Susan; Spanbauer, Charles

    2017-07-01

    Disparities exist between Latinos and non-Latino Whites in cognitive function. Dance is culturally appropriate and challenges individuals physically and cognitively, yet the impact of regular dancing on cognitive function in older Latinos has not been examined. A two-group pilot trial was employed among inactive, older Latinos. Participants (N = 57) participated in the BAILAMOS © dance program or a health education program. Cognitive test scores were converted to z-scores and measures of global cognition and specific domains (executive function, episodic memory, working memory) were derived. Results revealed a group × time interaction for episodic memory (p < .05), such that the dance group showed greater improvement in episodic memory than the health education group. A main effect for time for global cognition (p < .05) was also demonstrated, with participants in both groups improving. Structured Latin dance programs can positively influence episodic memory, and participation in structured programs may improve overall cognition among older Latinos.

  8. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processing research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.
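
    The link between the spring model and Thurstone's law can be made concrete: a recalled location in a quadratic potential U(x) = kx²/2 at "cognitive temperature" T has Boltzmann distribution p(x) ∝ exp(−U/T), i.e. a normal distribution with variance T/k, whose entropy grows with T. The parameter values below are illustrative, not fitted to the study's data.

        import numpy as np

        def recall_samples(k, T, n, rng):
            # Boltzmann distribution of a Hooke spring is Gaussian, var = T/k
            return rng.normal(0.0, np.sqrt(T / k), size=n)

        def gaussian_entropy(samples):
            # differential entropy of a Gaussian, estimated from the variance
            return 0.5 * np.log(2 * np.pi * np.e * samples.var())

        rng = np.random.default_rng(1)
        younger = recall_samples(k=1.0, T=1.0, n=10_000, rng=rng)
        older = recall_samples(k=1.0, T=2.5, n=10_000, rng=rng)
        print(gaussian_entropy(younger), gaussian_entropy(older))  # entropy rises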

  9. Large-scale hydropower system optimization using dynamic programming and object-oriented programming: the case of the Northeast China Power Grid.

    PubMed

    Li, Ji-Qing; Zhang, Yu-Shan; Ji, Chang-Ming; Wang, Ai-Jing; Lund, Jay R

    2013-01-01

    This paper examines long-term optimal operation using dynamic programming for a large hydropower system of 10 reservoirs in Northeast China. Besides considering flow and hydraulic head, the optimization explicitly includes time-varying electricity market prices to maximize benefit. Two techniques are used to reduce the 'curse of dimensionality' of dynamic programming with many reservoirs. Discrete differential dynamic programming (DDDP) reduces the search space and computer memory needed. Object-oriented programming (OOP) and the ability to dynamically allocate and release memory in the C++ language greatly reduce the cumulative memory demands of solving multi-dimensional dynamic programming models. The case study shows that the model can reduce the 'curse of dimensionality' and achieve satisfactory results.
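
    The core recursion is ordinary backward induction over a discretized storage grid; DDDP then restricts each iteration's search to a corridor around the previous trajectory. The single-reservoir toy below illustrates only the backward-induction step; the grid, inflows, prices, and benefit function are all invented for the example.

        import numpy as np

        storages = np.linspace(0, 100, 21)      # discrete storage grid
        inflow = [30, 50, 40, 20]               # one inflow per period
        price = [1.0, 0.8, 1.2, 1.5]            # electricity price per period

        def benefit(release, p):
            return p * np.sqrt(release)         # assumed concave power benefit

        V = np.zeros(len(storages))             # terminal value function
        for t in reversed(range(len(inflow))):  # backward induction over time
            V_new = np.full(len(storages), -np.inf)
            for i, s in enumerate(storages):
                for j, s_next in enumerate(storages):
                    release = s + inflow[t] - s_next
                    if release >= 0:            # feasibility: no negative release
                        V_new[i] = max(V_new[i],
                                       benefit(release, price[t]) + V[j])
            V = V_new
        print("optimal benefit starting from a full reservoir:", V[-1])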

  10. Forensic Analysis of Window’s(Registered) Virtual Memory Incorporating the System’s Page-File

    DTIC Science & Technology

    2008-12-01

    …data in a meaningful way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed arbitrarily across…

  11. Distributed practice can boost evaluative conditioning by increasing memory for the stimulus pairs.

    PubMed

    Richter, Jasmin; Gast, Anne

    2017-09-01

    When a neutral stimulus (CS) is presented in close temporal and spatial proximity to a positive or negative stimulus (US), the former is often observed to adopt the valence of the latter, a phenomenon named evaluative conditioning (EC). It is already well established that under most conditions, contingency awareness is important for an EC effect to occur. In addition, some findings suggest that awareness of the stimulus pairs is not only relevant during the learning phase, but that it also matters whether memory for the pairings is still available during the measurement phase. As previous research has shown that memory is better after temporally distributed than after contiguous (massed) repetitions, it seems plausible that EC effects are also moderated by distributed practice manipulations. This was tested in the current studies. In two experiments with successful distributed practice manipulations on memory, we show that the magnitude of the EC effect was also larger for pairs learned under spaced compared to massed conditions. Both effects, on memory and on EC, are found after a within-participant and after a between-participant manipulation. However, we did not find significant differences in the EC effect for different conditions of spaced practice. These findings are in line with the assumption that EC is based on similar processes as memory for the pairings. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Autobiographical Memory in Semantic Dementia: A Longitudinal fMRI Study

    ERIC Educational Resources Information Center

    Maguire, Eleanor A.; Kumaran, Dharshan; Hassabis, Demis; Kopelman, Michael D.

    2010-01-01

    Whilst patients with semantic dementia (SD) are known to suffer from semantic memory and language impairments, there is less agreement about whether memory for personal everyday experiences, autobiographical memory, is compromised. In healthy individuals, functional MRI (fMRI) has helped to delineate a consistent and distributed brain network…

  13. A model of attention-guided visual perception and recognition.

    PubMed

    Rybak, I A; Gusakova, V I; Golovan, A V; Podladchikova, L N; Shevtsova, N A

    1998-08-01

    A model of visual perception and recognition is described. The model contains: (i) a low-level subsystem which performs both a fovea-like transformation and detection of primary features (edges), and (ii) a high-level subsystem which includes separated 'what' (sensory memory) and 'where' (motor memory) structures. Image recognition occurs during the execution of a 'behavioral recognition program' formed during the primary viewing of the image. The recognition program contains both programmed attention window movements (stored in the motor memory) and predicted image fragments (stored in the sensory memory) for each consecutive fixation. The model shows the ability to recognize complex images (e.g. faces) invariantly with respect to shift, rotation and scale.

  14. Working Memory Training Does Not Improve Performance on Measures of Intelligence or Other Measures of "Far Transfer": Evidence From a Meta-Analytic Review.

    PubMed

    Melby-Lervåg, Monica; Redick, Thomas S; Hulme, Charles

    2016-07-01

    It has been claimed that working memory training programs produce diverse beneficial effects. This article presents a meta-analysis of working memory training studies (with a pretest-posttest design and a control group) that have examined transfer to other measures (nonverbal ability, verbal ability, word decoding, reading comprehension, or arithmetic; 87 publications with 145 experimental comparisons). Immediately following training there were reliable improvements on measures of intermediate transfer (verbal and visuospatial working memory). For measures of far transfer (nonverbal ability, verbal ability, word decoding, reading comprehension, arithmetic) there was no convincing evidence of any reliable improvements when working memory training was compared with a treated control condition. Furthermore, mediation analyses indicated that across studies, the degree of improvement on working memory measures was not related to the magnitude of far-transfer effects found. Finally, analysis of publication bias shows that there is no evidential value from the studies of working memory training using treated controls. The authors conclude that working memory training programs appear to produce short-term, specific training effects that do not generalize to measures of "real-world" cognitive skills. These results seriously question the practical and theoretical importance of current computerized working memory programs as methods of training working memory skills. © The Author(s) 2016.

  15. Efficient High Performance Collective Communication for Distributed Memory Environments

    ERIC Educational Resources Information Center

    Ali, Qasim

    2009-01-01

    Collective communication allows efficient communication and synchronization among a collection of processes, unlike point-to-point communication that only involves a pair of communicating processes. Achieving high performance for both kernels and full-scale applications running on a distributed memory system requires an efficient implementation of…

  16. Reusable and Extensible High Level Data Distributions

    NASA Technical Reports Server (NTRS)

    Diaconescu, Roxana E.; Chamberlain, Bradford; James, Mark L.; Zima, Hans P.

    2005-01-01

    This paper presents a reusable design of a data distribution framework for data parallel high performance applications. We are implementing the design in the context of the Chapel high productivity programming language. Distributions in Chapel are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy can be chosen by a user. At the same time, high productivity concerns require that the user is shielded from error-prone, tedious details such as communication and synchronization. We propose an approach to distributions that enables the user to refine a language-provided distribution type and adjust it to optimize the performance of the application. Additionally, we conceal from the user low-level communication and synchronization details to increase productivity. To emphasize the generality of our distribution machinery, we present its abstract design in the form of a design pattern, which is independent of a concrete implementation. To illustrate the applicability of our distribution framework design, we outline the implementation of data distributions in terms of the Chapel language.
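
    The idea of a language-provided distribution type that a user can refine is easy to see in miniature. The Python classes below illustrate only the design pattern (Chapel's real distributions also hide the communication and synchronization): the base class supplies a default block mapping from global indices to locales, and a subclass overrides just that mapping.

        class BlockDist:
            """Language-provided default: contiguous blocks of indices."""
            def __init__(self, n, num_locales):
                self.n, self.p = n, num_locales
            def locale_of(self, i):
                return i * self.p // self.n

        class CyclicDist(BlockDist):
            """User refinement: round-robin assignment, same interface."""
            def locale_of(self, i):
                return i % self.p

        d = BlockDist(100, 4)
        c = CyclicDist(100, 4)
        print([d.locale_of(i) for i in (0, 25, 50, 99)])   # [0, 1, 2, 3]
        print([c.locale_of(i) for i in (0, 1, 2, 3, 4)])   # [0, 1, 2, 3, 0]

    Because both classes expose the same locale_of interface, code that indexes the distributed data never needs to know which placement policy is in force, which is the separation of concerns the paper argues for.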

  17. Effects of a School-Based Instrumental Music Program on Verbal and Visual Memory in Primary School Children: A Longitudinal Study

    PubMed Central

    Roden, Ingo; Kreutz, Gunter; Bongard, Stephan

    2012-01-01

    This study examined the effects of a school-based instrumental training program on the development of verbal and visual memory skills in primary school children. Participants either took part in a music program with weekly 45 min sessions of instrumental lessons in small groups at school, or they received extended natural science training. A third group of children did not receive additional training. Each child completed verbal and visual memory tests three times over a period of 18 months. Significant Group by Time interactions were found in the measures of verbal memory. Children in the music group showed greater improvements than children in the control groups after controlling for children’s socio-economic background, age, and IQ. No differences between groups were found in the visual memory tests. These findings are consistent with and extend previous research by suggesting that children receiving music training may benefit from improvements in their verbal memory skills. PMID:23267341

  18. SciSpark's SRDD: A Scientific Resilient Distributed Dataset for Multidimensional Data

    NASA Astrophysics Data System (ADS)

    Palamuttam, R. S.; Wilson, B. D.; Mogrovejo, R. M.; Whitehall, K. D.; Mattmann, C. A.; McGibbney, L. J.; Ramirez, P.

    2015-12-01

    Remote sensing data and climate model output are multi-dimensional arrays of massive sizes locked away in heterogeneous file formats (HDF5/4, NetCDF 3/4) and metadata models (HDF-EOS, CF), making it difficult to perform multi-stage, iterative science processing since each stage requires writing and reading data to and from disk. We have developed SciSpark, a robust Big Data framework that extends Apache Spark for scaling scientific computations. Apache Spark improves on the map-reduce implementation in Apache Hadoop for parallel computing on a cluster by emphasizing in-memory computation, "spilling" to disk only as needed, and relying on lazy evaluation. Central to Spark is the Resilient Distributed Dataset (RDD), an in-memory distributed data structure that extends the functional paradigm provided by the Scala programming language. However, RDDs are ideal for tabular or unstructured data, not for highly dimensional data. The SciSpark project introduces the Scientific Resilient Distributed Dataset (sRDD), a distributed-computing array structure which supports iterative scientific algorithms for multidimensional data. SciSpark processes data stored in NetCDF and HDF files by partitioning them across time or space and distributing the partitions among a cluster of compute nodes. We show the usability and extensibility of SciSpark by implementing distributed algorithms for geospatial operations on large collections of multi-dimensional grids. In particular, we address the problem of scaling an automated method for finding Mesoscale Convective Complexes. SciSpark provides a tensor interface to support the pluggability of different matrix libraries, and we evaluate the performance of various matrix libraries, such as Nd4j and Breeze, in distributed pipelines. We detail the architecture and design of SciSpark, our efforts to integrate climate science algorithms, and the parallel ingest and partitioning (sharding) of A-Train satellite observations from model grids. These solutions are encompassed in SciSpark, an open-source software framework for distributed computing on scientific data.
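
    The partitioning idea at the heart of the sRDD can be sketched in a few lines. The following single-node Python/NumPy fragment is a schematic stand-in for what SciSpark does across a cluster; the array shape and the per-partition operation are invented for the example.

      import numpy as np

      # a (time, y, x) grid, standing in for a NetCDF/HDF variable
      data = np.random.rand(120, 90, 180)

      # split along the time axis; a framework like Spark would distribute
      # these partitions among compute nodes instead of keeping them local
      chunks = np.array_split(data, 12, axis=0)

      # each partition is processed independently, e.g. a per-cell mean
      means = [c.mean(axis=0) for c in chunks]
      print(len(chunks), chunks[0].shape, means[0].shape)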

  19. Effect of virtual memory on efficient solution of two model problems

    NASA Technical Reports Server (NTRS)

    Lambiotte, J. J., Jr.

    1977-01-01

    Computers with virtual memory architecture allow programs to be written as if they were small enough to be contained in memory. Two types of problems are investigated to show that this luxury can lead to quite inefficient performance if the programmer does not interact strongly with the characteristics of the operating system when developing the program. The two problems considered are the simultaneous solution of a large linear system of equations by Gaussian elimination and a model three-dimensional finite-difference problem. Runs are made on the Control Data STAR-100 computer to demonstrate the inefficiencies of programming the problems in the manner one would naturally use if the problems were indeed small enough to be contained in memory. Program redesigns are presented which achieve large improvements in performance through changes in the computational procedure and the data base arrangement.
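
    The penalty described here is easy to reproduce. In the sketch below (sizes arbitrary; the measured gap depends on how much of the file the OS can cache), the same sum is computed over a memory-mapped matrix both in storage order and against it, so one loop touches pages sequentially while the other strides across them.

      import tempfile
      import time

      import numpy as np

      # a file-backed matrix, standing in for data paged by virtual memory
      fd, path = tempfile.mkstemp()
      a = np.memmap(path, dtype=np.float64, mode="w+", shape=(4000, 4000))

      t0 = time.time()
      by_rows = sum(a[i, :].sum() for i in range(4000))   # sequential pages
      t1 = time.time()
      by_cols = sum(a[:, j].sum() for j in range(4000))   # strided accesses
      t2 = time.time()
      print(f"row order: {t1 - t0:.2f}s, column order: {t2 - t1:.2f}s")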

  20. Multiprocessor architecture: Synthesis and evaluation

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  1. Memory protection

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    Accidental overwriting of files or of memory regions belonging to other programs, browsing of personal files by superusers, Trojan horses, and viruses are examples of breakdowns in workstations and personal computers that would be significantly reduced by memory protection. Memory protection is the capability of an operating system and supporting hardware to delimit segments of memory, to control whether segments can be read from or written into, and to confine accesses of a program to its segments alone. The absence of memory protection in many operating systems today is the result of a bias toward a narrow definition of performance as maximum instruction-execution rate. A broader definition, including the time to get the job done, makes clear that cost of recovery from memory interference errors reduces expected performance. The mechanisms of memory protection are well understood, powerful, efficient, and elegant. They add to performance in the broad sense without reducing instruction execution rate.
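
    Operating systems expose this capability directly to user programs. A minimal sketch using Python's standard mmap module (POSIX-style mapping; the file contents are invented): the segment is mapped read-only, so the attempted write is rejected by the protection mechanism instead of silently corrupting memory.

      import mmap
      import os
      import tempfile

      # create a small file to stand in for a delimited memory segment
      fd, path = tempfile.mkstemp()
      os.write(fd, b"segment contents that must not be overwritten")
      os.close(fd)

      with open(path, "rb") as f:
          seg = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
          print(seg[:16])        # reads from the segment are permitted
          try:
              seg[0] = 0x58      # a write violates the segment's protection
          except TypeError as err:
              print("write rejected:", err)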

  2. OCEAN-PC and a distributed network for ocean data

    NASA Technical Reports Server (NTRS)

    Mclain, Douglas R.

    1992-01-01

    The Intergovernmental Oceanographic Commission (IOC) wishes to develop an integrated software package for oceanographic data entry and access in developing countries. The software, called 'OCEAN-PC', would run on low cost PC microcomputers and would encourage and standardize: (1) entry of local ocean observations; (2) quality control of the local data; (3) merging local data with historical data; (4) improved display and analysis of the merged data; and (5) international data exchange. OCEAN-PC will link existing MS-DOS oceanographic programs and data sets with table-driven format conversions. Since many ocean data sets are now being distributed on optical discs (Compact Discs - Read Only Memory, CD-ROM, Mass et al. 1987), OCEAN-PC will emphasize access to CD-ROMs.

  3. Efficient packing of patterns in sparse distributed memory by selective weighting of input bits

    NASA Technical Reports Server (NTRS)

    Kanerva, Pentti

    1991-01-01

    When a set of patterns is stored in a distributed memory, any given storage location participates in the storage of many patterns. From the perspective of any one stored pattern, the other patterns act as noise, and such noise limits the memory's storage capacity. The more similar the retrieval cues for two patterns are, the more the patterns interfere with each other in memory, and the harder it is to separate them on retrieval. A method is described of weighting the retrieval cues to reduce such interference and thus to improve the separability of patterns that have similar cues.
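
    One way to realize the weighting idea is sketched below; this is a simplified illustration, not Kanerva's exact scheme, and the sizes and weight values are arbitrary. Bits that discriminate between two similar cues are given larger weights, so the weighted distance between the cues grows and the stored patterns become easier to separate on retrieval.

      import numpy as np

      rng = np.random.default_rng(2)
      N = 64

      # two retrieval cues that are similar: only the first 6 bits differ
      cue_a = rng.integers(0, 2, size=N)
      cue_b = cue_a.copy()
      cue_b[:6] ^= 1

      # weight the informative (discriminating) bits more heavily
      weights = np.ones(N)
      weights[cue_a != cue_b] = 5.0

      def weighted_distance(x, y):
          return float(np.sum(weights * (x != y)))

      print("unweighted:", int(np.sum(cue_a != cue_b)),
            "weighted:", weighted_distance(cue_a, cue_b))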

  4. Avalanches and generalized memory associativity in a network model for conscious and unconscious mental functioning

    NASA Astrophysics Data System (ADS)

    Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft

    2018-01-01

    We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and that consciousness is related with symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated, during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be more well suited to model the complex networks which constitute brain and mental structure.

  5. Automatic Parallelization of Numerical Python Applications using the Global Arrays Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.; Lewis, Robert R.

    2011-11-30

    Global Arrays is a software system from Pacific Northwest National Laboratory that enables an efficient, portable, and parallel shared-memory programming interface to manipulate distributed dense arrays. The NumPy module is the de facto standard for numerical calculation in the Python programming language, a language whose use is growing rapidly in the scientific and engineering communities. NumPy provides a powerful N-dimensional array class as well as other scientific computing capabilities. However, like the majority of the core Python modules, NumPy is inherently serial. Using a combination of Global Arrays and NumPy, we have reimplemented NumPy as a distributed drop-in replacement called Global Arrays in NumPy (GAiN). Serial NumPy applications can become parallel, scalable GAiN applications with only minor source code changes. Scalability studies of several different GAiN applications will be presented showing the utility of developing serial NumPy codes which can later run on more capable clusters or supercomputers.
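
    Conceptually, a drop-in layer such as GAiN must turn a single NumPy call into a distribute/compute/combine cycle. The mpi4py sketch below shows that cycle for a global sum; it is not the GAiN API, assumes the array length divides evenly among the processes, and would be launched with, e.g., mpiexec -n 4.

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n = 1_000_000                 # assumed divisible by the process count
      full = np.arange(n, dtype="d") if rank == 0 else None
      chunk = np.empty(n // size, dtype="d")
      comm.Scatter(full, chunk, root=0)     # distribute blocks of the array

      local = chunk.sum()                   # each process works on its block
      total = comm.allreduce(local, op=MPI.SUM)   # behaves like a global sum
      print(rank, total)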

  6. Biology-Inspired Explorers for Space Systems

    NASA Astrophysics Data System (ADS)

    Ramohalli, Kumar; Lozano, Peter; Furfaro, Roberto

    2002-01-01

    Building upon three innovative technologies, each of which received an NTR award from NASA, a specific explorer is described. This "robot" does away with conventional gears, levers, and pulleys, and uses "muscle materials" instead; these shape-memory materials, formerly in the nickel-titanium family but now in the much wider class of ElectroActive Polymers (EAP), have the ability to respond precisely to pre-"programmed" shape changes upon application of an electrical input. Of course, the pre-"programs" are at the molecular level, much as in biological systems. Another important feature is distributed power: the power used in the "limbs" is distributed, so that if one "limb" should fail, the others can still function. The robot has been built and demonstrated to the media (newspapers and television). The fundamental control aspects are currently being worked on, and we expect to have a more complete mathematical description of its operation. Future plans and specific applications for reliable planetary exploration will be outlined.

  7. An alternative design for a sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1989-01-01

    A new design for a Sparse Distributed Memory, called the selected-coordinate design, is described. As in the original design, there are a large number of memory locations, each of which may be activated by many different addresses (binary vectors) in a very large address space. Each memory location is defined by specifying ten selected coordinates (bit positions in the address vectors) and a set of corresponding assigned values, consisting of one bit for each selected coordinate. A memory location is activated by an address if, for all ten of the location's selected coordinates, the corresponding bits in the address vector match the respective assigned value bits, regardless of the other bits in the address vector. Some comparative memory capacity and signal-to-noise ratio estimates for both the new and original designs are given. A few possible hardware embodiments of the new design are described.
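
    The activation rule is simple enough to state as code. Below is a minimal sketch of the selected-coordinate test; the ten selected coordinates per location follow the record, while the address width and location count are arbitrary choices.

      import numpy as np

      rng = np.random.default_rng(0)
      N, M, K = 256, 1000, 10    # address bits, locations, selected coords

      # each location: K selected coordinates and the bit values required there
      coords = np.array([rng.choice(N, size=K, replace=False) for _ in range(M)])
      values = rng.integers(0, 2, size=(M, K))

      def activated(address):
          # a location fires iff the address matches its assigned bits on all
          # K selected coordinates; the other N-K address bits are ignored
          return np.all(address[coords] == values, axis=1)

      addr = rng.integers(0, 2, size=N)
      print(int(activated(addr).sum()), "of", M, "locations activated")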

  8. Autobiographical memory distributions for negative self-images: memories are organised around negative as well as positive aspects of identity.

    PubMed

    Rathbone, Clare J; Steel, Craig

    2015-01-01

    The relationship between developmental experiences, and an individual's emerging beliefs about themselves and the world, is central to many forms of psychotherapy. People suffering from a variety of mental health problems have been shown to use negative memories when defining the self; however, little is known about how these negative memories might be organised and relate to negative self-images. In two online studies with middle-aged (N = 18; study 1) and young (N = 56; study 2) adults, we found that participants' negative self-images (e.g., I am a failure) were associated with sets of autobiographical memories that formed clustered distributions around times of self-formation, in much the same pattern as for positive self-images (e.g., I am talented). This novel result shows that highly organised sets of salient memories may be responsible for perpetuating negative beliefs about the self. Implications for therapy are discussed.

  9. Sparse distributed memory and related models

    NASA Technical Reports Server (NTRS)

    Kanerva, Pentti

    1992-01-01

    Described here is sparse distributed memory (SDM) as a neural-net associative memory. It is characterized by two weight matrices and by a large internal dimension - the number of hidden units is much larger than the number of input or output units. The first matrix, A, is fixed and possibly random, and the second matrix, C, is modifiable. The SDM is compared and contrasted to (1) computer memory, (2) correlation-matrix memory, (3) feed-forward artificial neural network, (4) cortex of the cerebellum, (5) Marr and Albus models of the cerebellum, and (6) Albus' cerebellar model arithmetic computer (CMAC). Several variations of the basic SDM design are discussed: the selected-coordinate and hyperplane designs of Jaeckel, the pseudorandom associative neural memory of Hassoun, and SDM with real-valued input variables by Prager and Fallside. SDM research conducted mainly at the Research Institute for Advanced Computer Science (RIACS) in 1986-1991 is highlighted.
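
    The two-matrix structure can be sketched directly. In the toy implementation below, A plays the fixed random role and C accumulates the stored content; the dimensions and the activation threshold are arbitrary choices, tuned so that a small fraction of the hidden locations fire.

      import numpy as np

      rng = np.random.default_rng(1)
      n, m, theta = 256, 4000, 78   # input bits, hidden locations, threshold

      A = rng.integers(0, 2, size=(m, n))   # fixed, possibly random matrix
      C = np.zeros((m, n))                  # modifiable content matrix

      def hidden(x):
          # activate locations whose fixed row responds strongly to x
          return (A @ x) >= theta

      def write(x, w):
          C[hidden(x)] += 2 * w - 1         # add a bipolar copy of the datum

      def read(x):
          return (C[hidden(x)].sum(axis=0) > 0).astype(int)

      x = rng.integers(0, 2, size=n)
      write(x, x)                           # autoassociative storage
      print("recall accuracy:", (read(x) == x).mean())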

  10. Persistently active neurons in human medial frontal and medial temporal lobe support working memory

    PubMed Central

    Kamiński, J; Sullivan, S; Chung, JM; Ross, IB; Mamelak, AN; Rutishauser, U

    2017-01-01

    Persistent neural activity is a putative mechanism for the maintenance of working memories. Persistent activity relies on the activity of a distributed network of areas, but the differential contribution of each area remains unclear. We recorded single neurons in the human medial frontal cortex and the medial temporal lobe while subjects held up to three items in memory. We found persistently active neurons in both areas. Persistent activity of hippocampal and amygdala neurons was stimulus-specific, formed stable attractors, and was predictive of memory content. Medial frontal cortex persistent activity, on the other hand, was modulated by memory load and task set but was not stimulus-specific. Trial-by-trial variability in persistent activity in both areas was related to memory strength, because it predicted the speed and accuracy by which stimuli were remembered. This work reveals, in humans, direct evidence for a distributed network of persistently active neurons supporting working memory maintenance. PMID:28218914

  11. Feature-Based Visual Short-Term Memory Is Widely Distributed and Hierarchically Organized.

    PubMed

    Dotson, Nicholas M; Hoffman, Steven J; Goodell, Baldwin; Gray, Charles M

    2018-06-15

    Feature-based visual short-term memory is known to engage both sensory and association cortices. However, the extent of the participating circuit and the neural mechanisms underlying memory maintenance is still a matter of vigorous debate. To address these questions, we recorded neuronal activity from 42 cortical areas in monkeys performing a feature-based visual short-term memory task and an interleaved fixation task. We find that task-dependent differences in firing rates are widely distributed throughout the cortex, while stimulus-specific changes in firing rates are more restricted and hierarchically organized. We also show that microsaccades during the memory delay encode the stimuli held in memory and that units modulated by microsaccades are more likely to exhibit stimulus specificity, suggesting that eye movements contribute to visual short-term memory processes. These results support a framework in which most cortical areas, within a modality, contribute to mnemonic representations at timescales that increase along the cortical hierarchy. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. RSM 1.0 - A RESUPPLY SCHEDULER USING INTEGER OPTIMIZATION

    NASA Technical Reports Server (NTRS)

    Viterna, L. A.

    1994-01-01

    RSM, Resupply Scheduling Modeler, is a fully menu-driven program that uses integer programming techniques to determine an optimum schedule for replacing components on or before the end of a fixed replacement period. Although written to analyze the electrical power system on the Space Station Freedom, RSM is quite general and can be used to model the resupply of almost any system subject to user-defined resource constraints. RSM is based on a specific form of the general linear programming problem in which all variables in the objective function and all variables in the constraints are integers. While more computationally intensive, integer programming was required for accuracy when modeling systems with small quantities of components. Input values for component life can be real numbers; RSM converts them to integers by dividing the lifetime by the period duration, then reducing the result to the next lowest integer. For each component, there is a set of constraints that ensure that it is replaced before its lifetime expires. RSM includes user-defined constraints such as transportation mass and volume limits, as well as component life, available repair crew time and assembly sequences. A weighting factor allows the program to minimize factors such as cost. The program then performs an iterative analysis, which is displayed during the processing. A message gives the first period in which resources are being exceeded on each iteration. If the scheduling problem is infeasible, the final message will also indicate the first period in which resources were exceeded. RSM is written in APL2 for IBM PC series computers and compatibles. A stand-alone executable version of RSM is provided; however, this is a "packed" version of RSM which can only utilize the memory within the 640K DOS limit. This executable requires at least 640K of memory and DOS 3.1 or higher. Source code for an APL2/PC workspace version is also provided. This version of RSM can make full use of any installed extended memory but must be run with the APL2 interpreter; it requires an 80486 based microcomputer or an 80386 based microcomputer with an 80387 math coprocessor, at least 2Mb of extended memory, and DOS 3.3 or higher. The standard distribution medium for this package is one 5.25 inch 360K MS-DOS format diskette. RSM was developed in 1991. APL2 and IBM PC are registered trademarks of International Business Machines Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
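
    The lifetime-to-period reduction and the per-period resource check described here are easy to mimic. The fragment below is a much-simplified sketch, not the APL2 integer program: the component data and mass budget are invented, and replacement is naively scheduled at each component's integer deadline.

      period_days = 90.0
      mass_limit_per_period = 60.0
      components = {"battery": (400.0, 50.0),   # (lifetime in days, mass)
                    "pump": (200.0, 20.0)}

      schedule = {}
      for name, (life_days, mass) in components.items():
          # divide lifetime by period duration, reduce to next lowest integer
          deadline = int(life_days // period_days)
          schedule.setdefault(deadline, []).append((name, mass))

      for period in sorted(schedule):
          total = sum(m for _, m in schedule[period])
          note = "" if total <= mass_limit_per_period else "  <- exceeded"
          print(period, schedule[period], note)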

  13. Learning to read aloud: A neural network approach using sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Joglekar, Umesh Dwarkanath

    1989-01-01

    An attempt is described to solve a problem of text-to-phoneme mapping which does not appear amenable to solution by standard algorithmic procedures. Experiments based on a model of distributed processing are also described. This model (sparse distributed memory (SDM)) can be used in an iterative supervised learning mode to solve the problem. Additional improvements aimed at obtaining better performance are suggested.

  14. Confessions of a robot lobotomist

    NASA Technical Reports Server (NTRS)

    Gottshall, R. Marc

    1994-01-01

    Since their inception, numerically controlled (NC) machining methods have been used throughout the aerospace industry to mill, drill, and turn complex shapes by sequentially stepping through motion programs. However, the recent demand for more precision, faster feeds, exotic sensors, and branching execution has existing computer numerical control (CNC) and distributed numerical control (DNC) systems running at maximum controller capacity. Typical disadvantages of current CNC's include fixed memory capacities, limited communication ports, and the use of multiple control languages. The need to tailor CNC's to meet specific applications, whether it be expanded memory, additional communications, or integrated vision, often requires replacing the original controller supplied with the commercial machine tool with a more powerful and capable system. This paper briefly describes the process and equipment requirements for new controllers and their evolutionary implementation in an aerospace environment. The process of controller retrofit with currently available machines is examined, along with several case studies and their computational and architectural implications.

  15. A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.

    PubMed

    Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang

    2018-01-01

    The strategy of Divide-and-Conquer (D&C) is one of the most frequently used programming patterns for designing efficient algorithms in computer science, and it has been parallelized on both shared memory and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers develop their own efficient GPU implementations with less effort.
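
    The divide-and-conquer structure of QuickHull is visible in a compact serial sketch; the GPU version in the paper maps this recursion onto data-parallel kernels, but the splitting logic is the same. The Python fragment below is for illustration, not the authors' implementation.

      def cross(o, a, b):
          # z-component of (a - o) x (b - o); > 0 means b lies left of o->a
          return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

      def hull_side(pts, a, b):
          # keep only the points strictly left of the chord a->b
          left = [p for p in pts if cross(a, b, p) > 0]
          if not left:
              return []
          far = max(left, key=lambda p: cross(a, b, p))  # farthest from chord
          # divide: recurse on the chords a->far and far->b, then combine
          return hull_side(left, a, far) + [far] + hull_side(left, far, b)

      def quickhull(points):
          pts = sorted(set(points))
          if len(pts) < 3:
              return pts
          a, b = pts[0], pts[-1]            # extreme points split the set
          return [a] + hull_side(pts, a, b) + [b] + hull_side(pts, b, a)

      print(quickhull([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2), (1, 3)]))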

  16. The Science of Computing: Virtual Memory

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1986-01-01

    In the March-April issue, I described how a computer's storage system is organized as a hierarchy consisting of cache, main memory, and secondary memory (e.g., disk). The cache and main memory form a subsystem that functions like main memory but attains speeds approaching cache. What happens if a program and its data are too large for the main memory? This is not a frivolous question. Every generation of computer users has been frustrated by insufficient memory. A new line of computers may have sufficient storage for the computations of its predecessor, but new programs will soon exhaust its capacity. In 1960, a long-range planning committee at MIT dared to dream of a computer with 1 million words of main memory. In 1985, the Cray-2 was delivered with 256 million words. Computational physicists dream of computers with 1 billion words. Computer architects have done an outstanding job of enlarging main memories, yet they have never kept up with demand. Only the shortsighted believe they can.

  17. Sequential associative memory with nonuniformity of the layer sizes.

    PubMed

    Teramae, Jun-Nosuke; Fukai, Tomoki

    2007-01-01

    Sequence retrieval has a fundamental importance in information processing by the brain, and has been studied extensively in neural network models. Most previous sequential associative memory models embed sequences of memory patterns of nearly equal sizes. It was recently shown that local cortical networks display many diverse yet repeatable precise temporal sequences of neuronal activities, termed "neuronal avalanches." Interestingly, these avalanches displayed size and lifetime distributions that obey power laws. Inspired by these experimental findings, here we consider an associative memory model of binary neurons that stores sequences of memory patterns with highly variable sizes. Our analysis includes the case where the statistics of these size variations obey the above-mentioned power laws. We study the retrieval dynamics of such memory systems by analytically deriving the equations that govern the time evolution of macroscopic order parameters. We calculate the critical sequence length beyond which the network cannot retrieve memory sequences correctly. As an application of the analysis, we show how the present variability in sequential memory patterns degrades the power-law lifetime distribution of retrieved neural activities.

  18. Nicotine Inhibits Memory CTL Programming

    PubMed Central

    Sun, Zhifeng; Smyth, Kendra; Garcia, Karla; Mattson, Elliot; Li, Lei; Xiao, Zhengguo

    2013-01-01

    Nicotine is the main tobacco component responsible for tobacco addiction and is used extensively in smoking and smoking cessation therapies. However, little is known about its effects on the immune system. We confirmed that multiple nicotinic receptors are expressed on mouse and human cytotoxic T lymphocytes (CTLs) and demonstrated that nicotinic receptors on mouse CTLs are regulated during activation. Acute nicotine presence during activation increases primary CTL expansion in vitro, but impairs in vivo expansion after transfer and subsequent memory CTL differentiation, which reduces protection against subsequent pathogen challenges. Furthermore, nicotine abolishes the regulatory effect of rapamycin on memory CTL programming, which can be attributed to the fact that rapamycin enhances expression of nicotinic receptors. Interestingly, naïve CTLs from chronic nicotine-treated mice have normal memory programming, which is impaired by nicotine during activation in vitro. In conclusion, simultaneous exposure to nicotine and antigen during CTL activation negatively affects memory development. PMID:23844169

  19. ‘tripleint_cc’: A program for 2-centre variational leptonic Coulomb potential matrix elements using Hylleraas-type trial functions, with a performance optimization study

    NASA Astrophysics Data System (ADS)

    Plummer, M.; Armour, E. A. G.; Todd, A. C.; Franklin, C. P.; Cooper, J. N.

    2009-12-01

    We present a program used to calculate intricate three-particle integrals for variational calculations of solutions to the leptonic Schrödinger equation with two nuclear centres in which inter-leptonic distances (electron-electron and positron-electron) are included directly in the trial functions. The program has been used so far in calculations of He-H¯ interactions and positron-H2 scattering; however, the precisely defined integrals are applicable to other situations. We include a summary discussion of how the program has been optimized from a 'legacy'-type code to a more modern high-performance code with a performance improvement factor of up to 1000.
    Program summary
    Program title: tripleint.cc
    Catalogue identifier: AEEV_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEV_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 12 829
    No. of bytes in distributed program, including test data, etc.: 91 798
    Distribution format: tar.gz
    Programming language: Fortran 95 (fixed format)
    Computer: Modern PC (tested on AMD processor) [1], IBM Power5 [2], Cray XT4 [3], similar
    Operating system: Red Hat Linux [1], IBM AIX [2], UNICOS [3]
    Has the code been vectorized or parallelized?: Serial (multi-core shared memory may be needed for some large jobs)
    RAM: Dependent on parameter sizes and option to use intermediate I/O. Estimates for practical use: 0.5-2 GBytes (with intermediate I/O); 1-4 GBytes (all-memory: the preferred option).
    Classification: 2.4, 2.6, 2.7, 2.9, 16.5, 16.10, 20
    Nature of problem: The 'tripleint.cc' code evaluates three-particle integrals needed in certain variational (in particular: Rayleigh-Ritz and generalized-Kohn) matrix elements for solution of the Schrödinger equation with two fixed centres (the solutions may then be used in subsequent dynamic nuclear calculations). Specifically, the integrals are defined by Eq. (16) in the main text and contain terms proportional to r_ij × r_ik / r_jk, i≠j, i≠k, j≠k, with r_ij the distance between leptons i and j. The article also briefly describes the performance optimizations used to increase the speed of evaluation of the integrals enough to allow detailed testing and mapping of the effect of varying non-linear parameters in the variational trial functions.
    Solution method: Each integral is solved using prolate spheroidal coordinates and series expansions (with cut-offs) of the many-lepton expressions. 1-d integrals and sub-integrals are solved analytically by various means (the program automatically chooses the most accurate of the available methods for each set of parameters and function arguments), while two of the three integrations over the prolate spheroidal coordinates λ are carried out numerically. Many similar integrals with identical non-linear variational parameters may be calculated with one call of the code.
    Restrictions: There are limits to the number of points for the numerical integrations, to the cut-off variable itaumax for the many-lepton series expansions, and to the maximum powers of Slater-like input functions. For runs near the limit of the cut-off variable and with certain small-magnitude values of variational non-linear parameters, the code can require large amounts of memory (an option using some intermediate I/O is included to offset this).
    Unusual features: In addition to the program, we also present a summary description of the techniques and ideology used to optimize the code, together with accuracy tests and indications of performance improvement.
    Running time: The test runs take 1-15 minutes on HPCx [2] as indicated in Section 5 of the main text. A practical run with 729 integrals, 40 quadrature points per dimension and itaumax = 8 took 150 minutes on a PC (e.g., [1]); a similar run with 'medium' accuracy, e.g. for parameter optimization (see Section 2 of the main text), with 30 points per dimension and itaumax = 6 took 35 minutes.
    References:
    [1] PC: Memory: 2.72 GB, CPU: AMD Opteron 246 dual-core, 2×2 GHz, OS: GNU/Linux, kernel: Linux 2.6.9-34.0.2.ELsmp.
    [2] HPCx, IBM eServer 575 running IBM AIX, http://www.hpcx.ac.uk/ (visited May 2009).
    [3] HECToR, CRAY XT4 running UNICOS/lc, http://www.hector.ac.uk/ (visited May 2009).

  20. Patterns of particle distribution in multiparticle systems by random walks with memory enhancement and decay

    NASA Astrophysics Data System (ADS)

    Tan, Zhi-Jie; Zou, Xian-Wu; Huang, Sheng-You; Zhang, Wei; Jin, Zhun-Zhi

    2002-07-01

    We investigate the pattern of particle distribution and its evolution with time in multiparticle systems using the model of random walks with memory enhancement and decay. This model describes some biological intelligent walks. With a decrease in the memory decay exponent α, the distribution of particles changes from a random dispersive pattern to a locally dense one, and then returns to the random one. Correspondingly, the fractal dimension D_f,p characterizing the distribution of particle positions increases from a low value to a maximum and then decreases to the low one again. This is determined by the degree of overlap of regions consisting of sites with remanent information. The second moment of the density, ρ(2), was introduced to investigate the inhomogeneity of the particle distribution. The dependence of ρ(2) on α is similar to that of D_f,p on α. ρ(2) increases with time as a power law in the process of adjusting the particle distribution, and then ρ(2) tends to a stable equilibrium value.

  1. Distributed state-space generation of discrete-state stochastic models

    NASA Technical Reports Server (NTRS)

    Ciardo, Gianfranco; Gluckman, Joshua; Nicol, David

    1995-01-01

    High-level formalisms such as stochastic Petri nets can be used to model complex systems. Analysis of logical and numerical properties of these models often requires the generation and storage of the entire underlying state space. This imposes practical limitations on the types of systems which can be modeled. Because of the vast amount of memory consumed, we investigate distributed algorithms for the generation of state space graphs. The distributed construction allows us to take advantage of the combined memory readily available on a network of workstations. The key technical problem is to find effective methods for on-the-fly partitioning, so that the state space is evenly distributed among processors. In this paper we report on the implementation of a distributed state-space generator that may be linked to a number of existing system modeling tools. We discuss partitioning strategies in the context of Petri net models, and report on performance observed on a network of workstations, as well as on a distributed memory multi-computer.
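
    The on-the-fly partitioning problem has a natural hash-based baseline. The serial sketch below simulates it: a hash function assigns each state an owning partition, each partition stores only its own visited states, and successors are forwarded to their owners. The toy successor function is invented for the example.

      from collections import deque

      P = 4                                  # number of simulated processors

      def owner(state):
          return hash(state) % P             # on-the-fly partitioning rule

      def successors(state):
          # toy transition model standing in for a Petri net's firing rule
          return [(state + 1) % 30, (state * 2) % 30]

      queues = [deque() for _ in range(P)]
      seen = [set() for _ in range(P)]       # each partition's share of states
      queues[owner(0)].append(0)

      while any(queues):
          for p in range(P):
              while queues[p]:
                  s = queues[p].popleft()
                  if s in seen[p]:
                      continue
                  seen[p].add(s)
                  for t in successors(s):
                      queues[owner(t)].append(t)   # forward to t's owner

      print([len(s) for s in seen], "total states:", sum(map(len, seen)))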

  2. RIP-REMOTE INTERACTIVE PARTICLE-TRACER

    NASA Technical Reports Server (NTRS)

    Rogers, S. E.

    1994-01-01

    Remote Interactive Particle-tracing (RIP) is a distributed-graphics program which computes particle traces for computational fluid dynamics (CFD) solution data sets. A particle trace is a line which shows the path a massless particle in a fluid will take; it is a visual image of where the fluid is going. The program is able to compute and display particle traces at a speed of about one trace per second because it runs on two machines concurrently. The data used by the program is contained in two files. The solution file contains data on density, momentum and energy quantities of a flow field at discrete points in three-dimensional space, while the grid file contains the physical coordinates of each of the discrete points. RIP requires two computers. A local graphics workstation interfaces with the user for program control and graphics manipulation, and a remote machine interfaces with the solution data set and performs time-intensive computations. The program utilizes two machines in a distributed mode for two reasons. First, the data to be used by the program is usually generated on the supercomputer. RIP avoids having to convert and transfer the data, eliminating any memory limitations of the local machine. Second, as computing the particle traces can be computationally expensive, RIP utilizes the power of the supercomputer for this task. Although the remote site code was developed on a CRAY, it is possible to port this to any supercomputer class machine with a UNIX-like operating system. Integration of a velocity field from a starting physical location produces the particle trace. The remote machine computes the particle traces using the particle-tracing subroutines from PLOT3D/AMES, a CFD post-processing graphics program available from COSMIC (ARC-12779). These routines use a second-order predictor-corrector method to integrate the velocity field. Then the remote program sends graphics tokens to the local machine via a remote-graphics library. The local machine interprets the graphics tokens and draws the particle traces. The program is menu driven. RIP is implemented on the Silicon Graphics IRIS 3000 (local workstation) with an IRIX operating system and on the CRAY-2 (remote station) with a UNICOS 1.0 or 2.0 operating system. The IRIS 4D can be used in place of the IRIS 3000. The program is written in C (67%) and FORTRAN 77 (43%) and has an IRIS memory requirement of 4 MB. The remote and local stations must use the same user ID. PLOT3D/AMES unformatted data sets are required for the remote machine. The program was developed in 1988.
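
    The integration step named here, a second-order predictor-corrector, is compact enough to sketch. The following Python fragment is a generic Heun-type integrator in the same spirit, not the PLOT3D/AMES routines; the velocity field is an invented example.

      import numpy as np

      def trace(velocity, x0, dt, n_steps):
          # second-order predictor-corrector integration of a velocity field
          path = [np.asarray(x0, dtype=float)]
          for _ in range(n_steps):
              x = path[-1]
              v0 = velocity(x)
              x_pred = x + dt * v0                   # predictor: explicit Euler
              v1 = velocity(x_pred)
              path.append(x + 0.5 * dt * (v0 + v1))  # corrector: trapezoidal
          return np.array(path)

      # usage: a simple rotational flow around the z axis
      field = lambda p: np.array([-p[1], p[0], 0.0])
      print(trace(field, [1.0, 0.0, 0.0], 0.01, 5))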

  3. New NAS Parallel Benchmarks Results

    NASA Technical Reports Server (NTRS)

    Yarrow, Maurice; Saphir, William; VanderWijngaart, Rob; Woo, Alex; Kutler, Paul (Technical Monitor)

    1997-01-01

    NPB2 (NAS (NASA Advanced Supercomputing) Parallel Benchmarks 2) is an implementation, based on Fortran and the MPI (message passing interface) message passing standard, of the original NAS Parallel Benchmark specifications. NPB2 programs are run with little or no tuning, in contrast to NPB vendor implementations, which are highly optimized for specific architectures. NPB2 results complement, rather than replace, NPB results. Because they have not been optimized by vendors, NPB2 implementations approximate the performance a typical user can expect for a portable parallel program on distributed memory parallel computers. Together these results provide an insightful comparison of the real-world performance of high-performance computers. New NPB2 features: New implementation (CG), new workstation class problem sizes, new serial sample versions, more performance statistics.

  4. ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Snyder, L.; Notkin, D.; Adams, L.

    1990-03-31

    This task relates to research on programming massively parallel computers. Previous work on the Ensemble concept of programming was extended, and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensemble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels: composition of machine instructions, composition of processes, and composition of phases. It was applied to shared memory models of computation. During the present research period, these concepts were extended to nonshared memory models; one Ph.D. thesis was completed, and one book chapter and six conference proceedings papers were published.

  5. Violence and sex impair memory for television ads.

    PubMed

    Bushman, Brad J; Bonacci, Angelica M

    2002-06-01

    Participants watched a violent, sexually explicit, or neutral TV program that contained 9 ads. Participants recalled the advertised brands. They also identified the advertised brands from slides of supermarket shelves. The next day, participants were telephoned and asked to recall again the advertised brands. Results showed better memory for people who saw the ads during a neutral program than for people who saw the ads during a violent or sexual program both immediately after exposure and 24 hr later. Violence and sex impaired memory for males and females of all ages, regardless of whether they liked programs containing violence and sex. These results suggest that sponsoring violent and sexually explicit TV programs might not be a profitable venture for advertisers.

  6. Degree of Vertical Integration Between the Undergraduate Program and Clinical Internship with Respect to Lumbopelvic Diagnostic and Therapeutic Procedures Taught at the Canadian Memorial Chiropractic College

    PubMed Central

    Vermet, Shannon; McGinnis, Karen; Boodham, Melissa; Gleberzon, Brian J.

    2010-01-01

    Purpose: The objective of this study was to determine to what extent the diagnostic and therapeutic procedures taught in the undergraduate program used for patients with lumbopelvic conditions are expected to be utilized by students during their clinical internship program at Canadian Memorial Chiropractic College or are being used by the clinical faculty. Methods: A confidential survey was distributed to clinical faculty at the college. It consisted of a list of diagnostic and therapeutic procedures used for lumbopelvic conditions taught at that college. Clinicians were asked to indicate the frequency with which they performed or they required students to perform each item. Results: Seventeen of 23 clinicians responded. The following procedures were most likely required to be performed by clinicians: posture; ranges of motion; lower limb sensory, motor, and reflex testing; and core orthopedic tests. The following were less likely to be required to be performed: Waddell testing, Schober's test, Gillet tests, and abdominal palpation. Students were expected to perform (or clinicians performed) most of the mobilization (in particular, iliocostal, iliotransverse, and iliofemoral) and spinal manipulative therapies (in particular, the procedures referred to as the lumbar roll, lumbar pull/hook, and upper sacroiliac) taught at the college. Conclusion: This study suggests that there was considerable, but not complete, vertical integration between the undergraduate and clinical education program at this college. PMID:20480014

  7. Episodic and Semantic Memories of a Residential Environmental Education Program

    ERIC Educational Resources Information Center

    Knapp, Doug; Benton, Gregory M.

    2006-01-01

    This study used a phenomenological approach to investigate the recollections of participants of an environmental education (EE) residential program. Ten students who participated in a residential EE program in the fall of 2001 were interviewed in the fall of 2002. Three major themes relating to the participants' long-term memory of the residential…

  8. 78 FR 37741 - Approval and Promulgation of Implementation Plans; California; South Coast; Contingency Measures...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-06-24

    .... Id. \\7\\ Consistent with EPA's definition of ``design value'' in 40 CFR 58.1, we use the term ``design... evaluations below. Carl Moyer Memorial Air Quality Standards Attainment Program The Contingency Measures SIP identifies a portion of the Carl Moyer Memorial Air Quality Standards Attainment Program (Carl Moyer Program...

  9. Vascular system modeling in parallel environment - distributed and shared memory approaches

    PubMed Central

    Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

    2011-01-01

    The paper presents two approaches to parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages, and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process, during which individual processing units perform calculations concerning different vascular trees. The experiments, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
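
    The per-tree decomposition of the second approach is straightforward to sketch with worker parallelism. In the hedged Python fragment below, the perfusion step is a placeholder computation, and worker processes stand in for the threads a true shared-memory implementation would use.

      from multiprocessing import Pool

      TREES = ["arterial", "venous"]

      def perfuse_tree(tree_id):
          # placeholder for the expensive per-tree flow/geometry recalculation
          checksum = sum(i * i for i in range(200_000))
          return tree_id, checksum

      if __name__ == "__main__":
          with Pool(processes=len(TREES)) as pool:
              for tree_id, result in pool.map(perfuse_tree, range(len(TREES))):
                  print(TREES[tree_id], "perfusion pass done:", result)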

  10. Effects of Cogmed working memory training on cognitive performance.

    PubMed

    Etherton, Joseph L; Oberle, Crystal D; Rhoton, Jayson; Ney, Ashley

    2018-04-16

    Research on the cognitive benefits of working memory training programs has produced inconsistent results. Such research has frequently used laboratory-specific training tasks, or dual-task n-back training. The current study used the commercial Cogmed Working Memory (WM) Training program, involving several different training tasks involving visual and auditory input. Healthy college undergraduates were assigned to either the full Cogmed training program of 25, 40-min training sessions; an abbreviated Cogmed program of 25, 20-min training sessions; or a no-contact control group. Pretest and posttest measures included multiple measures of attention, working memory, fluid intelligence, and executive functions. Although improvement was observed for the full training group for a digit span task, no training-related improvement was observed for any of the other measures. Results of the study suggest that WM training does not improve performance on unrelated tasks or enhance other cognitive abilities.

  11. Command and Control Software Development Memory Management

    NASA Technical Reports Server (NTRS)

    Joseph, Austin Pope

    2017-01-01

    This internship was initially meant to cover the implementation of unit test automation for a NASA ground control project. As is often the case with large development projects, the scope and breadth of the internship changed. Instead, the internship focused on finding and correcting memory leaks and errors as reported by a COTS software product meant to track such issues. Memory leaks come in many different flavors and some of them are more benign than others. On the extreme end a program might be dynamically allocating memory and not correctly deallocating it when it is no longer in use. This is called a direct memory leak and in the worst case can use all the available memory and crash the program. If the leaks are small they may simply slow the program down which, in a safety critical system (a system for which a failure or design error can cause a risk to human life), is still unacceptable. The ground control system is managed in smaller sub-teams, referred to as CSCIs. The CSCI that this internship focused on is responsible for monitoring the health and status of the system. This team's software had several methods/modules that were leaking significant amounts of memory. Since most of the code in this system is safety-critical, correcting memory leaks is a necessity.
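
    The unbounded-cache pattern is one of the milder flavors described, and in managed code it can be located by diffing allocation snapshots. The sketch below uses Python's standard tracemalloc module as an analogue of the commercial tracker mentioned; the handler and message names are invented.

      import tracemalloc

      _cache = []                      # grows forever: a managed-code "leak"

      def handle_status_message(msg):
          _cache.append(msg * 1000)    # retained and never pruned

      tracemalloc.start()
      before = tracemalloc.take_snapshot()
      for i in range(1000):
          handle_status_message(f"status-{i}")
      after = tracemalloc.take_snapshot()

      # the allocation site of the leak shows up at the top of the diff
      for stat in after.compare_to(before, "lineno")[:3]:
          print(stat)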

  12. aCLIMAX 4.0.1, The new version of the software for analyzing and interpreting INS spectra

    NASA Astrophysics Data System (ADS)

    Ramirez-Cuesta, A. J.

    2004-03-01

    In Inelastic Neutron Scattering (INS) Spectroscopy, the neutron scattering intensity is plotted versus neutron energy loss, giving a spectrum that looks like an infrared or a Raman spectrum. Unlike IR or Raman, INS does not have selection rules, i.e. all transitions are in principle observable. This particular characteristic makes INS a test bed for Density Functional Theory calculations of vibrational modes. aCLIMAX is the first user-friendly program, within the Windows environment, that uses the output of normal modes to generate the calculated INS spectrum of the model molecule, making it much easier to establish a connection between theory and experiment.
    Program summary
    Title of program: aCLIMAX 4.0.1
    Catalogue identifier: ADSW
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSW
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Operating systems: Windows 95 onwards, except Windows ME where it does not work
    Programming language used: Visual Basic
    Memory requirements: 64 MB
    No. of processors: 1
    Has the code been parallelized: No
    No. of bytes in distributed program, including test data, etc.: 2 432 775
    No. of lines in distributed program, including test data, etc.: 17 998
    Distribution format: tar gzip file
    Nature of physical problem: Calculation of the Inelastic Neutron Scattering spectra from DFT calculations of the vibrational density of states for molecules.
    Method of solution: INS spectral intensity calculated from normal modes analysis. Isolated molecule approximation.
    Typical time of running: From a few seconds to a few minutes depending on the size of the molecule.
    Unusual features of the program: Special care has to be taken on computers whose regional options differ from those of English-speaking countries: the decimal separator has to be set to "." (dot) instead of the "," (comma) that most countries use.

  13. Does rural generalist focused medical school and family medicine training make a difference? Memorial University of Newfoundland outcomes.

    PubMed

    Rourke, James; Asghari, Shabnam; Hurley, Oliver; Ravalia, Mohamed; Jong, Michael; Graham, Wendy; Parsons, Wanda; Duggan, Norah; O'Keefe, Danielle; Moffatt, Scott; Stringer, Katherine; Sturge Sparkes, Carolyn; Hippe, Janelle; Harris Walsh, Kristin; McKay, Donald; Samarasena, Asoka

    2018-03-01

    Rural recruitment and retention of physicians is a global issue. The Faculty of Medicine at Memorial University of Newfoundland, Canada, was established as a rural-focused medical school with a social accountability mandate that aimed to meet the healthcare needs of a sparse population distributed over a large landmass as well as the needs of other rural and remote areas of Canada. This study aimed to assess whether Memorial medical degree (MD) and postgraduate (PG) programs were effective at producing physicians for their province and rural physicians for Canada compared with other Canadian medical schools. This retrospective cohort study included medical school graduates who completed their PG training between 2004 and 2013 in Canada. Practice locations of study subjects were georeferenced and assigned to three geographic classes: Large Urban; Small City/Town; and Rural. Analyses were performed at two levels. (1) Provincial level analysis compared Memorial PG graduates practicing where they received their MD and/or PG training with other medical schools who are the only medical school in their province (n=4). (2) National-level analysis compared Memorial PG graduates practicing in rural Canada with all other Canadian medical schools (n=16). Descriptive and bivariate analyses were performed. Overall, 18 766 physicians practicing in Canada completed Canadian PG training (2004-2013), and of those, 8091 (43%) completed Family Medicine (FM) training. Of all physicians completing Canadian PG training, 1254 (7%) physicians were practicing rurally and of those, 1076 were family physicians. There were 379 Memorial PG graduates and of those, 208 (55%) completed FM training and 72 (19%) were practicing rurally, and of those practicing rurally, 56 were family physicians. At the national level, the percentage of all Memorial PG graduates (19.0%) and FM PG graduates (26.9%) practicing rurally was significantly better than the national average for PG (6.4%, p<0.000) and FM (12.9%, p<0.000). Among 391 physicians practicing in Newfoundland and Labrador (NL), 257 (65.7%) were Memorial PG graduates and 247 (63.2%) were Memorial MD graduates. Of the 163 FM graduates, 148 (90.8%) were Memorial FM graduates and 118 (72.4%) were Memorial MD graduates. Of the 68 in rural practice, 51 (75.0%) were Memorial PG graduates and 31 (45.6%) were Memorial MD graduates. Of the 41 FM graduates in rural practice, 39 (95.1%) were Memorial FM graduates and 22 (53.7%) were Memorial MD graduates. Two-sample proportion tests demonstrated Memorial University provided a larger proportion of its provincial physician resource supply than the other four single provincial medical schools, by medical school MD for FM (72.4% vs 44.3%, p<0.000) and for overall (63.2% vs 43.5% p<0.000), and by medical school PG for FM (90.8 % vs 72.0%, p<0.000). This study found Memorial University graduates were more likely to establish practice in rural areas compared with the national average for most program types as well as more likely to establish practice in NL compared with other single medical schools' graduates in their provinces. This study highlights the impact a comprehensive rural-focused social accountability approach can have at supplying the needs of a population both at the regional and rural national levels.

  14. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems.

    PubMed

    Shehzad, Danish; Bozkuş, Zeki

    2016-01-01

    Increase in the complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets amongst multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large section of the overall simulation time. In NEURON, the Message Passing Interface (MPI) is used for communication between processors, and the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors achieves concurrency and better performance, but it inversely affects MPI_Allgather, which increases the communication time between processors. This necessitates improving the communication methodology to decrease the spike exchange time over distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication; the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for simulation of large neuronal network models.
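
    The recursive-doubling pattern is compact on its own. The mpi4py sketch below shows the exchange schedule using ordinary two-sided sendrecv for brevity, whereas the paper's contribution moves this exchange to one-sided RMA; it assumes a power-of-two number of ranks and would be launched with, e.g., mpiexec -n 8.

      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      def recursive_doubling_allgather(my_spikes):
          # every rank ends up holding every rank's block in log2(size) steps
          blocks = {rank: my_spikes}
          step = 1
          while step < size:
              partner = rank ^ step            # pairwise exchange partner
              received = comm.sendrecv(blocks, dest=partner, source=partner)
              blocks.update(received)          # holdings double each step
              step *= 2
          return [blocks[r] for r in range(size)]

      local_spikes = [f"spike-{rank}-{i}" for i in range(2)]
      print(rank, recursive_doubling_allgather(local_spikes))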

  15. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

    PubMed Central

    Bozkuş, Zeki

    2016-01-01

    Increase in the complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets amongst multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large section of the overall simulation time. In NEURON, the Message Passing Interface (MPI) is used for communication between processors, and the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors achieves concurrency and better performance, but it inversely affects MPI_Allgather, which increases the communication time between processors. This necessitates improving the communication methodology to decrease the spike exchange time over distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication; the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for simulation of large neuronal network models. PMID:27413363

  16. Storage of multiple single-photon pulses emitted from a quantum dot in a solid-state quantum memory.

    PubMed

    Tang, Jian-Shun; Zhou, Zong-Quan; Wang, Yi-Tao; Li, Yu-Long; Liu, Xiao; Hua, Yi-Lin; Zou, Yang; Wang, Shuang; He, De-Yong; Chen, Geng; Sun, Yong-Nan; Yu, Ying; Li, Mi-Feng; Zha, Guo-Wei; Ni, Hai-Qiao; Niu, Zhi-Chuan; Li, Chuan-Feng; Guo, Guang-Can

    2015-10-15

    Quantum repeaters are critical components for distributing entanglement over long distances in the presence of unavoidable optical losses during transmission. Stimulated by the Duan-Lukin-Cirac-Zoller protocol, many improved quantum repeater protocols based on quantum memories have been proposed, which commonly focus on the entanglement-distribution rate. Among these protocols, the elimination of multiple photons (or multiple photon-pairs) and the use of multimode quantum memory are demonstrated to have the ability to greatly improve the entanglement-distribution rate. Here, we demonstrate the storage of deterministic single photons emitted from a quantum dot in a polarization-maintaining solid-state quantum memory; in addition, multi-temporal-mode memory with 1, 20 and 100 narrow single-photon pulses is also demonstrated. Multi-photons are eliminated, and only one photon at most is contained in each pulse. Moreover, the solid-state properties of both sub-systems make this configuration more stable and easier to scale. Our work will be helpful in the construction of efficient quantum repeaters based on all-solid-state devices.

  17. Storage of multiple single-photon pulses emitted from a quantum dot in a solid-state quantum memory

    PubMed Central

    Tang, Jian-Shun; Zhou, Zong-Quan; Wang, Yi-Tao; Li, Yu-Long; Liu, Xiao; Hua, Yi-Lin; Zou, Yang; Wang, Shuang; He, De-Yong; Chen, Geng; Sun, Yong-Nan; Yu, Ying; Li, Mi-Feng; Zha, Guo-Wei; Ni, Hai-Qiao; Niu, Zhi-Chuan; Li, Chuan-Feng; Guo, Guang-Can

    2015-01-01

    Quantum repeaters are critical components for distributing entanglement over long distances in the presence of unavoidable optical losses during transmission. Stimulated by the Duan–Lukin–Cirac–Zoller protocol, many improved quantum repeater protocols based on quantum memories have been proposed; these commonly focus on the entanglement-distribution rate. Among these protocols, the elimination of multiple photons (or multiple photon pairs) and the use of multimode quantum memory have been demonstrated to greatly improve the entanglement-distribution rate. Here, we demonstrate the storage of deterministic single photons emitted from a quantum dot in a polarization-maintaining solid-state quantum memory; in addition, multi-temporal-mode memory with 1, 20 and 100 narrow single-photon pulses is also demonstrated. Multiple photons are eliminated, and each pulse contains at most one photon. Moreover, the solid-state properties of both sub-systems make this configuration more stable and easier to scale. Our work will be helpful in the construction of efficient quantum repeaters based on all-solid-state devices. PMID:26468996

  18. Scalable PGAS Metadata Management on Extreme Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Agarwal, Khushbu; Straatsma, TP

    Programming models intended to run on exascale systems have a number of challenges to overcome, especially the sheer size of the system as measured by the number of concurrent software entities created and managed by the underlying runtime. It is clear from the size of these systems that any state maintained by the programming model has to be strictly sub-linear in size, in order not to overwhelm memory usage with pure overhead. A principal feature of Partitioned Global Address Space (PGAS) models is providing easy access to global-view distributed data structures. In order to provide efficient access to these distributed data structures, PGAS models must keep track of metadata such as where array sections are located with respect to processes/threads running on the HPC system. As PGAS models and applications become ubiquitous on very large trans-petascale systems, a key component of their performance and scalability will be efficient and judicious use of memory for model overhead (metadata) compared to application data. We present an evaluation of several strategies to manage PGAS metadata that exhibit different space/time tradeoffs. We use two real-world PGAS applications to capture metadata usage patterns and gain insight into their communication behavior.
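
    One way to keep metadata strictly sub-linear, as the abstract argues is necessary, is to replace per-process lookup tables with a closed-form distribution descriptor. The C sketch below illustrates that end of the space/time tradeoff for a regular block distribution; the types and function names are invented for the example and are not the strategies evaluated in the paper.

      #include <stddef.h>

      /* A regular block distribution can be described in O(1) space: the
       * owner and local offset of any global index are computed on the fly,
       * so no per-process table (O(P) metadata) is needed. */
      typedef struct {
          size_t n;        /* global number of elements */
          int    nprocs;   /* processes the array spans */
      } block_dist;        /* hypothetical descriptor   */

      static size_t block_size(const block_dist *d)
      {
          return (d->n + d->nprocs - 1) / d->nprocs;   /* ceil(n / nprocs) */
      }

      /* Which process owns global index g? */
      int owner(const block_dist *d, size_t g)
      {
          return (int)(g / block_size(d));
      }

      /* Where does global index g live in the owner's local slice? */
      size_t local_index(const block_dist *d, size_t g)
      {
          return g % block_size(d);
      }

    Irregular distributions still need real tables; the paper's point is that where an implementation sits on this spectrum determines how its metadata scales.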

  19. Distributed Learning Enhances Relational Memory Consolidation

    ERIC Educational Resources Information Center

    Litman, Leib; Davachi, Lila

    2008-01-01

    It has long been known that distributed learning (DL) provides a mnemonic advantage over massed learning (ML). However, the underlying mechanisms that drive this robust mnemonic effect remain largely unknown. In two experiments, we show that DL across a 24 hr interval does not enhance immediate memory performance but instead slows the rate of…

  20. Havens: Explicit Reliable Memory Regions for HPC Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hukerikar, Saurabh; Engelmann, Christian

    2016-01-01

    Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.
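
    As a rough illustration of software-based parity over a memory region, the C sketch below keeps one running XOR word per region. The haven_t handle and function names are invented for the example, and the paper's actual mechanism is likely more elaborate (for instance, grouping words so errors can be located and corrected).

      #include <stdint.h>
      #include <stddef.h>

      /* Software parity for a reliable region: one running XOR over all
       * 64-bit words. A write XORs out the old value and XORs in the new
       * one; an audit recomputes the XOR and compares. This detects (but
       * does not locate) corruption of a single word. */
      typedef struct {
          uint64_t *words;
          size_t    nwords;
          uint64_t  parity;
      } haven_t;            /* hypothetical region handle */

      void haven_write(haven_t *h, size_t i, uint64_t v)
      {
          h->parity ^= h->words[i] ^ v;   /* maintain parity incrementally */
          h->words[i] = v;
      }

      int haven_audit(const haven_t *h)   /* 1 = clean, 0 = corrupted */
      {
          uint64_t x = 0;
          for (size_t i = 0; i < h->nwords; ++i)
              x ^= h->words[i];
          return x == h->parity;
      }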

  1. Enhancement of Immune Memory Responses to Respiratory Infection

    DTIC Science & Technology

    2017-08-01

    Unlimited distribution. Maintenance of long-term immunological memory against pathogens is crucial for the rapid...highly expressed in memory B cells in mice, and Atg7 is required for maintenance of long-term memory B cells needed to protect against influenza... AWARD NUMBER: W81XWH-16-1-0360. TITLE: Enhancement of Immune Memory Responses to Respiratory Infection. PRINCIPAL INVESTIGATOR: Dr. Min Chen, PhD

  2. Autobiographical Memory and Depression in the Later Age: The Bump Is a Turning Point

    ERIC Educational Resources Information Center

    Gidron, Yori; Alon, Shirly

    2007-01-01

    This preliminary study integrated previous findings of the distribution of autobiographical memories in the later age according to their age of occurrence, with the overgeneral memory bias predictive of depression. Twenty-five non-demented, Israeli participants between 65-89 years of age provided autobiographical memories to 4 groups of word cues…

  3. Cortex and Memory: Emergence of a New Paradigm

    ERIC Educational Resources Information Center

    Fuster, Joaquin M.

    2009-01-01

    Converging evidence from humans and nonhuman primates is obliging us to abandon conventional models in favor of a radically different, distributed-network paradigm of cortical memory. Central to the new paradigm is the concept of memory network or cognit--that is, a memory or an item of knowledge defined by a pattern of connections between neuron…

  4. Digital Equipment Corporation VAX/VMS Version 4.3

    DTIC Science & Technology

    1986-07-30

    operating system performs process-oriented paging that allows execution of programs that may be larger than the physical memory allocated to them... to higher privileged modes. (For an explanation of how the four access modes provide memory access protection, see page 9, "Memory Management".) A... to optimize program performance for real-time applications or interactive environments.

  5. Reconfigurable photonic crystals enabled by pressure-responsive shape-memory polymers

    PubMed Central

    Fang, Yin; Ni, Yongliang; Leo, Sin-Yen; Taylor, Curtis; Basile, Vito; Jiang, Peng

    2015-01-01

    Smart shape-memory polymers can memorize and recover their permanent shape in response to an external stimulus (for example, heat). They have been extensively exploited for a wide spectrum of applications ranging from biomedical devices to aerospace morphing structures. However, most of the existing shape-memory polymers are thermoresponsive, and their performance is hindered by heat-demanding programming and recovery steps. Although pressure, like temperature, is an easily adjustable process variable, pressure-responsive shape-memory polymers are largely unexplored. Here we report a series of shape-memory polymers that enable unusual ‘cold' programming and instantaneous shape recovery triggered by applying a contact pressure at ambient conditions. Moreover, the interdisciplinary integration of scientific principles drawn from two disparate fields—the fast-growing photonic crystal and shape-memory polymer technologies—enables fabrication of reconfigurable photonic crystals and simultaneously provides a simple and sensitive optical technique for investigating the intriguing shape-memory effects at nanoscale. PMID:26074349

  6. Enhancing memory self-efficacy during menopause through a group memory strategies program.

    PubMed

    Unkenstein, Anne E; Bei, Bei; Bryant, Christina A

    2017-05-01

    Anxiety about memory during menopause can affect quality of life. We aimed to improve memory self-efficacy during menopause using a group memory strategies program. The program was run five times for a total of 32 peri- and postmenopausal women, aged between 47 and 60 years, recruited from hospital menopause and gynecology clinics. The 4-week intervention consisted of weekly 2-hour sessions, and covered how memory works, memory changes related to ageing, health and lifestyle factors, and specific memory strategies. Memory contentment (CT), reported frequency of forgetting (FF), use of memory strategies, psychological distress, and attitude toward menopause were measured. A double-baseline design was applied, with outcomes measured on two baseline occasions (1 month prior [T1] and in the first session [T2]), immediately postintervention (T3), and 3 months postintervention (T4). To describe changes in each variable between time points, paired-sample t tests were conducted. Mixed-effects models comparing the means of random slopes from T2 to T3 with those from T1 to T2 were conducted for each variable to test for treatment effects. Examination of the naturalistic changes in outcome measures from T1 to T2 revealed no significant changes (all Ps > 0.05). CT, reported FF, and use of memory strategies improved significantly more from T2 to T3 than from T1 to T2 (all Ps < 0.05). Neither attitude toward menopause nor psychological distress improved significantly more postintervention than during the double-baseline (all Ps > 0.05). Improvements in reported CT and FF were maintained after 3 months. The use of group interventions to improve memory self-efficacy during menopause warrants continued evaluation.

  7. Attentional Imbalances Following Head Injury

    DTIC Science & Technology

    1988-05-30

    on recent memory, the patient's basic fund of knowledge (semantic memory) and remote memory for events (episodic memory) were both impaired. For e...are notable for problems caused by poor memory, inflexibility, and concreteness. These problems are most severe when they interact with linguistic...demands. It is worth noting that memory problems were accompanied by confabulation when the patient first entered the program. In addition to the effects

  8. A memory module for experimental data handling

    NASA Astrophysics Data System (ADS)

    De Blois, J.

    1985-02-01

    A compact CAMAC memory module for experimental data handling was developed to eliminate the need for direct memory access in computer controlled measurements. When using autonomous controllers it also makes measurements more independent of the program and enlarges the available space for programs in the memory of the micro-computer. The memory module has three modes of operation: an increment, a list and a fifo mode. This is achieved by connecting the main parts, namely the memory (MEM), the fifo buffer (FIFO), the address buffer (BUF), two counters (AUX and ADDR) and a readout register (ROR), via an internal 24-bit databus. The time needed for databus operations is 1 μs, for measuring cycles as well as for CAMAC cycles. The FIFO provides temporary data storage during CAMAC cycles and separates the memory part from the application part. The memory is variable from 1 to 64K (24 bits) by using different types of memory chips. The application part, which forms 1/3 of the module, will be specially designed for each application and is added to the memory via an internal connector. The memory unit will be used in Mössbauer experiments and in thermal neutron scattering experiments.
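
    The three modes correspond to familiar data-handling patterns. A loose software analogue in C (the module implements these in hardware; the function signatures are invented for illustration):

      #include <stdint.h>

      /* Increment mode: the event value addresses a cell that is bumped,
       * building a histogram in place (multichannel-analyzer style). */
      void increment_mode(uint32_t *mem, uint32_t nchan, uint32_t event)
      {
          if (event < nchan)
              mem[event]++;
      }

      /* List mode: events are appended verbatim for later replay. */
      void list_mode(uint32_t *mem, uint32_t *next, uint32_t event)
      {
          mem[(*next)++] = event;
      }

      /* FIFO mode: a ring buffer decouples the measurement side from
       * readout, as the module's FIFO decouples it from CAMAC cycles. */
      void fifo_mode(uint32_t *ring, uint32_t size, uint32_t *head, uint32_t event)
      {
          ring[*head] = event;
          *head = (*head + 1) % size;
      }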

  9. An efficient parallel termination detection algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, A. H.; Crivelli, S.; Jessup, E. R.

    2004-05-27

    Information local to any one processor is insufficient to monitor the overall progress of most distributed computations. Typically, a second distributed computation for detecting termination of the main computation is necessary. In order to be a useful computational tool, the termination detection routine must operate concurrently with the main computation, adding minimal overhead, and it must promptly and correctly detect termination when it occurs. In this paper, we present a new algorithm for detecting the termination of a parallel computation on distributed-memory MIMD computers that satisfies all of those criteria. A variety of termination detection algorithms have been devised. Of these, the algorithm presented by Sinha, Kale, and Ramkumar (henceforth, the SKR algorithm) is unique in its ability to adapt to the load conditions of the system on which it runs, thereby minimizing the impact of termination detection on performance. Because their algorithm also detects termination quickly, we consider it to be the most efficient practical algorithm presently available. The termination detection algorithm presented here was developed for use in the PMESC programming library for distributed-memory MIMD computers. Like the SKR algorithm, our algorithm adapts to system loads and imposes little overhead. Also like the SKR algorithm, ours is tree-based, and it does not depend on any assumptions about the physical interconnection topology of the processors or the specifics of the distributed computation. In addition, our algorithm is easier to implement and requires only half as many tree traversals as does the SKR algorithm. This paper is organized as follows. In section 2, we define our computational model. In section 3, we review the SKR algorithm. We introduce our new algorithm in section 4, and prove its correctness in section 5. We discuss its efficiency and present experimental results in section 6.
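
    For contrast with the asynchronous, tree-based algorithms compared here, the underlying counting argument can be illustrated with a fully synchronous two-wave check in MPI. This sketch stalls the main computation at every wave, exactly what the SKR and PMESC algorithms are designed to avoid, so treat it as a teaching aid rather than a competitive implementation:

      #include <mpi.h>

      /* Synchronous sketch of termination detection by message counting.
       * The computation has terminated when every process is idle and the
       * global number of messages sent equals the number received in two
       * consecutive waves; the second wave rules out messages still in
       * flight when the first wave passed. */
      int check_termination(int idle, long sent, long received, MPI_Comm comm)
      {
          static long prev_balance = -1;   /* sent-total seen by last quiet wave */
          long local[3] = { idle ? 0L : 1L, sent, received };
          long global[3];

          MPI_Allreduce(local, global, 3, MPI_LONG, MPI_SUM, comm);

          int quiet = (global[0] == 0) && (global[1] == global[2]);
          int done  = quiet && (prev_balance == global[1]);
          prev_balance = quiet ? global[1] : -1;
          return done;   /* 1 only after two matching quiet waves */
      }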

  10. Developing Memory Clinics in Primary Care: An Evidence-Based Interprofessional Program of Continuing Professional Development

    ERIC Educational Resources Information Center

    Lee, Linda; Weston, W. Wayne; Hillier, Loretta M.

    2013-01-01

    Introduction: Primary care is challenged to meet the needs of patients with dementia. A training program was developed to increase capacity for dementia care through the development of Family Health Team (FHT)-based interprofessional memory clinics. The interprofessional training program consisted of a 2-day workshop, 1-day observership, and 2-day…

  11. Memories of Physical Education

    ERIC Educational Resources Information Center

    Sidwell, Amy M.; Walls, Richard T.

    2014-01-01

    The purpose of this investigation was to explore college students' autobiographical memories of physical education (PE). Questionnaires were distributed to students enrolled in undergraduate Introduction to PE and Introduction to Communications courses. The 261 participants wrote about memories of PE. These students recalled events from Grades…

  12. Immigration, Language Proficiency, and Autobiographical Memories: Lifespan Distribution and Second-Language Access

    PubMed Central

    Esposito, Alena G.; Baker-Ward, Lynne

    2015-01-01

    This investigation examined two controversies in the autobiographical literature: how cross-language immigration affects the distribution of autobiographical memories across the lifespan and under what circumstances language-dependent recall is observed. Both Spanish/English bilingual immigrants and English monolingual non-immigrants participated in a cue word study, with the bilingual sample taking part in a within-subject language manipulation. The expected bump in the number of memories from early life was observed for non-immigrants but not immigrants, who reported more memories for events surrounding immigration. Aspects of the methodology addressed possible reasons for past discrepant findings. Language-dependent recall was influenced by second-language proficiency. Results were interpreted as evidence that bilinguals with high second-language proficiency, in contrast to those with lower second-language proficiency, access a single conceptual store through either language. The final multi-level model predicting language-dependent recall, including second-language proficiency, age of immigration, internal language, and cue word language, explained ¾ of the between-person variance and ⅕ of the within-person variance. We arrive at two conclusions. First, major life transitions influence the distribution of memories. Second, concept representation across multiple languages follows a developmental model. In addition, the results underscore the importance of considering language experience in research involving memory reports. PMID:26274061

  13. Immigration, language proficiency, and autobiographical memories: Lifespan distribution and second-language access.

    PubMed

    Esposito, Alena G; Baker-Ward, Lynne

    2016-08-01

    This investigation examined two controversies in the autobiographical literature: how cross-language immigration affects the distribution of autobiographical memories across the lifespan and under what circumstances language-dependent recall is observed. Both Spanish/English bilingual immigrants and English monolingual non-immigrants participated in a cue word study, with the bilingual sample taking part in a within-subject language manipulation. The expected bump in the number of memories from early life was observed for non-immigrants but not immigrants, who reported more memories for events surrounding immigration. Aspects of the methodology addressed possible reasons for past discrepant findings. Language-dependent recall was influenced by second-language proficiency. Results were interpreted as evidence that bilinguals with high second-language proficiency, in contrast to those with lower second-language proficiency, access a single conceptual store through either language. The final multi-level model predicting language-dependent recall, including second-language proficiency, age of immigration, internal language, and cue word language, explained ¾ of the between-person variance and ⅕ of the within-person variance. We arrive at two conclusions. First, major life transitions influence the distribution of memories. Second, concept representation across multiple languages follows a developmental model. In addition, the results underscore the importance of considering language experience in research involving memory reports.

  14. A microcomputer program for energy assessment and aggregation using the triangular probability distribution

    USGS Publications Warehouse

    Crovelli, R.A.; Balay, R.H.

    1991-01-01

    A general risk-analysis method was developed for petroleum-resource assessment and other applications. The triangular probability distribution is used as a model with an analytic aggregation methodology based on probability theory rather than Monte-Carlo simulation. Among the advantages of the analytic method are its computational speed and flexibility, and the saving of time and cost on a microcomputer. The input into the model consists of a set of components (e.g. geologic provinces) and, for each component, three potential resource estimates: minimum, most likely (mode), and maximum. Assuming a triangular probability distribution, the mean, standard deviation, and seven fractiles (F100, F95, F75, F50, F25, F5, and F0) are computed for each component, where, for example, the probability of exceeding F95 is equal to 0.95. The components are aggregated by combining the means, standard deviations, and respective fractiles under three possible situations: (1) perfect positive correlation, (2) complete independence, and (3) any degree of dependence between these two polar situations. A package of computer programs named the TRIAGG system was written in the Turbo Pascal 4.0 language for performing the analytic probabilistic methodology. The system consists of a program for processing triangular probability distribution assessments and aggregations, and a separate aggregation routine for aggregating aggregations. The user's documentation and program diskette of the TRIAGG system are available from USGS Open File Services. TRIAGG requires an IBM-PC/XT/AT compatible microcomputer with 256 Kbytes of main memory, MS-DOS 3.1 or later, either two diskette drives or a fixed disk, and a 132 column printer. A graphics adapter and color display are optional. © 1991.
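
    The analytic quantities named above follow directly from the triangular model. A C sketch of the per-component computations (mirroring TRIAGG's inputs, though not its Pascal code; here a is the minimum, m the mode, and b the maximum, with a <= m <= b and a < b):

      #include <math.h>

      /* Triangular distribution with minimum a, mode m, maximum b. */
      double tri_mean(double a, double m, double b)
      {
          return (a + m + b) / 3.0;
      }

      double tri_sd(double a, double m, double b)
      {
          return sqrt((a*a + m*m + b*b - a*m - a*b - m*b) / 18.0);
      }

      /* Fractile in the paper's convention: tri_fractile(.., 0.95) returns
       * F95, the value exceeded with probability 0.95, i.e. the standard
       * quantile at u = 1 - 0.95 = 0.05. */
      double tri_fractile(double a, double m, double b, double p_exceed)
      {
          double u = 1.0 - p_exceed;            /* non-exceedance probability */
          double split = (m - a) / (b - a);     /* CDF value at the mode      */
          if (u <= split)
              return a + sqrt(u * (b - a) * (m - a));
          return b - sqrt((1.0 - u) * (b - a) * (b - m));
      }

    Aggregation then proceeds as the abstract describes: under complete independence, means and variances add; under perfect positive correlation, the fractiles themselves can be added.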

  15. Final Report: Correctness Tools for Petascale Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mellor-Crummey, John

    2014-10-27

    In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, the University of Maryland, the University of Wisconsin—Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoring of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.

  16. Cognitive Training Program to Improve Working Memory in Older Adults with MCI.

    PubMed

    Hyer, Lee; Scott, Ciera; Atkinson, Mary Michael; Mullen, Christine M; Lee, Anna; Johnson, Aaron; Mckenzie, Laura C

    2016-01-01

    Deficits in working memory (WM) are associated with age-related decline. We report findings from a clinical trial that examined the effectiveness of Cogmed, a computerized program that trains WM. We compare this program to a Sham condition in older adults with Mild Cognitive Impairment (MCI). Older adults (N = 68) living in the community were assessed. Participants reported memory impairment and met criteria for MCI, either by poor delayed memory or poor performance in other cognitive areas. The Repeatable Battery for the Assessment of Neuropsychological Status (RBANS, Delayed Memory Index) and the Clinical Dementia Rating scale (CDR) were utilized. All presented with normal Mini Mental State Exams (MMSE) and activities of daily living (ADLs). Participants were randomized to Cogmed or a Sham computer program. Twenty-five sessions were completed over five to seven weeks. Pre-, post-, and follow-up measures included a battery of cognitive measures (three WM tests), a subjective memory scale, and a functional measure. Both intervention groups improved over time. Cogmed significantly outperformed Sham on Span Board and showed greater improvement in subjective memory reports at follow-up, as assessed by the Cognitive Failures Questionnaire (CFQ). The Cogmed group demonstrated better performance on the Functional Activities Questionnaire (FAQ), a measure of adjustment and far transfer, at follow-up. Both groups, especially Cogmed, enjoyed the intervention. Results suggest that WM was enhanced in both groups of older adults with MCI. Cogmed was better on one core WM measure and had higher ratings of satisfaction. The Sham condition declined on adjustment.

  17. Review and analysis of dense linear system solver package for distributed memory machines

    NASA Technical Reports Server (NTRS)

    Narang, H. N.

    1993-01-01

    A dense linear system solver package recently developed at the University of Texas at Austin for distributed memory machines (e.g., the Intel Paragon) has been reviewed and analyzed. The package contains about 45 software routines, some written in FORTRAN and some in C, and forms the basis for parallel/distributed solutions of systems of linear equations encountered in many problems of a scientific and engineering nature. The package, being studied by the Computer Applications Branch of the Analysis and Computation Division, may provide a significant computational resource for NASA scientists and engineers in parallel/distributed computing. Since the package is new and not well tested or documented, many of its underlying concepts and implementations were unclear; our task was to review, analyze, and critique the package as a step in the process that will enable scientists and engineers to apply it to the solution of their problems. All routines in the package were reviewed and analyzed. Underlying theory or concepts that exist in the form of published papers, technical reports, or memos were obtained either from the author or from the scientific literature, and general algorithms, explanations, examples, and critiques have been provided to explain the workings of these programs. Wherever things remained unclear, we communicated with the developer (author), either by telephone or by electronic mail, to understand the workings of the routines. Whenever possible, tests were made to verify the concepts and logic employed in their implementations. A detailed report is being documented separately to explain the workings of these routines.

  18. ALPS - A LINEAR PROGRAM SOLVER

    NASA Technical Reports Server (NTRS)

    Viterna, L. A.

    1994-01-01

    Linear programming is a widely-used engineering and management tool. Scheduling, resource allocation, and production planning are all well-known applications of linear programs (LP's). Most LP's are too large to be solved by hand, so over the decades many computer codes for solving LP's have been developed. ALPS, A Linear Program Solver, is a full-featured LP analysis program. ALPS can solve plain linear programs as well as more complicated mixed integer and pure integer programs. ALPS also contains an efficient solution technique for pure binary (0-1 integer) programs. One of the many weaknesses of LP solvers is the lack of interaction with the user. ALPS is a menu-driven program with no special commands or keywords to learn. In addition, ALPS contains a full-screen editor to enter and maintain the LP formulation. These formulations can be written to and read from plain ASCII files for portability. For those less experienced in LP formulation, ALPS contains a problem "parser" which checks the formulation for errors. ALPS creates fully formatted, readable reports that can be sent to a printer or output file. ALPS is written entirely in IBM's APL2/PC product, Version 1.01. The APL2 workspace containing all the ALPS code can be run on any APL2/PC system (AT or 386). On a 32-bit system, this configuration can take advantage of all extended memory. The user can also examine and modify the ALPS code. The APL2 workspace has also been "packed" to be run on any DOS system (without APL2) as a stand-alone "EXE" file, but has limited memory capacity on a 640K system. A numeric coprocessor (80X87) is optional but recommended. The standard distribution medium for ALPS is a 5.25 inch 360K MS-DOS format diskette. IBM, IBM PC and IBM APL2 are registered trademarks of International Business Machines Corporation. MS-DOS is a registered trademark of Microsoft Corporation.

  19. A ten-year follow-up of a study of memory for the attack of September 11, 2001: Flashbulb memories and memories for flashbulb events.

    PubMed

    Hirst, William; Phelps, Elizabeth A; Meksin, Robert; Vaidya, Chandan J; Johnson, Marcia K; Mitchell, Karen J; Buckner, Randy L; Budson, Andrew E; Gabrieli, John D E; Lustig, Cindy; Mather, Mara; Ochsner, Kevin N; Schacter, Daniel; Simons, Jon S; Lyle, Keith B; Cuc, Alexandru F; Olsson, Andreas

    2015-06-01

    Within a week of the attack of September 11, 2001, a consortium of researchers from across the United States distributed a survey asking about the circumstances in which respondents learned of the attack (their flashbulb memories) and the facts about the attack itself (their event memories). Follow-up surveys were distributed 11, 25, and 119 months after the attack. The study, therefore, examines retention of flashbulb memories and event memories at a substantially longer retention interval than any previous study using a test-retest methodology, allowing for the study of such memories over the long term. There was rapid forgetting of both flashbulb and event memories within the first year, but the forgetting curves leveled off after that, not significantly changing even after a 10-year delay. Despite the initial rapid forgetting, confidence remained high throughout the 10-year period. Five putative factors affecting flashbulb memory consistency and event memory accuracy were examined: (a) attention to media, (b) the amount of discussion, (c) residency, (d) personal loss and/or inconvenience, and (e) emotional intensity. After 10 years, none of these factors predicted flashbulb memory consistency; media attention and ensuing conversation predicted event memory accuracy. Inconsistent flashbulb memories were more likely to be repeated rather than corrected over the 10-year period; inaccurate event memories, however, were more likely to be corrected. The findings suggest that even traumatic memories and those implicated in a community's collective identity may be inconsistent over time and these inconsistencies can persist without the corrective force of external influences. (c) 2015 APA, all rights reserved.

  20. Helicopter In-Flight Monitoring System Second Generation (HIMS II).

    DTIC Science & Technology

    1983-08-01

    acquisition cycle. B. Computer Chassis CPU (DEC LSI-II/2) -- Executes instructions contained in the memory. 32K memory (DEC MSVII-DD) --Contains program...when the operator executes command #2, 3, or 5 (display data). New cartridges can be inserted as required for truly unlimited, continuous data...is called bootstrapping. The software, which is stored on a tape cartridge, is loaded into memory by execution of a small program stored in read-only

  1. Dynamic overset grid communication on distributed memory parallel processors

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  2. pyCTQW: A continuous-time quantum walk simulator on distributed memory computers

    NASA Astrophysics Data System (ADS)

    Izaac, Josh A.; Wang, Jingbo B.

    2015-01-01

    In the general field of quantum information and computation, quantum walks are playing an increasingly important role in constructing physical models and quantum algorithms. We have recently developed a distributed memory software package pyCTQW, with an object-oriented Python interface, that allows efficient simulation of large multi-particle CTQW (continuous-time quantum walk)-based systems. In this paper, we present an introduction to the Python and Fortran interfaces of pyCTQW, discuss various numerical methods of calculating the matrix exponential, and demonstrate the performance behavior of pyCTQW on a distributed memory cluster. In particular, the Chebyshev and Krylov-subspace methods for calculating the quantum walk propagation are provided, as well as methods for visualization and data analysis.

  3. From pipelines to pathways: the Memorial experience in educating doctors for rural generalist practice.

    PubMed

    Rourke, James; Asghari, Shabnam; Hurley, Oliver; Ravalia, Mohamed; Jong, Michael; Parsons, Wanda; Duggan, Norah; Stringer, Katherine; O'Keefe, Danielle; Moffatt, Scott; Graham, Wendy; Sturge Sparkes, Carolyn; Hippe, Janelle; Harris Walsh, Kristin; McKay, Donald; Samarasena, Asoka

    2018-03-01

    This report describes the community context, concept and mission of the Faculty of Medicine at Memorial University of Newfoundland (Memorial), Canada, and its 'pathways to rural practice' approach, which includes influences at the pre-medical school, medical school experience, postgraduate residency training, and physician practice levels. Memorial's pathways to practice have helped it fulfill its social accountability mandate to populate the province with highly skilled rural generalist practitioners. Programs/interventions/initiatives: The 'pathways to rural practice' include initiatives in four stages: (1) before admission to medical school; (2) during undergraduate medical training (medical degree (MD) program); (3) during postgraduate vocational residency training; and (4) after postgraduate vocational residency training. Memorial's Learners & Locations (L&L) database tracks students through these stages. The Aboriginal initiative, the MedQuest program, and an admissions process that considers geographic and minority representation among both those selecting candidates and the candidates themselves all occur before the student is admitted. Once a student starts Memorial's MD program, the student has ample opportunities for rural-based experiences through pre-clerkship and clerkship, some of which take place exclusively outside of St. John's tertiary hospitals. Memorial's postgraduate (PG) Family Medicine (FM) residency (vocational) training program allows for deeper community integration and longer periods of training within the same community, which increases the likelihood of a physician choosing rural family medicine. After postgraduate training, rural physicians are given many opportunities for professional development as well as faculty development. Each of the programs and initiatives was assessed through geospatial rurality analysis of administrative data collected upon entry into and during the MD program and PG training (L&L). Among Memorial MD graduating classes of 2011-2020, 56% spent the majority of their lives before their 18th birthday in a rural location and 44% in an urban location. As of September 2016, 23 Memorial MD students self-identified as Aboriginal, of whom 2 (9%) were from an urban location and 20 (91%) were from rural locations. For Year 3 Family Medicine, graduating classes 2011 to 2019, 89% of placement weeks took place in rural communities and 8% took place in rural towns. For Memorial MD graduating classes 2011-2013 who completed Memorial Family Medicine vocational training residencies (N=49), 100% completed some rural training. For these 49 residents (vocational trainees), the average amount of time spent in rural areas was 52 weeks out of a total average FM training time of 95 weeks. For Family Medicine residencies from July 2011 to October 2016, 29% of all placement weeks took place in rural communities and 21% took place in rural towns. For 2016-2017 first-year residents, 53% of the first-year training is completed in rural locations, reflecting an even greater rural experiential learning focus. Memorial's pathways approach has allowed for the comprehensive training of rural generalists for Newfoundland and Labrador and the rest of Canada and may be applicable to other settings. More challenges remain, requiring ongoing collaboration with governments, medical associations, health authorities, communities, and their physicians to help achieve reliable and feasible healthcare delivery for those living in rural and remote areas.

  4. User's manual: Subsonic/supersonic advanced panel pilot code

    NASA Technical Reports Server (NTRS)

    Moran, J.; Tinoco, E. N.; Johnson, F. T.

    1978-01-01

    Sufficient instructions for running the subsonic/supersonic advanced panel pilot code were developed. This software was developed as a vehicle for numerical experimentation and should not be construed to represent a finished production program. The pilot code is based on a higher order panel method using linearly varying source and quadratically varying doublet distributions for computing both linearized supersonic and subsonic flow over arbitrary wings and bodies. This user's manual contains complete input and output descriptions. A brief description of the method is given as well as practical instructions for proper configuration modeling. Computed results are also included to demonstrate some of the capabilities of the pilot code. The computer program is written in FORTRAN IV for the SCOPE 3.4.4 operating system of the Ames CDC 7600 computer. The program uses an overlay structure and thirteen disk files, and it requires approximately 132000 (octal) central memory words.

  5. High Performance Programming Using Explicit Shared Memory Model on Cray T3D1

    NASA Technical Reports Server (NTRS)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained by using the explicit shared memory model. A similar degradation is seen on the CM-5, where the performance of applications using the native message-passing library CMMD is also about 4 to 5 times lower than that of data parallel methods. The issues involved in programming in the explicit shared memory model (such as barriers, synchronization, and invalidating and aligning the data cache) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP-1 is presented.
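
    The explicit shared-memory style that outperformed PVM here survives in OpenSHMEM, the standardized descendant of Cray's SHMEM library. A minimal put-based exchange looks like the following C sketch, illustrative of the idiom rather than the benchmark code used in the paper:

      #include <shmem.h>
      #include <stdio.h>

      int main(void)
      {
          shmem_init();
          int me = shmem_my_pe();
          int np = shmem_n_pes();

          /* Symmetric allocation: the same remotely accessible buffer
           * exists on every processing element (PE). */
          long *inbox = shmem_malloc(sizeof(long));
          *inbox = -1;
          shmem_barrier_all();

          /* One-sided put: deposit directly into the right neighbor's
           * memory with no matching receive -- the T3D-era idiom. */
          long msg = 1000 + me;
          shmem_long_put(inbox, &msg, 1, (me + 1) % np);
          shmem_barrier_all();                 /* ensure delivery */

          printf("PE %d received %ld\n", me, *inbox);
          shmem_free(inbox);
          shmem_finalize();
          return 0;
      }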

  6. Biaxial Fatigue Behavior of Niti Shape Memory Alloy

    DTIC Science & Technology

    2005-03-01

    BIAXIAL FATIGUE BEHAVIOR OF NiTi SHAPE MEMORY ALLOY THESIS Daniel M. Jensen, 1st Lieutenant...BIAXIAL FATIGUE BEHAVIOR OF NiTi SHAPE MEMORY ALLOY THESIS Presented to the Faculty Department of Aeronautics and Astronautics Graduate School of...FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED AFIT/GA/ENY/05-M06 BIAXIAL FATIGUE BEHAVIOR OF NiTi SHAPE MEMORY ALLOY Daniel M. Jensen

  7. The Measurement of Visuo-Spatial and Verbal-Numerical Working Memory: Development of IRT-Based Scales

    ERIC Educational Resources Information Center

    Vock, Miriam; Holling, Heinz

    2008-01-01

    The objective of this study is to explore the potential for developing IRT-based working memory scales for assessing specific working memory components in children (8-13 years). These working memory scales should measure cognitive abilities reliably in the upper range of ability distribution as well as in the normal range, and provide a…

  8. Object properties and cognitive load in the formation of associative memory during precision lifting.

    PubMed

    Li, Yong; Randerath, Jennifer; Bauer, Hans; Marquardt, Christian; Goldenberg, Georg; Hermsdörfer, Joachim

    2009-01-03

    When we manipulate familiar objects in our daily life, our grip force anticipates the physical demands right from the moment of contact with the object, indicating the existence of a memory for relevant object properties. This study explores the formation and consolidation of the memory processes that associate either familiar (size) or arbitrary object features (color) with object weight. In the general task, participants repetitively lifted two differently weighted objects (580 and 280 g) in a pseudo-random order. Forty young healthy adults participated in this study and were randomly distributed into four groups: Color Cue Single task (CCS, blue and red, 9.8³ cm³), Color Cue Dual task (CCD), No Cue (NC) and Size Cue (SC, 9.8³ and 6³ cm³) groups. All groups performed a repetitive precision grasp-lift task and were retested with the same protocol after a 5-min pause. The CCD group was also required to simultaneously perform a memory task during each lift of differently weighted objects coded by color. The results show that groups lifting objects with arbitrary or familiar features successfully formed the association between object weight and manipulated object features and incorporated this into grip force programming, as observed in the different scaling of grip force and grip force rate for different object weights. An arbitrary feature, i.e., color, can be sufficiently associated with object weight, however with less strength than the familiar feature of size. The simultaneous memory task impaired anticipatory force scaling during repetitive object lifting but did not jeopardize the learning process and the consolidation of the associative memory.

  9. Task set induces dynamic reallocation of resources in visual short-term memory.

    PubMed

    Sheremata, Summer L; Shomstein, Sarah

    2017-08-01

    Successful interaction with the environment requires the ability to flexibly allocate resources to different locations in the visual field. Recent evidence suggests that visual short-term memory (VSTM) resources are distributed asymmetrically across the visual field based upon task demands. Here, we propose that context, rather than the stimulus itself, determines the asymmetrical distribution of VSTM resources. To test whether context modulates the reallocation of resources to the right visual field, task set, defined by memory load, was manipulated to influence visual short-term memory performance. Performance was measured for single-feature objects embedded within predominantly single- or two-feature memory blocks. Therefore, context was varied to determine whether task set directly predicts changes in visual field biases. In accord with the dynamic reallocation of resources hypothesis, task set, rather than aspects of the physical stimulus, drove improvements in performance in the right visual field. Our results show, for the first time, that preparation for upcoming memory demands directly determines how resources are allocated across the visual field.

  10. Temperature and electrical memory of polymer fibers

    NASA Astrophysics Data System (ADS)

    Yuan, Jinkai; Zakri, Cécile; Grillard, Fabienne; Neri, Wilfrid; Poulin, Philippe

    2014-05-01

    We report in this work studies of the shape memory behavior of polymer fibers loaded with carbon nanotubes or graphene flakes. These materials exhibit enhanced shape memory properties with the generation of a giant stress upon shape recovery. In addition, they exhibit a surprising temperature memory with a peak of generated stress at a temperature nearly equal to the temperature of programming. This temperature memory is ascribed to the presence of dynamical heterogeneities and to the intrinsic broadness of the glass transition. We present recent experiments related to observables other than mechanical properties. In particular nanocomposite fibers exhibit variations of electrical conductivity with an accurate memory. Indeed, the rate of conductivity variations during temperature changes reaches a well defined maximum at a temperature equal to the temperature of programming. Such materials are promising for future actuators that couple dimensional changes with sensing electronic functionalities.

  11. Spacecraft On-Board Information Extraction Computer (SOBIEC)

    NASA Technical Reports Server (NTRS)

    Eisenman, David; Decaro, Robert E.; Jurasek, David W.

    1994-01-01

    The Jet Propulsion Laboratory is the Technical Monitor on an SBIR Program issued for Irvine Sensors Corporation to develop a highly compact, dual-use massively parallel processing node known as SOBIEC. SOBIEC couples 3D memory stacking technology with processor technology provided by nCUBE. The node contains sufficient network Input/Output to implement up to an order-13 binary hypercube. The benefit of this network is that it scales linearly as more processors are added, and it is a superset of other commonly used interconnect topologies such as meshes, rings, toroids, and trees. In this manner, a distributed processing network can be easily devised and supported. The SOBIEC node has sufficient memory for most multi-computer applications, and also supports external memory expansion and DMA interfaces. The SOBIEC node is supported by a mature set of software development tools from nCUBE. The nCUBE operating system (OS) provides configuration and operational support for up to 8000 SOBIEC processors in an order-13 binary hypercube or any subset or partition(s) thereof. The OS is UNIX (USL SVR4) compatible, with C, C++, and FORTRAN compilers readily available. A stand-alone development system is also available to support SOBIEC test and integration.
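
    The claim that a hypercube subsumes other topologies is easy to make concrete: the neighbors of node i in a d-dimensional binary hypercube are i XOR 2^j, and a Gray code embeds a ring so that consecutive positions are always physical neighbors. A small C illustration (generic, not SOBIEC or nCUBE code):

      #include <stdio.h>

      /* In a d-dimensional binary hypercube, node i's neighbors are
       * i ^ (1 << j) for j = 0..d-1: the node IDs differing in one bit. */
      void print_neighbors(unsigned i, int d)
      {
          for (int j = 0; j < d; ++j)
              printf("node %u <-> node %u\n", i, i ^ (1u << j));
      }

      /* Gray code: consecutive values differ in exactly one bit, so mapping
       * ring position p to node gray(p) embeds a ring (and, dimension by
       * dimension, meshes and toroids) with every hop one physical link. */
      unsigned gray(unsigned p)
      {
          return p ^ (p >> 1);
      }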

  12. Geometry- and Length Scale-Dependent Deformation and Recovery on Micro- and Nanopatterned Shape Memory Polymer Surfaces

    PubMed Central

    Lee, Wei Li; Low, Hong Yee

    2016-01-01

    Micro- and nanoscale surface textures, when optimally designed, present a unique approach to improve surface functionalities. Coupling surface texture with shape memory polymers may generate reversibly tuneable surface properties. A shape memory polyetherurethane is used to prepare various surface textures including 2 μm- and 200 nm-gratings, 250 nm-pillars and 200 nm-holes. The mechanical deformation via stretching and the recovery of the surface texture are investigated as a function of length scale and shape. Results show the 200 nm-grating exhibits more deformation than the 2 μm-grating. Grating imparts anisotropic and surface area-to-volume effects, causing different degrees of deformation between gratings and pillars under the same applied macroscopic strain. Full distribution of stress within the film causes the holes to deform more substantially than the pillars. In the recovery study, unlike a nearly complete recovery for the gratings after 10 transformation cycles, the high contribution of surface energy impedes the recovery of holes and pillars. The surface textures are shown to perform a switchable wetting function. This study provides insights into how geometric features of shape memory surface patterns can be designed to modulate the shape programming and recovery, and how the control of reversibly deformable surface textures can be applied to transfer microdroplets. PMID:27026290

  13. An Apple II Implementation of Man-Mod Manpower Planning Model.

    DTIC Science & Technology

    1982-03-01

    next page. It is highly recommended, to prevent the loss of data, that the user save the data at this point. If Choice (1), yes, is selected, the...approximately 30 seconds, but will clear and reload memory, preventing any inadvertent memory changes which might cause program interruptions or erroneous cal... program. 70 MAN-MOD/PROGRAM (PROGRAM LISTING) 1000 REM MAN-MOD/PROGRAM PROGRAM: "FOR" IS IN QUOTES IN LINES 1004,10518,10520,10524,10526,10528,1072

  14. Diffusion theory of decision making in continuous report.

    PubMed

    Smith, Philip L

    2016-07-01

    I present a diffusion model for decision making in continuous report tasks, in which a continuous, circularly distributed, stimulus attribute in working memory is matched to a representation of the attribute in the stimulus display. Memory retrieval is modeled as a 2-dimensional diffusion process with vector-valued drift on a disk, whose bounding circle represents the decision criterion. The direction and magnitude of the drift vector describe the identity of the stimulus and the quality of its representation in memory, respectively. The point at which the diffusion exits the disk determines the reported value of the attribute and the time to exit the disk determines the decision time. Expressions for the joint distribution of decision times and report outcomes are obtained by means of the Girsanov change-of-measure theorem, which allows the properties of the nonzero-drift diffusion process to be characterized as a function of a Euclidean-distance Bessel process. Predicted report precision is equal to the product of the decision criterion and the drift magnitude and follows a von Mises distribution, in agreement with the treatment of precision in the working memory literature. Trial-to-trial variability in criterion and drift rate leads, respectively, to direct and inverse relationships between report accuracy and decision times, in agreement with, and generalizing, the standard diffusion model of 2-choice decisions. The 2-dimensional model provides a process account of working memory precision and its relationship with the diffusion model, and a new way to investigate the properties of working memory, via the distributions of decision times. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
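
    The model's mechanics are straightforward to simulate: a two-dimensional random walk with a drift vector, run until it crosses the bounding circle; the exit angle is the reported attribute value and the exit time is the decision time. A Monte Carlo sketch in C using an Euler-Maruyama step (all parameter values invented for illustration):

      #include <math.h>
      #include <stdlib.h>

      #ifndef M_PI
      #define M_PI 3.14159265358979323846
      #endif

      /* One trial of the circular-report diffusion model: drift of
       * magnitude v pointing at angle theta (the true attribute value),
       * unit diffusion, absorbing circle of radius a (the criterion).
       * Returns the reported angle; *rt receives the decision time. */
      double diffusion_trial(double v, double theta, double a,
                             double dt, double *rt)
      {
          double x = 0.0, y = 0.0, t = 0.0, sd = sqrt(dt);
          while (x * x + y * y < a * a) {
              /* Box-Muller: two independent N(0,1) increments */
              double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
              double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
              double r  = sqrt(-2.0 * log(u1));
              x += v * cos(theta) * dt + sd * r * cos(2.0 * M_PI * u2);
              y += v * sin(theta) * dt + sd * r * sin(2.0 * M_PI * u2);
              t += dt;
          }
          *rt = t;
          return atan2(y, x);   /* exit angle = reported attribute value */
      }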

  15. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Secchi, Simone; Tumeo, Antonino; Villa, Oreste

    Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.

  16. PANDA: A distributed multiprocessor operating system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chubb, P.

    1989-01-01

    PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differences between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.

  17. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    PubMed

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly to, or better than, the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
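
    To make the underlying operation concrete, here is a bare-bones sequential k-mer counter in C over 2-bit-encoded DNA. This is a generic sketch, not Kmerind's API; a distributed version would hash each k-mer code to an owning rank, which is essentially what Kmerind layers over MPI.

      #include <stdint.h>
      #include <stddef.h>

      #define K 5                     /* k-mer length; fits in 64 bits for K <= 32 */
      #define TABSZ (1u << (2 * K))   /* direct-addressed counts, fine for small K */

      static int base2bits(char c)    /* A=0 C=1 G=2 T=3, -1 for anything else */
      {
          switch (c) {
          case 'A': return 0; case 'C': return 1;
          case 'G': return 2; case 'T': return 3;
          default:  return -1;
          }
      }

      /* Slide a 2K-bit window along the sequence, counting every k-mer. */
      void count_kmers(const char *seq, uint32_t counts[TABSZ])
      {
          uint64_t code = 0, mask = TABSZ - 1;
          int filled = 0;
          for (size_t i = 0; seq[i]; ++i) {
              int b = base2bits(seq[i]);
              if (b < 0) { filled = 0; code = 0; continue; }  /* reset on N, etc. */
              code = ((code << 2) | (uint64_t)b) & mask;
              if (filled < K) ++filled;
              if (filled >= K)
                  counts[code]++;
          }
      }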

  18. Mass Memory Storage Devices for AN/SLQ-32(V).

    DTIC Science & Technology

    1985-06-01

    tactical programs and libraries into the AN/UYK-19 computer, the RP-16 microprocessor, and other peripheral processors (e.g., ADLS and Band 1) will be...software must be loaded into computer memory from the 4-track magnetic tape cartridges (MTCs) on which the programs are stored. Program load begins...software. Future computer programs, which will reside in peripheral processors, include the Automated Decoy Launching System (ADLS) and Band 1. As

  19. Memory training interventions for older adults: a meta-analysis.

    PubMed

    Gross, Alden L; Parisi, Jeanine M; Spira, Adam P; Kueider, Alexandra M; Ko, Jean Y; Saczynski, Jane S; Samus, Quincy M; Rebok, George W

    2012-01-01

    A systematic review and meta-analysis of memory training research was conducted to characterize the effect of memory strategies on memory performance among cognitively intact, community-dwelling older adults, and to identify characteristics of individuals and of programs associated with improved memory. The review identified 402 publications, of which 35 studies met criteria for inclusion. The overall effect size estimate, representing the mean standardized difference in pre-post change between memory-trained and control groups, was 0.31 standard deviations (SD; 95% confidence interval (CI): 0.22, 0.39). The pre-post training effect for memory-trained interventions was 0.43 SD (95% CI: 0.29, 0.57) and the practice effect for control groups was 0.06 SD (95% CI: 0.05, 0.16). Among 10 distinct memory strategies identified in studies, meta-analytic methods revealed that training multiple strategies was associated with larger training gains (p=0.04), although this association did not reach statistical significance after adjusting for multiple comparisons. Treatment gains among memory-trained individuals were not better after training in any particular strategy, or by the average age of participants, session length, or type of control condition. These findings can inform the design of future memory training programs for older adults.

  20. The Shark Random Swim - (Lévy Flight with Memory)

    NASA Astrophysics Data System (ADS)

    Businger, Silvia

    2018-05-01

    The Elephant Random Walk (ERW), first introduced by Schütz and Trimper (Phys Rev E 70:045101, 2004), is a one-dimensional simple random walk on Z having a memory about the whole past. We study the Shark Random Swim, a random walk with memory about the whole past, whose steps are α-stable distributed with α ∈ (0, 2]. Our aim in this work is to study the impact of the heavy-tailed step distributions on the asymptotic behavior of the random walk. We shall see that, as for the ERW, the asymptotic behavior of the Shark Random Swim depends on its memory parameter p, and that a phase transition can be observed at the critical value p = 1/α.
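
    As a concrete illustration, the sketch below simulates one plausible version of such a walk under the assumption, borrowed from the ERW recipe, that each step either repeats a uniformly chosen past step with probability p or draws a fresh heavy-tailed step. It uses the Cauchy case α = 1, the only stable law the C++ standard library samples directly.

```cpp
// Hedged sketch of a Lévy walk with full memory, in the spirit of the ERW:
// with probability p repeat a uniformly chosen past step, otherwise draw a
// fresh heavy-tailed step. The exact dynamics of the paper may differ.
#include <iostream>
#include <random>
#include <vector>

int main() {
    const double p = 0.75;        // memory parameter (assumed dynamics, see text)
    const int n_steps = 10000;
    std::mt19937 rng(42);
    std::cauchy_distribution<double> stable_step(0.0, 1.0);  // alpha = 1 case
    std::bernoulli_distribution remember(p);

    std::vector<double> steps{stable_step(rng)};  // the first step is always fresh
    double position = steps.back();
    for (int n = 1; n < n_steps; ++n) {
        double s;
        if (remember(rng)) {      // recall: repeat a uniformly chosen past step
            std::uniform_int_distribution<std::size_t> pick(0, steps.size() - 1);
            s = steps[pick(rng)];
        } else {                  // innovate: independent heavy-tailed step
            s = stable_step(rng);
        }
        steps.push_back(s);
        position += s;
    }
    std::cout << "final position after " << n_steps << " steps: " << position << "\n";
}
```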

  1. Rapid solution of large-scale systems of equations

    NASA Technical Reports Server (NTRS)

    Storaasli, Olaf O.

    1994-01-01

    The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis, and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
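
    As a minimal illustration of the distributed-memory side of this work (not the presentation's own FORTRAN code), the sketch below computes the dot product of block-distributed vectors, the basic communication kernel inside iterative equation solvers:

```cpp
// Hedged sketch of the basic distributed-memory kernel behind iterative
// solvers: a dot product over block-distributed vectors. Each rank sums its
// own block; one reduction combines the per-rank partial sums.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_local = 4;  // each rank owns a contiguous block of the vectors
    std::vector<double> x(n_local, 1.0), y(n_local, 2.0);

    double local = 0.0;
    for (int i = 0; i < n_local; ++i) local += x[i] * y[i];

    double global = 0.0;    // combine all partial sums across ranks
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) std::printf("x.y = %f over %d ranks\n", global, size);
    MPI_Finalize();
}
```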

  2. Effects of a multimedia project on users' knowledge about normal forgetting and serious memory loss.

    PubMed

    Mahoney, Diane Feeney; Tarlow, Barbara J; Jones, Richard N; Sandaire, Johnny

    2002-01-01

    The aim of the project was to develop and evaluate the effectiveness of a CD-ROM-based multimedia program as a tool to increase users' knowledge about the differences between "normal" forgetfulness and more serious memory loss associated with Alzheimer's disease. The research was a controlled randomized study conducted with 113 adults who were recruited from the community and who expressed a concern about memory loss in a family member. The intervention group (n=56) viewed a module entitled "Forgetfulness: What's Normal and What's Not" on a laptop computer in their homes; the control group (n=57) did not. Both groups completed a 25-item knowledge-about-memory-loss test (primary outcome) and a sociodemographic and technology usage questionnaire; the intervention group also completed a CD-ROM user's evaluation. The mean (SD) number of correct responses to the knowledge test was 14.2 (4.5) for controls and 19.7 (3.1) for intervention participants. This highly significant difference (p<0.001) corresponds to a very large effect size. The program was most effective for participants with a lower level of self-reported prior knowledge about memory loss and Alzheimer's disease (p=0.02). Viewers were very satisfied with the program and felt that it was easy to use and understand. They particularly valued having personal access to a confidential source that permitted them to become informed about memory loss without public disclosure. This multimedia CD-ROM technology program provides an efficient and effective means of teaching older adults about memory loss and ways to distinguish benign from serious memory loss. It uniquely balances public community outreach education and personal privacy.

  3. Fabrication of InGaZnO Nonvolatile Memory Devices at Low Temperature of 150 degrees C for Applications in Flexible Memory Displays and Transparency Coating on Plastic Substrates.

    PubMed

    Hanh, Nguyen Hong; Jang, Kyungsoo; Yi, Junsin

    2016-05-01

    We directly deposited amorphous InGaZnO (a-IGZO) nonvolatile memory (NVM) devices with oxynitride-oxide-dioxide (OOO) stack structures on plastic substrates by DC pulsed magnetron sputtering and an inductively coupled plasma chemical vapor deposition (ICPCVD) system at a low temperature of 150 degrees C. The fabricated bottom-gate a-IGZO NVM devices have a wide memory window with a low operating voltage during programming and erasing, due to effective control of the gate dielectrics. In addition, the memory devices are projected to retain over 73% of the memory window after ten years, with a programming duration of only 1 ms. Moreover, the a-IGZO films show high optical transmittance of over 85%, and good uniformity with a root mean square (RMS) roughness of 0.26 nm. The film is a promising candidate for flexible displays and transparent coatings on plastic substrates because of the possibility of low-temperature deposition and the high transparency of a-IGZO films. These results demonstrate that a-IGZO NVM devices fabricated at low temperature have suitable programming and erasing efficiency for data storage under low-voltage conditions, in combination with excellent charge retention characteristics, and thus show great potential for application in flexible memory displays.

  4. Efficient entanglement distillation without quantum memory.

    PubMed

    Abdelkhalek, Daniela; Syllwasschy, Mareike; Cerf, Nicolas J; Fiurášek, Jaromír; Schnabel, Roman

    2016-05-31

    Entanglement distribution between distant parties is an essential component to most quantum communication protocols. Unfortunately, decoherence effects such as phase noise in optical fibres are known to demolish entanglement. Iterative (multistep) entanglement distillation protocols have long been proposed to overcome decoherence, but their probabilistic nature makes them inefficient since the success probability decays exponentially with the number of steps. Quantum memories have been contemplated to make entanglement distillation practical, but suitable quantum memories are not realised to date. Here, we present the theory for an efficient iterative entanglement distillation protocol without quantum memories and provide a proof-of-principle experimental demonstration. The scheme is applied to phase-diffused two-mode-squeezed states and proven to distil entanglement for up to three iteration steps. The data are indistinguishable from those that an efficient scheme using quantum memories would produce. Since our protocol includes the final measurement it is particularly promising for enhancing continuous-variable quantum key distribution.

  5. Efficient entanglement distillation without quantum memory

    PubMed Central

    Abdelkhalek, Daniela; Syllwasschy, Mareike; Cerf, Nicolas J.; Fiurášek, Jaromír; Schnabel, Roman

    2016-01-01

    Entanglement distribution between distant parties is an essential component to most quantum communication protocols. Unfortunately, decoherence effects such as phase noise in optical fibres are known to demolish entanglement. Iterative (multistep) entanglement distillation protocols have long been proposed to overcome decoherence, but their probabilistic nature makes them inefficient since the success probability decays exponentially with the number of steps. Quantum memories have been contemplated to make entanglement distillation practical, but suitable quantum memories are not realised to date. Here, we present the theory for an efficient iterative entanglement distillation protocol without quantum memories and provide a proof-of-principle experimental demonstration. The scheme is applied to phase-diffused two-mode-squeezed states and proven to distil entanglement for up to three iteration steps. The data are indistinguishable from those that an efficient scheme using quantum memories would produce. Since our protocol includes the final measurement it is particularly promising for enhancing continuous-variable quantum key distribution. PMID:27241946

  6. Retrieval of high-fidelity memory arises from distributed cortical networks.

    PubMed

    Wais, Peter E; Jahanikia, Sahar; Steiner, Daniel; Stark, Craig E L; Gazzaley, Adam

    2017-04-01

    Medial temporal lobe (MTL) function is well established as necessary for memory of facts and events. It is likely that lateral cortical regions critically guide cognitive control processes to tune in high-fidelity details that are most relevant for memory retrieval. Here, convergent results from functional and structural MRI show that retrieval of detailed episodic memory arises from lateral cortical-MTL networks, including regions of inferior frontal and angular gyri. Results also suggest that recognition of items based on low-fidelity, generalized information, rather than memory arising from retrieval of relevant episodic details, is not associated with functional connectivity between MTL and lateral cortical regions. Additionally, individual differences in microstructural properties in white matter pathways, associated with distributed MTL-cortical networks, are positively correlated with better performance on a mnemonic discrimination task. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Development and Verification of Sputtered Thin-Film Nickel-Titanium (NiTi) Shape Memory Alloy (SMA)

    DTIC Science & Technology

    2015-08-01

    Report by Cory R Knick and Christopher J Morris. Approved for public release; distribution unlimited.

  8. JuxtaView - A tool for interactive visualization of large imagery on scalable tiled displays

    USGS Publications Warehouse

    Krishnaprasad, N.K.; Vishwanath, V.; Venkataraman, S.; Rao, A.G.; Renambot, L.; Leigh, J.; Johnson, A.E.; Davis, B.

    2004-01-01

    JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. In JuxtaView, we present a new parallel computing and distributed memory approach for out-of-core montage visualization, using LambdaRAM, a software-based network-level cache system. The ultimate goal of JuxtaView is to enable a user to interactively roam through potentially terabytes of distributed, spatially referenced image data such as those from electron microscopes, satellites and aerial photographs. In working towards this goal, we describe our first prototype implemented over a local area network, where the image is distributed using LambdaRAM, on the memory of all nodes of a PC cluster driving a tiled display wall. Aggressive pre-fetching schemes employed by LambdaRAM help to reduce the latency involved in remote memory access. We compare LambdaRAM with a more traditional memory-mapped file approach for out-of-core visualization. © 2004 IEEE.

  9. Human short-term spatial memory: precision predicts capacity.

    PubMed

    Banta Lavenex, Pamela; Boujon, Valérie; Ndarugendamwo, Angélique; Lavenex, Pierre

    2015-03-01

    Here, we aimed to determine the capacity of human short-term memory for allocentric spatial information in a real-world setting. Young adults were tested on their ability to learn, on a trial-unique basis, and remember over a 1-min interval the location(s) of 1, 3, 5, or 7 illuminating pads, among 23 pads distributed in a 4m×4m arena surrounded by curtains on three sides. Participants had to walk to and touch the pads with their foot to illuminate the goal locations. In contrast to the predictions from classical slot models of working memory capacity limited to a fixed number of items, i.e., Miller's magical number 7 or Cowan's magical number 4, we found that the number of visited locations to find the goals was consistently about 1.6 times the number of goals, whereas the number of correct choices before erring and the number of errorless trials varied with memory load even when memory load was below the hypothetical memory capacity. In contrast to resource models of visual working memory, we found no evidence that memory resources were evenly distributed among unlimited numbers of items to be remembered. Instead, we found that memory for even one individual location was imprecise, and that memory performance for one location could be used to predict memory performance for multiple locations. Our findings are consistent with a theoretical model suggesting that the precision of the memory for individual locations might determine the capacity of human short-term memory for spatial information. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. How autobiographical memories can support episodic recall: transfer and maintenance effect of memory training with old-old low-autonomy adults.

    PubMed

    Carretti, Barbara; Facchini, Giulia; Nicolini, Chiara

    2011-02-01

    A large body of research has demonstrated that, although specific memory activities can enhance the memory performance of healthy older adults, the extent of the increment is negatively associated with age. Conversely, few studies have examined the case of healthy elderly people not living alone. This study has two main goals: to understand whether older adults with limited autonomy can benefit from activities devoted to increasing their episodic memory performance, and to test the efficacy of a memory training program based on autobiographical memories, in terms of transfer and maintenance effects. We postulated that being able to rely on stable autobiographical memories (intrinsically associated with emotions) would be a valuable memory aid. Memory training was given to healthy older adults (aged 75-85) living in a retirement home. Two programs were compared: in the first, participants were primed to recall autobiographical memories around certain themes, and then to complete a set of episodic memory tasks (experimental group); in the second, participants were only given the episodic tasks (control group). Both groups improved their performance from pre- to post-test. However, the experimental group reported a greater feeling of well-being after the training, and maintained the training gains relating to episodic performance after three months. Our findings suggest that specific memory activities are beneficial to elderly people living in a retirement home context. In addition, training based on reactivation of autobiographical memories is shown to produce a long-lasting effect on memory performance.

  11. Parallel and distributed computation for fault-tolerant object recognition

    NASA Technical Reports Server (NTRS)

    Wechsler, Harry

    1988-01-01

    The distributed associative memory (DAM) model is suggested for distributed and fault-tolerant computation as it relates to object recognition tasks. The fault-tolerance is with respect to geometrical distortions (scale and rotation), noisy inputs, occlusion/overlap, and memory faults. An experimental system was developed for fault-tolerant structure recognition which shows the feasibility of such an approach. The approach is further extended to the problem of multisensory data integration and applied successfully to the recognition of colored polyhedral objects.
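
    A minimal sketch of the general idea behind a distributed associative memory follows, in the style of a correlation-matrix memory; the paper's DAM details may differ. Pattern pairs are superimposed as outer products and recalled with a single matrix-vector product, and it is this distributed superposition that makes the scheme tolerant of noise and partial memory faults.

```cpp
// Hedged sketch of a correlation-matrix associative memory: store pattern
// pairs as summed outer products, recall by a matrix-vector product.
// Dimensions and patterns here are illustrative only.
#include <array>
#include <iostream>

constexpr int N = 4;
using Vec = std::array<double, N>;
using Mat = std::array<std::array<double, N>, N>;

// Store: M += value * key^T (outer product), accumulated over all pairs.
void store(Mat& M, const Vec& key, const Vec& value) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            M[i][j] += value[i] * key[j];
}

// Recall: out = M * key; with near-orthonormal keys this reproduces the value.
Vec recall(const Mat& M, const Vec& key) {
    Vec out{};
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            out[i] += M[i][j] * key[j];
    return out;
}

int main() {
    Mat M{};                                  // zero-initialized memory matrix
    store(M, {1, 0, 0, 0}, {0, 1, 0, 1});     // orthonormal key -> exact recall
    Vec v = recall(M, {1, 0, 0, 0});
    for (double x : v) std::cout << x << " ";
    std::cout << "\n";
}
```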

  12. Kanerva's sparse distributed memory with multiple hamming thresholds

    NASA Technical Reports Server (NTRS)

    Pohja, Seppo; Kaski, Kimmo

    1992-01-01

    If the stored input patterns of Kanerva's Sparse Distributed Memory (SDM) are highly correlated, utilization of the storage capacity is very low compared to the case of uniformly distributed random input patterns. We consider a variation of SDM that has better storage capacity utilization for correlated input patterns. This approach uses a separate selection threshold for each physical storage address, or hard location. The selection of the hard locations for reading or writing can be done in parallel, a property from which SDM implementations can benefit.
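
    The sketch below illustrates the described variation under simplifying assumptions (toy sizes, arbitrary per-location threshold values): each hard location carries its own Hamming threshold, and write and read simply accumulate and sum counters over the selected locations.

```cpp
// Hedged sketch of SDM write/read with a per-hard-location Hamming threshold,
// the variation described above. All sizes and thresholds are toy values.
#include <array>
#include <bitset>
#include <iostream>
#include <random>
#include <vector>

constexpr int W = 16;   // address/word width in bits
constexpr int H = 64;   // number of hard locations

int main() {
    std::mt19937 rng(1);
    std::uniform_int_distribution<unsigned> rand_bits(0, (1u << W) - 1);

    std::vector<std::bitset<W>> hard(H);          // random hard-location addresses
    std::vector<int> threshold(H);                // one Hamming threshold per location
    std::vector<std::array<int, W>> counters(H);  // zero-initialized bit counters
    for (int h = 0; h < H; ++h) {
        hard[h] = rand_bits(rng);
        threshold[h] = 5 + (h % 3);               // per-location, not one global value
    }

    auto selected = [&](const std::bitset<W>& addr, int h) {
        return (int)(hard[h] ^ addr).count() <= threshold[h];
    };

    std::bitset<W> addr = rand_bits(rng), data = rand_bits(rng);

    // Write: every selected location accumulates +1/-1 per data bit.
    for (int h = 0; h < H; ++h)
        if (selected(addr, h))
            for (int b = 0; b < W; ++b) counters[h][b] += data[b] ? 1 : -1;

    // Read: sum counters over selected locations and threshold each bit at 0.
    std::array<int, W> sums{};
    for (int h = 0; h < H; ++h)
        if (selected(addr, h))
            for (int b = 0; b < W; ++b) sums[b] += counters[h][b];
    std::bitset<W> out;
    for (int b = 0; b < W; ++b) out[b] = sums[b] > 0;

    std::cout << "stored " << data << "\nread   " << out << "\n";
}
```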

  13. Design and Implementation of an MC68020-Based Educational Computer Board

    DTIC Science & Technology

    1989-12-01

    device and the other for a Macintosh personal computer. A stored program can be installed in 8K bytes Programmable Read Only Memory (PROM) to initialize...MHz. It includes four * Static Random Access Memory (SRAM) chips which provide a storage of 32K bytes. Two Programmable Array Logic (PAL) chips...device and the other for a Macintosh personal computer. A stored program can be installed in 8K bytes Programmable Read Only Memory (PROM) to

  14. Effect with high density nano dot type storage layer structure on 20 nm planar NAND flash memory characteristics

    NASA Astrophysics Data System (ADS)

    Sasaki, Takeshi; Muraguchi, Masakazu; Seo, Moon-Sik; Park, Sung-kye; Endoh, Tetsuo

    2014-01-01

    The merits, concerns and design principle for the future nano dot (ND) type NAND flash memory cell are clarified, by considering the effect of storage layer structure on NAND flash memory characteristics. The characteristics of the ND cell for a NAND flash memory in comparison with the floating gate type (FG) is comprehensively studied through the read, erase, program operation, and the cell to cell interference with device simulation. Although the degradation of the read throughput (0.7% reduction of the cell current) and slower program time (26% smaller programmed threshold voltage shift) with high density (10 × 1012 cm-2) ND NAND are still concerned, the suppress of the cell to cell interference with high density (10 × 1012 cm-2) plays the most important part for scaling and multi-level cell (MLC) operation in comparison with the FG NAND. From these results, the design knowledge is shown to require the control of the number of nano dots rather than the higher nano dot density, from the viewpoint of increasing its memory capacity by MLC operation and suppressing threshold voltage variability caused by the number of dots in the storage layer. Moreover, in order to increase its memory capacity, it is shown the tunnel oxide thickness with ND should be designed thicker (>3 nm) than conventional designed ND cell for programming/erasing with direct tunneling mechanism.

  15. Asymmetric programming: a highly reliable metadata allocation strategy for MLC NAND flash memory-based sensor systems.

    PubMed

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-10-10

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme.
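
    A toy sketch of the placement idea follows, under the simplifying (and device-dependent) assumption that even-indexed pages are LSB pages and odd-indexed pages are their MSB partners; the real paired-page map of an MLC chip is more involved.

```cpp
// Hedged sketch of the metadata-placement idea: route metadata writes to the
// more reliable MSB pages of paired MLC pages. The even/odd pairing used here
// is a simplifying assumption; real MLC page maps are device-specific.
#include <iostream>

enum class PageKind { LSB, MSB };

// Assumed toy layout: even page indices are LSB, odd ones their MSB partners.
PageKind kind_of(int page) { return page % 2 == 0 ? PageKind::LSB : PageKind::MSB; }

// Return the next page of the requested kind at or after `start`.
int next_page(int start, PageKind want) {
    int p = start;
    while (kind_of(p) != want) ++p;
    return p;
}

int main() {
    int cursor = 0;
    // Metadata goes to MSB pages (lower bit error rate), user data to LSB.
    std::cout << "metadata -> page " << next_page(cursor, PageKind::MSB)
              << ", data -> page " << next_page(cursor, PageKind::LSB) << "\n";
}
```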

  16. Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems

    PubMed Central

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-01-01

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme. PMID:25310473

  17. Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes

    NASA Technical Reports Server (NTRS)

    Yan, Jerry; Jin, Haoqiang; Frumkin, Michael; Yan, Jerry (Technical Monitor)

    2000-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress has been made in hardware and software technologies, the performance of parallel programs written with compiler directives has improved substantially. The introduction of OpenMP directives, the industry standard for shared-memory programming, has minimized the issue of portability. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline techniques used in the implementation of the tool and discuss the application of this tool to the NAS Parallel Benchmarks and several computational fluid dynamics codes. This work demonstrates the great potential of using the tool to quickly port parallel programs and also achieve good performance exceeding that of some commercial tools.
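
    For readers unfamiliar with the target output, the fragment below shows the kind of directive such a tool inserts on a data-independent loop; it is an illustrative hand-written example, not actual CAPTools output.

```cpp
// Illustrative example of the kind of OpenMP directive an automatic
// parallelization tool inserts on a loop whose iterations are independent.
// Compile with -fopenmp (or equivalent) to enable the pragma.
#include <cstdio>
#include <vector>

int main() {
    const int n = 1000000;
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n);

    // No loop-carried dependences, so the iterations may run in parallel.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + 2.0 * b[i];

    std::printf("c[0] = %f\n", c[0]);
}
```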

  18. W-Sb-Te phase-change material: A candidate for the trade-off between programming speed and data retention

    NASA Astrophysics Data System (ADS)

    Peng, Cheng; Wu, Liangcai; Rao, Feng; Song, Zhitang; Yang, Pingxiong; Song, Hongjia; Ren, Kun; Zhou, Xilin; Zhu, Min; Liu, Bo; Chu, Junhao

    2012-09-01

    W-Sb-Te phase-change material has been proposed to improve the performance of phase-change memory (PCM). The crystallization temperature, crystalline resistance, and 10-year data retention of Sb2Te increase markedly with W doping. The Wx(Sb2Te)1-x films crystallize quickly into a stable hexagonal phase with W distributed uniformly in the crystal lattice, which ensures a faster SET speed and better operation stability for application in practical devices. A PCM device based on W0.07(Sb2Te)0.93 shows ultrafast SET operation (6 ns) and good endurance (1.8 × 10^5 cycles). W-Sb-Te material is a promising candidate for the trade-off between programming speed and data retention.

  19. Implementing High-Performance Geometric Multigrid Solver with Naturally Grained Messages

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shan, Hongzhang; Williams, Samuel; Zheng, Yili

    2015-10-26

    Structured-grid linear solvers often require manual packing and unpacking of communication data to achieve high performance. Orchestrating this process efficiently is challenging, labor-intensive, and potentially error-prone. In this paper, we explore an alternative approach that communicates the data with naturally grained message sizes without manual packing and unpacking. This approach is the distributed analogue of shared-memory programming, taking advantage of the global address space in PGAS languages to provide substantial programming ease. However, its performance may suffer from the large number of small messages. We investigate the runtime support required in the UPC++ library for this naturally grained version to close the performance gap between the two approaches and attain comparable performance at scale, using the High-Performance Geometric Multigrid (HPGMG-FV) benchmark as a driver.
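
    The manual alternative being avoided looks roughly like the following MPI sketch (illustrative, not the paper's code): a strided ghost column is packed into a contiguous buffer, exchanged, and unpacked, whereas the naturally grained PGAS version would instead issue fine-grained remote puts directly into the neighbor's array.

```cpp
// Hedged sketch of the manual pack/send/unpack pattern the paper seeks to
// avoid. The exchange is a self-send so the example runs on any rank count.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int ny = 8, nx = 8;
    std::vector<double> grid(ny * nx, rank);   // local block of a 2-D domain

    // Pack: gather the strided right-edge column into a contiguous buffer.
    std::vector<double> sendbuf(ny), recvbuf(ny);
    for (int j = 0; j < ny; ++j) sendbuf[j] = grid[j * nx + (nx - 1)];

    // Exchange one coarse-grained message with a neighbor (self here).
    MPI_Sendrecv(sendbuf.data(), ny, MPI_DOUBLE, rank, 0,
                 recvbuf.data(), ny, MPI_DOUBLE, rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Unpack: scatter the received column into the strided ghost positions.
    for (int j = 0; j < ny; ++j) grid[j * nx + 0] = recvbuf[j];

    MPI_Finalize();
}
```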

  20. Distributed memory parallel Markov random fields using graph partitioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heinemann, C.; Perciano, T.; Ushizima, D.

    Markov random fields (MRF) based algorithms have attracted a large amount of interest in image analysis due to their ability to exploit contextual information about data. Image data generated by experimental facilities, though, continues to grow larger and more complex, making it more difficult to analyze in a reasonable amount of time. Applying image processing algorithms to large datasets requires alternative approaches to circumvent performance problems. Aiming to provide scientists with a new tool to recover valuable information from such datasets, we developed a general purpose distributed memory parallel MRF-based image analysis framework (MPI-PMRF). MPI-PMRF overcomes performance and memory limitations by distributing data and computations across processors. The proposed approach was successfully tested with synthetic and experimental datasets. Additionally, the performance of the MPI-PMRF framework is analyzed through a detailed scalability study. We show that a performance increase is obtained while maintaining an accuracy of the segmentation results higher than 98%. The contributions of this paper are: (a) development of a distributed memory MRF framework; (b) measurement of the performance increase of the proposed approach; (c) verification of segmentation accuracy in both synthetic and experimental, real-world datasets.

  1. Is random access memory random?

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Most software is constructed on the assumption that the programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match processor addressing rate with memory service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satisfied from the cache without reference to the main memory. The cache requests information from main memory in blocks that can be transferred at the full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.
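
    The locality point can be demonstrated in a few lines: the two loops below touch the same elements of a row-major matrix, but only the first consumes each fetched cache line in full, so it typically runs several times faster.

```cpp
// Small illustration of locality: row order makes stride-1 accesses that use
// every byte of each fetched cache line; column order makes stride-n accesses.
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const int n = 2048;
    std::vector<double> m(static_cast<std::size_t>(n) * n, 1.0);
    double sum = 0.0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)          // row-major order: stride-1 accesses,
        for (int j = 0; j < n; ++j)      // each cache line is fully consumed
            sum += m[static_cast<std::size_t>(i) * n + j];
    auto t1 = std::chrono::steady_clock::now();
    for (int j = 0; j < n; ++j)          // column order: stride-n accesses,
        for (int i = 0; i < n; ++i)      // one element used per line fetched
            sum += m[static_cast<std::size_t>(i) * n + j];
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("row-major %lld us, column %lld us (sum=%f)\n",
                (long long)std::chrono::duration_cast<us>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<us>(t2 - t1).count(), sum);
}
```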

  2. Neural bases of orthographic long-term memory and working memory in dysgraphia

    PubMed Central

    Purcell, Jeremy; Hillis, Argye E.; Capasso, Rita; Miceli, Gabriele

    2016-01-01

    Spelling a word involves the retrieval of information about the word’s letters and their order from long-term memory as well as the maintenance and processing of this information by working memory in preparation for serial production by the motor system. While it is known that brain lesions may selectively affect orthographic long-term memory and working memory processes, relatively little is known about the neurotopographic distribution of the substrates that support these cognitive processes, or the lesions that give rise to the distinct forms of dysgraphia that affect these cognitive processes. To examine these issues, this study uses a voxel-based mapping approach to analyse the lesion distribution of 27 individuals with dysgraphia subsequent to stroke, who were identified on the basis of their behavioural profiles alone, as suffering from deficits only affecting either orthographic long-term or working memory, as well as six other individuals with deficits affecting both sets of processes. The findings provide, for the first time, clear evidence of substrates that selectively support orthographic long-term and working memory processes, with orthographic long-term memory deficits centred in either the left posterior inferior frontal region or left ventral temporal cortex, and orthographic working memory deficits primarily arising from lesions of the left parietal cortex centred on the intraparietal sulcus. These findings also contribute to our understanding of the relationship between the neural instantiation of written language processes and spoken language, working memory and other cognitive skills. PMID:26685156

  3. An Element-Based Concurrent Partitioner for Unstructured Finite Element Meshes

    NASA Technical Reports Server (NTRS)

    Ding, Hong Q.; Ferraro, Robert D.

    1996-01-01

    A concurrent partitioner for partitioning unstructured finite element meshes on distributed memory architectures is developed. The partitioner uses an element-based partitioning strategy. Its main advantage over the more conventional node-based partitioning strategy is its modular programming approach to the development of parallel applications. The partitioner first partitions element centroids using a recursive inertial bisection algorithm. Elements and nodes then migrate according to the partitioned centroids, using a data request communication template for unpredictable incoming messages. Our scalable implementation is contrasted to a non-scalable implementation which is a straightforward parallelization of a sequential partitioner.
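
    A simplified sketch of the centroid-partitioning step follows; for brevity it bisects on the coordinate axis of largest extent rather than computing the true inertial (principal) axis, and the subsequent migration of elements and nodes is omitted.

```cpp
// Hedged sketch of recursive bisection over element centroids. A median split
// along the longer axis stands in for the paper's recursive inertial bisection.
#include <algorithm>
#include <iostream>
#include <vector>

struct Centroid { double x, y; int elem; };

void bisect(std::vector<Centroid>& c, int lo, int hi, int depth,
            std::vector<int>& part, int id) {
    if (depth == 0) {
        for (int i = lo; i < hi; ++i) part[c[i].elem] = id;
        return;
    }
    auto [xmin, xmax] = std::minmax_element(c.begin() + lo, c.begin() + hi,
        [](const Centroid& a, const Centroid& b) { return a.x < b.x; });
    auto [ymin, ymax] = std::minmax_element(c.begin() + lo, c.begin() + hi,
        [](const Centroid& a, const Centroid& b) { return a.y < b.y; });
    bool split_x = (xmax->x - xmin->x) >= (ymax->y - ymin->y);

    int mid = (lo + hi) / 2;   // median split balances element counts exactly
    std::nth_element(c.begin() + lo, c.begin() + mid, c.begin() + hi,
        [split_x](const Centroid& a, const Centroid& b) {
            return split_x ? a.x < b.x : a.y < b.y;
        });
    bisect(c, lo, mid, depth - 1, part, 2 * id);
    bisect(c, mid, hi, depth - 1, part, 2 * id + 1);
}

int main() {
    std::vector<Centroid> c;
    for (int i = 0; i < 16; ++i)
        c.push_back({double(i % 4), double(i / 4), i});  // toy 4x4 centroid grid
    std::vector<int> part(c.size());
    bisect(c, 0, (int)c.size(), 2, part, 0);             // 2 levels -> 4 partitions
    for (std::size_t e = 0; e < part.size(); ++e)
        std::cout << "element " << e << " -> partition " << part[e] << "\n";
}
```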

  4. Accurate low-cost methods for performance evaluation of cache memory systems

    NASA Technical Reports Server (NTRS)

    Laha, Subhasis; Patel, Janak H.; Iyer, Ravishankar K.

    1988-01-01

    Methods of simulation based on statistical techniques are proposed to decrease the need for large trace measurements while still predicting true program behavior. Sampling techniques are applied while the address trace is collected from a workload, drastically reducing the space and time needed to collect the trace. Simulation techniques are developed to use the sampled data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a concept of a primed cache is introduced to simulate large caches by the sampling-based method.
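
    A toy version of the sampling idea: replay a direct-mapped cache over short sampled bursts of an address trace and report a miss rate per sample, so that the empirical distribution, not just the mean, is available. The cache geometry and burst contents here are placeholders.

```cpp
// Hedged sketch of trace-driven cache simulation on sampled data: a
// direct-mapped cache is replayed over sampled bursts of an address trace.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const int kLines = 256, kLineBytes = 64;
    std::vector<uint64_t> tag(kLines, UINT64_MAX);   // empty direct-mapped cache

    // Stand-in for sampled bursts of a real workload trace.
    std::vector<std::vector<uint64_t>> samples = {
        {0x1000, 0x1008, 0x1040, 0x9000, 0x1000},
        {0x2000, 0x2040, 0x2000, 0xA040, 0x2040},
    };

    for (std::size_t s = 0; s < samples.size(); ++s) {
        int misses = 0;
        for (uint64_t addr : samples[s]) {
            uint64_t line = addr / kLineBytes;
            int set = (int)(line % kLines);
            if (tag[set] != line) { tag[set] = line; ++misses; }  // miss: fill
        }
        // Per-sample rates estimate the miss-rate distribution, not just its mean.
        std::printf("sample %zu: miss rate %.2f\n",
                    s, double(misses) / samples[s].size());
    }
}
```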

  5. Memory for Context becomes Less Specific with Time

    ERIC Educational Resources Information Center

    Wiltgen, Brian J.; Silva, Alcino J.

    2007-01-01

    Context memories initially require the hippocampus, but over time become independent of this structure. This shift reflects a consolidation process whereby memories are gradually stored in distributed regions of the cortex. The function of this process is thought to be the extraction of statistical regularities and general knowledge from specific…

  6. A Memory-Based Theory of Verbal Cognition

    ERIC Educational Resources Information Center

    Dennis, Simon

    2005-01-01

    The syntagmatic paradigmatic model is a distributed, memory-based account of verbal processing. Built on a Bayesian interpretation of string edit theory, it characterizes the control of verbal cognition as the retrieval of sets of syntagmatic and paradigmatic constraints from sequential and relational long-term memory and the resolution of these…

  7. Autobiographical Memory from a Life Span Perspective

    ERIC Educational Resources Information Center

    Schroots, Johannes J. F.; van Dijkum, Cor; Assink, Marian H. J.

    2004-01-01

    This comparative study (i.e., three age groups, three measures) explores the distribution of retrospective and prospective autobiographical memory data across the lifespan, in particular the bump pattern of disproportionally higher recall of memories from the ages 10 to 30, as generally observed in older age groups, in conjunction with the…

  8. A fast and low-power microelectromechanical system-based non-volatile memory device

    PubMed Central

    Lee, Sang Wook; Park, Seung Joo; Campbell, Eleanor E. B.; Park, Yung Woo

    2011-01-01

    Several new generation memory devices have been developed to overcome the low performance of conventional silicon-based flash memory. In this study, we demonstrate a novel non-volatile memory design based on the electromechanical motion of a cantilever to provide fast charging and discharging of a floating-gate electrode. The operation is demonstrated by using an electromechanical metal cantilever to charge a floating gate that controls the charge transport through a carbon nanotube field-effect transistor. The set and reset currents are unchanged after more than 11 h constant operation. Over 500 repeated programming and erasing cycles were demonstrated under atmospheric conditions at room temperature without degradation. Multinary bit programming can be achieved by varying the voltage on the cantilever. The operation speed of the device is faster than a conventional flash memory and the power consumption is lower than other memory devices. PMID:21364559

  9. Hard Real-Time: C++ Versus RTSJ

    NASA Technical Reports Server (NTRS)

    Dvorak, Daniel L.; Reinholtz, William K.

    2004-01-01

    In the domain of hard real-time systems, which language is better: C++ or the Real-Time Specification for Java (RTSJ)? Although ordinary Java provides a more productive programming environment than C++ due to its automatic memory management, that benefit does not apply to RTSJ when using NoHeapRealtimeThread and non-heap memory areas. As a result, RTSJ programmers must manage non-heap memory explicitly. While that is not a deterrent for veteran real-time programmers, for whom explicit memory management is common, the lack of certain language features in RTSJ (and Java) makes manual memory management harder to accomplish safely than in C++. This paper illustrates the problem for practitioners in the context of moving data and managing memory in a real-time producer/consumer pattern. The relative ease of implementation and safety of the C++ programming model suggests that RTSJ has a struggle ahead in the domain of hard real-time applications, despite its other attractive features.
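
    A minimal sketch of the C++ side of that comparison (illustrative, not the paper's benchmark code): messages move through the queue as std::unique_ptr values, so ownership transfer and the deallocation point are explicit in the types rather than managed by a collector.

```cpp
// Hedged sketch of a producer/consumer queue in C++: unique_ptr makes buffer
// ownership and the point of deallocation explicit and deterministic.
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>

struct Sample { int sensor; double value; };

std::queue<std::unique_ptr<Sample>> q;
std::mutex m;

void produce(int sensor, double value) {
    auto msg = std::make_unique<Sample>(Sample{sensor, value});
    std::lock_guard<std::mutex> lock(m);
    q.push(std::move(msg));               // ownership transfers into the queue
}

void consume() {
    std::unique_ptr<Sample> msg;
    {
        std::lock_guard<std::mutex> lock(m);
        if (q.empty()) return;
        msg = std::move(q.front());       // ownership transfers to the consumer
        q.pop();
    }
    std::cout << "sensor " << msg->sensor << ": " << msg->value << "\n";
}   // msg destroyed here: deallocation is deterministic, no collector pause

int main() {
    produce(1, 3.14);
    consume();
}
```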

  10. A hot hole-programmed and low-temperature-formed SONOS flash memory

    PubMed Central

    2013-01-01

    In this study, a high-performance TixZrySizO flash memory is demonstrated using a sol–gel spin-coating method and formed under a low annealing temperature. The high-efficiency charge storage layer is formed by depositing a well-mixed solution of titanium tetrachloride, silicon tetrachloride, and zirconium tetrachloride, followed by 60 s of annealing at 600°C. The flash memory exhibits a noteworthy hot hole trapping characteristic and excellent electrical properties regarding memory window, program/erase speeds, and charge retention. At only 6-V operation, the program/erase speeds can be as fast as 120:5.2 μs with a 2-V shift, and the memory window can be up to 8 V. The retention times are extrapolated to 106 s with only 5% (at 85°C) and 10% (at 125°C) charge loss. The barrier height of the TixZrySizO film is demonstrated to be 1.15 eV for hole trapping, through the extraction of the Poole-Frenkel current. The excellent performance of the memory is attributed to high trapping sites of the low-temperature-annealed, high-κ sol–gel film. PMID:23899050

  11. MOM3D/EM-ANIMATE - MOM3D WITH ANIMATION CODE

    NASA Technical Reports Server (NTRS)

    Shaeffer, J. F.

    1994-01-01

    MOM3D (LAR-15074) is a FORTRAN method-of-moments electromagnetic analysis algorithm for open or closed 3-D perfectly conducting or resistive surfaces. Radar cross section with plane wave illumination is the prime analysis emphasis; however, provision is also included for local port excitation for computing antenna gain patterns and input impedances. The Electric Field Integral Equation form of Maxwell's equations is solved using local triangle couple basis and testing functions with a resultant system impedance matrix. The analysis emphasis is not only for routine RCS pattern predictions, but also for phenomenological diagnostics: bistatic imaging, currents, and near scattered/total electric fields. The images, currents, and near fields are output in form suitable for animation. MOM3D computes the full backscatter and bistatic radar cross section polarization scattering matrix (amplitude and phase), body currents and near scattered and total fields for plane wave illumination. MOM3D also incorporates a new bistatic k space imaging algorithm for computing down range and down/cross range diagnostic images using only one matrix inversion. MOM3D has been made memory and cpu time efficient by using symmetric matrices, symmetric geometry, and partitioned fixed and variable geometries suitable for design iteration studies. MOM3D may be run interactively or in batch mode on 486 IBM PCs and compatibles, UNIX workstations or larger computers. A 486 PC with 16 megabytes of memory has the potential to solve a 30 square wavelength (containing 3000 unknowns) symmetric configuration. Geometries are described using a triangular mesh input in the form of a list of spatial vertex points and a triangle join connection list. The EM-ANIMATE (LAR-15075) program is a specialized visualization program that displays and animates the near-field and surface-current solutions obtained from an electromagnetics program, in particular, that from MOM3D. The EM-ANIMATE program is windows based and contains a user-friendly, graphical interface for setting viewing options, case selection, file manipulation, etc. EM-ANIMATE displays the field and surface-current magnitude as smooth shaded color fields (color contours) ranging from a minimum contour value to a maximum contour value for the fields and surface currents. The program can display either the total electric field or the scattered electric field in either time-harmonic animation mode or in the root mean square (RMS) average mode. The default setting is initially set to the minimum and maximum values within the field and surface current data and can be optionally set by the user. The field and surface-current value are animated by calculating and viewing the solution at user selectable radian time increments between 0 and 2pi. The surface currents can also be displayed in either time-harmonic animation mode or in RMS average mode. In RMS mode, the color contours do not vary with time, but show the constant time averaged field and surface-current magnitude solution. The electric field and surface-current directions can be displayed as scaled vector arrows which have a length proportional to the magnitude at each field grid point or surface node point. These vector properties can be viewed separately or concurrently with the field or surface-current magnitudes. Animation speed is improved by turning off the display of the vector arrows. 
In RMS modes, the direction vectors are still displayed as varying with time since the time averaged direction vectors would be zero length vectors. Other surface properties can optionally be viewed. These include the surface grid, the resistance value assigned to each element of the grid, and the power dissipation of each element which has an assigned resistance value. The EM-ANIMATE program will accept up to 10 different surface current cases each consisting of up to 20,000 node points and 10,000 triangle definitions and will animate one of these cases. The capability is used to compare surface-current distribution due to various initial excitation directions or electric field orientations. The program can accept up to 50 planes of field data consisting of a grid of 100 by 100 field points. These planes of data are user selectable and can be viewed individually or concurrently. With these preset limits, the program requires 55 megabytes of core memory to run. These limits can be changed in the header files to accommodate the available core memory of an individual workstation. An estimate of memory required can be made as follows: approximate memory in bytes equals (number of nodes times number of surfaces times 14 variables times bytes per word, typically 4 bytes per floating point) plus (number of field planes times number of nodes per plane times 21 variables times bytes per word). This gives the approximate memory size required to store the field and surface-current data. The total memory size is approximately 400,000 bytes plus the data memory size. The animation calculations are performed in real time at any user set time step. For Silicon Graphics Workstations that have multiple processors, this program has been optimized to perform these calculations on multiple processors to increase animation rates. The optimized program uses the SGI PFA (Power FORTRAN Accelerator) library. On single processor machines, the parallelization directives are seen as comments to the program and will have no effect on compilation or execution. MOM3D and EM-ANIMATE are written in FORTRAN 77 for interactive or batch execution on SGI series computers running IRIX 3.0 or later. The RAM requirements for these programs vary with the size of the problem being solved. A minimum of 30Mb of RAM is required for execution of EM-ANIMATE; however, the code may be modified to accommodate the available memory of an individual workstation. For EM-ANIMATE, twenty-four bit, double-buffered color capability is suggested, but not required. Sample executables and sample input and output files are provided. Electronic documentation is provided for both EM-ANIMATE and MOM3D in PostScript format. Documentation for EM-ANIMATE is also provided in the form of IRIX man pages. The standard distribution medium for COS-10048 is a .25 inch streaming magnetic IRIX tape cartridge in UNIX tar format. MOM3D and EM-ANIMATE are also available separately as LAR-15074 and LAR-15075, respectively. MOM3D was developed in 1992. EM-ANIMATE was developed in 1993.
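
    Reading "number of surfaces" as the number of surface-current cases and taking 4 bytes per word, the memory estimate quoted above can be written compactly; plugging in the program's stated limits (10 cases of 20,000 nodes, 50 field planes of 100 by 100 points) roughly reproduces the 55-megabyte figure:

```latex
% Back-of-envelope check of the stated memory estimate (4-byte words assumed)
M \approx \underbrace{N_{\mathrm{nodes}}\,N_{\mathrm{cases}}\cdot 14\cdot 4}_{20000\,\cdot\,10\,\cdot\,56\;=\;11.2\ \mathrm{MB}}
        + \underbrace{N_{\mathrm{planes}}\,N_{\mathrm{pts/plane}}\cdot 21\cdot 4}_{50\,\cdot\,10000\,\cdot\,84\;=\;42\ \mathrm{MB}}
        \approx 53\ \mathrm{MB} \;+\; 0.4\ \mathrm{MB\ (program)} \;\approx\; 55\ \mathrm{MB}
```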

  12. Hierarchical resilience with lightweight threads.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wheeler, Kyle Bruce

    2011-10-01

    This paper proposes a methodology for providing robustness and resilience for a highly threaded distributed- and shared-memory environment based on well-defined inputs and outputs to lightweight tasks. These inputs and outputs form a failure 'barrier', allowing tasks to be restarted or duplicated as necessary. These barriers must be expanded based on task behavior, such as communication between tasks, but do not prohibit any given behavior. One of the trends in high-performance computing codes seems to be a trend toward self-contained functions that mimic functional programming. Software designers are trending toward a model of software design where their core functions are specified in side-effect-free or low-side-effect ways, wherein the inputs and outputs of the functions are well-defined. This provides the ability to copy the inputs to wherever they need to be, whether that is the other side of the PCI bus or the other side of the network, do work on that input using local memory, and then copy the outputs back as needed. This design pattern is popular among new distributed threading environment designs, including the Barcelona STARS system, distributed OpenMP systems, the Habanero-C and Habanero-Java systems from Vivek Sarkar at Rice University, the HPX/ParalleX model from LSU, as well as our own Scalable Parallel Runtime effort (SPR) and the Trilinos stateless kernels. This design pattern is also shared by CUDA and several OpenMP extensions for GPU-type accelerators (e.g., the PGI OpenMP extensions).
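
    A minimal sketch of the failure-barrier idea in C++ follows (the report describes a methodology, not this code): because a task reads only its explicit inputs and writes only its explicit output, a wrapper can simply re-execute it from the preserved inputs when it fails.

```cpp
// Hedged sketch of a failure barrier: a side-effect-free task whose output
// depends only on its input snapshot, plus an illustrative restart wrapper.
#include <functional>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <vector>

// A side-effect-free task: output depends only on the explicit input.
std::optional<double> sum_task(const std::vector<double>& input) {
    double s = 0.0;
    for (double v : input) s += v;
    return s;   // a real task might return nullopt on a detected fault
}

// Restart wrapper: because the inputs sit behind the barrier, rerunning is safe.
double run_resilient(std::function<std::optional<double>(const std::vector<double>&)> task,
                     const std::vector<double>& input, int max_retries) {
    for (int attempt = 0; attempt <= max_retries; ++attempt)
        if (auto out = task(input)) return *out;
    throw std::runtime_error("task failed after retries");
}

int main() {
    std::vector<double> input{1.0, 2.0, 3.0};
    std::cout << run_resilient(sum_task, input, 3) << "\n";
}
```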

  13. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment.

    PubMed

    Oh, Jeongsu; Choi, Chi-Hwan; Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability beyond those of its ancestor, CLUSTOM, while maintaining high accuracy. The clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and the Amazon EC2 cloud-computing environment. Under the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. A comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr.

  14. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment

    PubMed Central

    Park, Min-Kyu; Kim, Byung Kwon; Hwang, Kyuin; Lee, Sang-Heon; Hong, Soon Gyu; Nasir, Arshan; Cho, Wan-Sup; Kim, Kyung Mo

    2016-01-01

    High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to different organisms present in the environmental samples. Typically, analysis of microbial diversity in bioinformatics starts from pre-processing followed by clustering 16S rRNA reads into relatively fewer operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly accelerate the downstream analysis time. However, existing hierarchical clustering algorithms that are generally more accurate than greedy heuristic algorithms struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology–a distributed data structure to store all data in the main memory of multiple computing nodes. The IMDG technology helps CLUSTOM-CLOUD to enhance both its capability of handling larger datasets and its computational scalability better than its ancestor, CLUSTOM, while maintaining high accuracy. Clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using the small laboratory cluster (10 nodes) and under the Amazon EC2 cloud-computing environments. Under the laboratory environment, it required only ~3 hours to process dataset of size 200 K reads regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes on the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm. CLUSTOM-CLOUD is written in JAVA and is freely available at http://clustomcloud.kopri.re.kr. PMID:26954507

  15. Changing concepts of working memory

    PubMed Central

    Ma, Wei Ji; Husain, Masud; Bays, Paul M

    2014-01-01

    Working memory is widely considered to be limited in capacity, holding a fixed, small number of items, such as Miller's ‘magical number’ seven or Cowan's four. It has recently been proposed that working memory might better be conceptualized as a limited resource that is distributed flexibly among all items to be maintained in memory. According to this view, the quality rather than the quantity of working memory representations determines performance. Here we consider behavioral and emerging neural evidence for this proposal. PMID:24569831

  16. An Efficient Multiblock Method for Aerodynamic Analysis and Design on Distributed Memory Systems

    NASA Technical Reports Server (NTRS)

    Reuther, James; Alonso, Juan Jose; Vassberg, John C.; Jameson, Antony; Martinelli, Luigi

    1997-01-01

    The work presented in this paper describes the application of a multiblock gridding strategy to the solution of aerodynamic design optimization problems involving complex configurations. The design process is parallelized using the MPI (Message Passing Interface) Standard such that it can be efficiently run on a variety of distributed memory systems ranging from traditional parallel computers to networks of workstations. Substantial improvements to the parallel performance of the baseline method are presented, with particular attention to their impact on the scalability of the program as a function of the mesh size. Drag minimization calculations at a fixed coefficient of lift are presented for a business jet configuration that includes the wing, body, pylon, aft-mounted nacelle, and vertical and horizontal tails. An aerodynamic design optimization is performed with both the Euler and Reynolds Averaged Navier-Stokes (RANS) equations governing the flow solution and the results are compared. These sample calculations establish the feasibility of efficient aerodynamic optimization of complete aircraft configurations using the RANS equations as the flow model. There still exists, however, the need for detailed studies of the importance of a true viscous adjoint method which holds the promise of tackling the minimization of not only the wave and induced components of drag, but also the viscous drag.

  17. Precision Pointing in Space Using Arrays of Shape Memory Based Linear Actuators

    NASA Astrophysics Data System (ADS)

    Sonawane, Nikhil

    Space systems such as communication satellites, earth observation satellites, and telescopes require accurate pointing to observe fixed targets over prolonged times. These systems typically use reaction wheels to slew the spacecraft and gimballing systems containing motors to achieve precise pointing. Motor-based actuators have limited life as they contain moving parts that require lubrication in space. Alternate methods have utilized piezoelectric actuators. This paper presents shape memory alloy (SMA) actuators for control of a deployable antenna placed on a satellite. The SMAs are operated as a series of distributed linear actuators. These distributed linear actuators are not prone to single-point failures, and although each individual actuator is imprecise due to hysteresis and temperature variation, the system as a whole achieves reliable results. The SMAs can be programmed to perform a series of periodic motions and operate as a mechanical guidance system that is not prone to damage from radiation or space weather. Efforts are focused on developing a system that can achieve 1 degree pointing accuracy at first, with an ultimate goal of achieving a few arc seconds accuracy. A bench-top model of the actuator system has been developed, and work is underway toward testing the system under vacuum. A demonstration flight of the technology is planned aboard a CubeSat.

  18. Optical read/write memory system components

    NASA Technical Reports Server (NTRS)

    Kozma, A.

    1972-01-01

    The optical components of a breadboard holographic read/write memory system have been fabricated, and the parameters of the major system components have been specified: (1) a laser system; (2) an x-y beam deflector; (3) a block data composer; (4) the read/write memory material; (5) an output detector array; and (6) the electronics to drive, synchronize, and control all system components. The objectives of the investigation were divided into three concurrent phases: (1) to supply and fabricate the major components according to the previously established specifications; (2) to prepare computer programs to simulate the entire holographic memory system so that a designer can balance the requirements on the various components; and (3) to conduct a development program to optimize the combined recording and reconstruction process of the high density holographic memory system.

  19. OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers

    NASA Astrophysics Data System (ADS)

    Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori

    OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of the scalability evaluation, it is found that, on average, the OSCAR compiler with the OSCAR API achieves a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.

  20. Memory rehabilitation for the working memory of patients with multiple sclerosis (MS).

    PubMed

    Mousavi, Shokoufeh; Zare, Hossein; Etemadifar, Masoud; Taher Neshatdoost, Hamid

    2018-05-01

    The main cognitive impairments in multiple sclerosis (MS) affect working memory, processing speed, and performances that are in close interaction with one another. Cognitive problems in MS are influenced to a lesser degree by disease recovery medications or treatments, but cognitive rehabilitation is considered one of the promising methods of treatment. There is evidence regarding the effectiveness of cognitive rehabilitation for MS patients in various stages of the disease. Since impairment of working memory is one of the main MS deficits, a particular training that affects this cognitive domain can be of great value. This study aims to determine the effectiveness of memory rehabilitation on the working memory performance of MS patients. Sixty MS patients with cognitive impairment and similar in terms of demographic characteristics, duration of disease, neurological problems, and mental health were randomly assigned to three groups: namely, experimental, placebo, and control. Patients' cognitive evaluation incorporated baseline assessments immediately post-intervention and 5 weeks post-intervention. The experimental group received a cognitive rehabilitation program in one-hour sessions on a weekly basis for 8 weeks. The placebo group received relaxation techniques on a weekly basis; the control group received no intervention. The results of this study showed that the cognitive rehabilitation program had a positive effect on the working memory performance of patients with MS in the experimental group. These results were achieved in immediate evaluation (post-test) and follow-up 5 weeks after intervention. There was no significant difference in working memory performance between the placebo group and the control group. According to the study, there is evidence for the effectiveness of a memory rehabilitation program for the working memory of patients with MS. Cognitive rehabilitation can improve working memory disorders and have a positive effect on the working memory performance of these patients.

  1. Cognitive remediation therapy (CRT) benefits more to patients with schizophrenia with low initial memory performances.

    PubMed

    Pillet, Benoit; Morvan, Yannick; Todd, Aurelia; Franck, Nicolas; Duboc, Chloé; Grosz, Aimé; Launay, Corinne; Demily, Caroline; Gaillard, Raphaël; Krebs, Marie-Odile; Amado, Isabelle

    2015-01-01

    Cognitive deficits in schizophrenia mainly affect memory, attention and executive functions. Cognitive remediation is a technique derived from neuropsychology, which aims to improve or compensate for these deficits. Working memory, verbal learning, and executive functions are crucial factors for functional outcome. Our purpose was to assess the impact of a cognitive remediation therapy (CRT) program on cognitive difficulties in patients with schizophrenia, especially on working memory, verbal memory, and cognitive flexibility. We collected data from clinical and neuropsychological assessments in 24 patients suffering from schizophrenia (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, DSM-IV) who followed a 3-month CRT program. Verbal and visuo-spatial working memory, verbal memory, and cognitive flexibility were assessed before and after CRT. The Wilcoxon test showed significant improvements on the backward digit span, the visual working memory span, verbal memory, and flexibility. Cognitive improvement was substantial when baseline performance was low, independently of clinical benefit. CRT is effective on crucial cognitive domains and provides a substantial benefit to patients with low baseline performance. Such cognitive amelioration appears highly promising for improving the outcome in cognitively impaired patients.

  2. GRADSPMHD: A parallel MHD code based on the SPH formalism

    NASA Astrophysics Data System (ADS)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B = 0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1-, 2-, and 3-dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html. Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 620503. No. of bytes in distributed program, including test data, etc.: 19837671. Distribution format: tar.gz. Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ~30 MB for a Sedov test including 15625 particles on a single CPU. Classification: 12. Nature of problem: Evolution of a plasma in the ideal MHD approximation. Solution method: The equations of magnetohydrodynamics are solved using the SPH method. Running time: The test provided takes approximately 20 min using 4 processors.
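
    The "mixed hyperbolic/parabolic correction scheme" for the ∇·B = 0 constraint is commonly written in the generalized-Lagrange-multiplier form of Dedner et al.; assuming GRADSPMHD follows that standard formulation (notation ours, and the code's details may differ), a cleaning scalar ψ is coupled to the induction equation as

        \frac{\partial \mathbf{B}}{\partial t} = \cdots - \nabla\psi, \qquad
        \frac{\partial \psi}{\partial t} = -c_h^2 \, \nabla\cdot\mathbf{B} - \frac{c_h^2}{c_p^2}\,\psi,

    so that divergence errors are carried away at the hyperbolic wave speed c_h and simultaneously damped on the parabolic time scale c_p^2 / c_h^2.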

  3. Computer viruses

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    The worm, Trojan horse, bacterium, and virus are destructive programs that attack information stored in a computer's memory. Virus programs, which propagate by incorporating copies of themselves into other programs, are a growing menace in the late-1980s world of unprotected, networked workstations and personal computers. Limited immunity is offered by memory protection hardware, digitally authenticated object programs, and antibody programs that kill specific viruses. Additional immunity can be gained from the practice of digital hygiene, primarily the refusal to use software from untrusted sources. Full immunity requires attention in a social dimension, the accountability of programmers.

  4. Human interferon and its inducers: clinical program overview at Roswell Park Memorial Institute.

    PubMed

    Carter, W A; Horoszewicz, J S

    1978-11-01

    An overview of the clinical interferon program at Roswell Park Memorial Institute is presented. Purified fibroblast interferon and a novel inducer of human interferon [rIn-r(C12,U)n] are being evaluated for possible antiviral, antiproliferative, and immunomodulatory activities in patients with cancer.

  5. Data systems and computer science space data systems: Onboard memory and storage

    NASA Technical Reports Server (NTRS)

    Shull, Tom

    1991-01-01

    The topics are presented in viewgraph form and include the following: technical objectives; technology challenges; state-of-the-art assessment; mass storage comparison; SODR drive and system concepts; program description; vertical Bloch line (VBL) device concept; relationship to external programs; and backup charts for memory and storage.

  6. A New Extension Model: The Memorial Middle School Agricultural Extension and Education Center

    ERIC Educational Resources Information Center

    Skelton, Peter; Seevers, Brenda

    2010-01-01

    The Memorial Middle School Agricultural Extension and Education Center is a new model for Extension. The center applies the Cooperative Extension Service System philosophy and mission to developing public education-based programs. Programming primarily serves middle school students and teachers through agricultural and natural resource science…

  7. Memory as the "whole brain work": a large-scale model based on "oscillations in super-synergy".

    PubMed

    Başar, Erol

    2005-01-01

    According to recent trends, memory depends on several brain structures working in concert across many levels of neural organization; "memory is a constant work-in-progress." The proposition of a brain theory based on super-synergy in neural populations is most pertinent for the understanding of this constant work in progress. This report introduces a new model of memory based on the processes of EEG oscillations and brain dynamics. This model is shaped by the following conceptual and experimental steps: 1. The machineries of super-synergy in the whole brain are responsible for the formation of sensory-cognitive percepts. 2. The expression "dynamic memory" is used for memory processes that evoke relevant changes in alpha, gamma, theta and delta activities. The concerted action of distributed multiple oscillatory processes provides a major key for understanding distributed memory. It also encompasses phyletic memory and reflexes. 3. The evolving memory, which incorporates reciprocal actions or reverberations in the APLR alliance and during working memory processes, is especially emphasized. 4. A new model related to a "hierarchy of memories as a continuum" is introduced. 5. The notions of "longer activated memory" and "persistent memory" are proposed instead of long-term memory. 6. The new analysis for recognizing faces emphasizes the importance of EEG oscillations in neurophysiology and Gestalt analysis. 7. The proposed basic framework, called "Memory in the Whole Brain Work," emphasizes that memory and all brain functions are inseparable and act as a "whole" in the whole brain. 8. According to recent publications, the role of genetic factors is fundamental in living systems' settings and oscillations, and accordingly in memory. 9. A link from the "whole brain" to the "whole body," incorporating the vegetative and neurological systems, is proposed, with EEG oscillations and ultraslow oscillations serving as control parameters.

  8. Programmable, reversible and repeatable wrinkling of shape memory polymer thin films on elastomeric substrates for smart adhesion.

    PubMed

    Wang, Yu; Xiao, Jianliang

    2017-08-09

    Programmable, reversible and repeatable wrinkling of shape memory polymer (SMP) thin films on elastomeric polydimethylsiloxane (PDMS) substrates is realized by utilizing the heat-responsive shape memory effect of SMPs. The dependencies of wrinkle wavelength and amplitude on program strain and SMP film thickness are shown to agree with established nonlinear buckling theory. The wrinkling is reversible, as the wrinkled SMP thin film can be recovered to the flat state by heating the bilayer system. The programming cycle between wrinkled and flat states is repeatable, and different program strains can be used in different programming cycles to induce different surface morphologies. Enabled by the programmable, reversible and repeatable SMP film wrinkling on PDMS, smart, programmable surface adhesion with a large tuning range is demonstrated.

  9. Effectiveness of Working Memory Training among Subjects Currently on Sick Leave Due to Complex Symptoms.

    PubMed

    Aasvik, Julie K; Woodhouse, Astrid; Stiles, Tore C; Jacobsen, Henrik B; Landmark, Tormod; Glette, Mari; Borchgrevink, Petter C; Landrø, Nils I

    2016-01-01

    Introduction: The current study examined whether adaptive working memory training (Cogmed QM) has the potential to improve inhibitory control, working memory capacity, and perceptions of memory functioning in a group of patients currently on sick leave due to symptoms of pain, insomnia, fatigue, depression and anxiety. Participants who were referred to a vocational rehabilitation center volunteered to take part in the study. Methods: Participants were randomly assigned to either a training condition (N = 25) or a control condition (N = 29). Participants in the training condition received working memory training in addition to the clinical intervention offered as part of the rehabilitation program, while participants in the control condition received treatment as usual, i.e., the rehabilitation program only. Inhibitory control was measured by the Stop Signal Task, working memory was assessed by the Spatial Working Memory Test, while perceptions of memory functioning were assessed by the Everyday Memory Questionnaire-Revised. Results: Participants in the training group showed a significant improvement on the post-tests of inhibitory control when compared with the comparison group (p = 0.025). The groups did not differ on the post-tests of working memory. Both groups reported less memory problems at post-testing, but there was no sizeable difference between the two groups. Conclusions: Results indicate that working memory training does not improve general working memory capacity per se. Nor does it seem to give any added effects in terms of targeting and improving self-perceived memory functioning. Results do, however, provide evidence to suggest that inhibitory control is accessible and susceptible to modification by adaptive working memory training.

  10. Memory training interventions for older adults: A meta-analysis

    PubMed Central

    Gross, Alden L.; Parisi, Jeanine M.; Spira, Adam P.; Kueider, Alexandra M.; Ko, Jean Y.; Saczynski, Jane S.; Samus, Quincy M.; Rebok, George W.

    2012-01-01

    A systematic review and meta-analysis of memory training research was conducted to characterize the effect of memory strategies on memory performance among cognitively intact, community-dwelling older adults, and to identify characteristics of individuals and of programs associated with improved memory. The review identified 402 publications, of which 35 studies met criteria for inclusion. The overall effect size estimate, representing the mean standardized difference in pre-post change between memory-trained and control groups, was 0.31 standard deviations (SD; 95% confidence interval (CI): 0.22, 0.39). The pre-post training effect for memory-trained interventions was 0.43 SD (95% CI: 0.29, 0.57) and the practice effect for control groups was 0.06 SD (95% CI: -0.05, 0.16). Among 10 distinct memory strategies identified in the studies, meta-analytic methods revealed that training multiple strategies was associated with larger training gains (p=0.04), although this association did not reach statistical significance after adjusting for multiple comparisons. Treatment gains among memory-trained individuals did not differ by any particular strategy, nor by the average age of participants, session length, or type of control condition. These findings can inform the design of future memory training programs for older adults. PMID:22423647

  11. Effector CD8 T cells dedifferentiate into long-lived memory cells.

    PubMed

    Youngblood, Ben; Hale, J Scott; Kissick, Haydn T; Ahn, Eunseon; Xu, Xiaojin; Wieland, Andreas; Araki, Koichi; West, Erin E; Ghoneim, Hazem E; Fan, Yiping; Dogra, Pranay; Davis, Carl W; Konieczny, Bogumila T; Antia, Rustom; Cheng, Xiaodong; Ahmed, Rafi

    2017-12-21

    Memory CD8 T cells that circulate in the blood and are present in lymphoid organs are an essential component of long-lived T cell immunity. These memory CD8 T cells remain poised to rapidly elaborate effector functions upon re-exposure to pathogens, but also have many properties in common with naive cells, including pluripotency and the ability to migrate to the lymph nodes and spleen. Thus, memory cells embody features of both naive and effector cells, fuelling a long-standing debate centred on whether memory T cells develop from effector cells or directly from naive cells. Here we show that long-lived memory CD8 T cells are derived from a subset of effector T cells through a process of dedifferentiation. To assess the developmental origin of memory CD8 T cells, we investigated changes in DNA methylation programming at naive and effector cell-associated genes in virus-specific CD8 T cells during acute lymphocytic choriomeningitis virus infection in mice. Methylation profiling of terminal effector versus memory-precursor CD8 T cell subsets showed that, rather than retaining a naive epigenetic state, the subset of cells that gives rise to memory cells acquired de novo DNA methylation programs at naive-associated genes and became demethylated at the loci of classically defined effector molecules. Conditional deletion of the de novo methyltransferase Dnmt3a at an early stage of effector differentiation resulted in reduced methylation and faster re-expression of naive-associated genes, thereby accelerating the development of memory cells. Longitudinal phenotypic and epigenetic characterization of the memory-precursor effector subset of virus-specific CD8 T cells transferred into antigen-free mice revealed that differentiation to memory cells was coupled to erasure of de novo methylation programs and re-expression of naive-associated genes. Thus, epigenetic repression of naive-associated genes in effector CD8 T cells can be reversed in cells that develop into long-lived memory CD8 T cells while key effector genes remain demethylated, demonstrating that memory T cells arise from a subset of fate-permissive effector T cells.

  12. Parallelization and checkpointing of GPU applications through program transformation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solano-Quinde, Lizandro Damian

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to overcome the GPU memory limitation for applications with a large memory footprint. Parallelizing single-GPU applications has been approached with libraries that distribute the workload at runtime; however, these impose execution overhead and are not portable. On traditional CPU systems, by contrast, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at the application level and avoids the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures, and current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve this goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.
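
    At its core, the application-level workload distribution that such a transformation inserts amounts to splitting the kernel's global index range across the available devices. A minimal C sketch of the host-side logic follows; launch_on_device() is a hypothetical stand-in for the real OpenCL enqueue call, and the even block partitioning is our simplifying assumption.

        #include <stdio.h>

        /* Hypothetical stand-in for clEnqueueNDRangeKernel on one device. */
        static void launch_on_device(int dev, size_t offset, size_t count)
        {
            printf("device %d: work-items [%zu, %zu)\n", dev, offset, offset + count);
        }

        /* Split a global index range evenly across ndev GPUs, giving the
         * remainder to the leading devices; this mirrors the application-level
         * workload distribution the transformation inserts. */
        static void launch_partitioned(size_t global, int ndev)
        {
            size_t base = global / ndev, rem = global % ndev, offset = 0;
            for (int d = 0; d < ndev; d++) {
                size_t count = base + (d < (int)rem ? 1 : 0);
                launch_on_device(d, offset, count);
                offset += count;
            }
        }

        int main(void) { launch_partitioned(1000003, 4); return 0; }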

  13. Regulators of Long-Term Memory Revealed by Mushroom Body-Specific Gene Expression Profiling in Drosophila melanogaster.

    PubMed

    Widmer, Yves F; Bilican, Adem; Bruggmann, Rémy; Sprecher, Simon G

    2018-06-20

    Memory formation is achieved by genetically tightly controlled molecular pathways that result in a change of synaptic strength and synapse organization. While for short-term memory traces rapidly acting biochemical pathways are in place, the formation of long-lasting memories requires changes in the transcriptional program of a cell. Although many genes involved in learning and memory formation have been identified, little is known about the genetic mechanisms required for changing the transcriptional program during different phases of long-term memory formation. With Drosophila melanogaster as a model system, we profiled transcriptomic changes in the mushroom body, a memory center in the fly brain, at distinct time intervals during appetitive olfactory long-term memory formation using the targeted DamID technique. We describe the gene expression profiles during these phases and tested 33 selected candidate genes for deficits in long-term memory formation using RNAi knockdown. We identified 10 genes that enhance or decrease memory when knocked down in the mushroom body. For vajk-1 and hacd1, the two strongest hits, we gained further support for their crucial role in appetitive learning and forgetting. These findings show that profiling gene expression changes in specific cell types harboring memory traces provides a powerful entry point to identify new genes involved in learning and memory. The presented transcriptomic data may further be used as a resource to study genes acting at different memory phases. Copyright © 2018, Genetics.

  14. Short-Term Memory in Orthogonal Neural Networks

    NASA Astrophysics Data System (ADS)

    White, Olivia L.; Lee, Daniel D.; Sompolinsky, Haim

    2004-04-01

    We study the ability of linear recurrent networks obeying discrete time dynamics to store long temporal sequences that are retrievable from the instantaneous state of the network. We calculate this temporal memory capacity for both distributed shift register and random orthogonal connectivity matrices. We show that the memory capacity of these networks scales with system size.
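
    A standard formalization of such a network (notation ours; the paper's conventions may differ) updates the state as

        \mathbf{x}(t+1) = W\,\mathbf{x}(t) + \mathbf{v}\,s(t),

    where W is an N×N orthogonal connectivity matrix, v an input vector, and s(t) the scalar input sequence. Unrolling gives

        \mathbf{x}(T) = \sum_{k=0}^{T-1} W^{k}\,\mathbf{v}\,s(T-k),

    so the instantaneous state superposes past inputs along the vectors W^k v, and the sequence is retrievable for as long as those vectors remain linearly independent; for a shift-register W this holds for exactly N steps, which is one way to see why the capacity scales with system size.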

  15. The Basolateral Amygdala and Nucleus Accumbens Core Mediate Dissociable Aspects of Drug Memory Reconsolidation

    ERIC Educational Resources Information Center

    Theberge, Florence R. M.; Milton, Amy L.; Belin, David; Lee, Jonathan L. C.; Everitt, Barry J.

    2010-01-01

    A distributed limbic-corticostriatal circuitry is implicated in cue-induced drug craving and relapse. Exposure to drug-paired cues not only precipitates relapse, but also triggers the reactivation and reconsolidation of the cue-drug memory. However, the limbic cortical-striatal circuitry underlying drug memory reconsolidation is unclear. The aim…

  16. NAS Applications and Advanced Algorithms

    NASA Technical Reports Server (NTRS)

    Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)

    1997-01-01

    This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.

  17. Schedulers with load-store queue awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Tong; Eichenberger, Alexandre E.; Jacob, Arpith C.

    2017-02-07

    In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
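
    The patent abstract is terse, so the following C fragment is only our illustrative reading of the idea, not the patented implementation: a greedy list-scheduling loop that models LSQ occupancy over time and defers memory access instructions whenever issuing one would overflow the queue.

        #include <stdio.h>

        #define LSQ_CAP 2
        #define N 6

        typedef struct { const char *name; int is_mem; } Instr;

        int main(void)
        {
            Instr prog[N] = {
                {"load  a", 1}, {"load  b", 1}, {"add   c", 0},
                {"load  d", 1}, {"store e", 1}, {"mul   f", 0},
            };
            int done[N] = {0}, scheduled = 0, lsq = 0, cycle = 0;
            int retire[64];        /* cycle at which each in-flight memory op frees its slot */
            int nret = 0;
            const int MEM_LAT = 3; /* modeled memory latency in cycles (assumption) */

            while (scheduled < N) {
                /* Free LSQ slots whose memory ops have completed by this cycle. */
                for (int i = 0; i < nret; i++)
                    if (retire[i] >= 0 && retire[i] <= cycle) { retire[i] = -1; lsq--; }

                /* Issue the first unscheduled instruction, but skip memory
                 * instructions while the modeled LSQ is full. */
                int pick = -1;
                for (int i = 0; i < N; i++) {
                    if (done[i]) continue;
                    if (prog[i].is_mem && lsq >= LSQ_CAP) continue;
                    pick = i; break;
                }
                if (pick >= 0) {
                    done[pick] = 1; scheduled++;
                    if (prog[pick].is_mem) { retire[nret++] = cycle + MEM_LAT; lsq++; }
                    printf("cycle %2d: issue %s (LSQ occupancy %d)\n",
                           cycle, prog[pick].name, lsq);
                } else {
                    printf("cycle %2d: stall (LSQ full)\n", cycle);
                }
                cycle++;
            }
            return 0;
        }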

  18. Schedulers with load-store queue awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Tong; Eichenberger, Alexandre E.; Jacob, Arpith C.

    2017-01-24

    In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.

  19. Efficacy of Code Optimization on Cache-based Processors

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appears to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
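
    The unit-stride point is easy to demonstrate. In the C sketch below (an illustration written for this summary, not code from the report), the same 2-D reduction is performed row-major, with unit stride, and column-major, with a stride of N doubles; on typical cache-based hardware the first loop nest runs several times faster.

        #include <stdio.h>
        #include <time.h>

        #define N 2048
        static double a[N][N];   /* zero-initialized static array, ~32 MB */

        int main(void)
        {
            double sum = 0.0;
            clock_t t0, t1;

            /* Unit stride: consecutive elements share cache lines, so each
             * line fetched from main memory is fully used. */
            t0 = clock();
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    sum += a[i][j];
            t1 = clock();
            printf("row-major    (unit stride): %.3f s\n",
                   (double)(t1 - t0) / CLOCKS_PER_SEC);

            /* Stride of N doubles: every access touches a new cache line, and
             * a power-of-two N may also thrash a set-associative cache. */
            t0 = clock();
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    sum += a[i][j];
            t1 = clock();
            printf("column-major (stride N)   : %.3f s\n",
                   (double)(t1 - t0) / CLOCKS_PER_SEC);

            printf("checksum %f\n", sum);  /* keep 'sum' live */
            return 0;
        }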

  20. Targeting latent function: Encouraging effective encoding for successful memory training and transfer

    PubMed Central

    Lustig, Cindy; Flegal, Kristin E.

    2009-01-01

    Cognitive training programs for older adults often result in improvements at the group level. However, there are typically large age and individual differences in the size of training benefits. These differences may be related to the degree to which participants implement the processes targeted by the training program. To test this possibility, we tested older adults in a memory-training procedure either under specific strategy instructions designed to encourage semantic, integrative encoding, or in a condition that encouraged time and attention to encoding but allowed participants to choose their own strategy. Both conditions improved the performance of old-old adults relative to an earlier study (Bissig & Lustig, 2007) and reduced self-reports of everyday memory errors. Performance in the strategy-instruction group was related to pre-existing ability, performance in the strategy-choice group was not. The strategy-choice group performed better on a laboratory transfer test of recognition memory, and training performance was correlated with reduced everyday memory errors. Training programs that target latent but inefficiently-used abilities while allowing flexibility in bringing those abilities to bear may best promote effective training and transfer. PMID:19140647

  1. Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Probst, David K.

    1993-01-01

    A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from limiting performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
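
    One of the listed latency-tolerating techniques, prefetching, can be made concrete with the GCC/Clang builtin __builtin_prefetch; the linked-list walk below is our minimal illustration, not code from the paper. The prefetch requests the next node's cache line while the current payload is processed, overlapping the memory access with computation in exactly the sense described above.

        #include <stdio.h>
        #include <stdlib.h>

        typedef struct Node { struct Node *next; double payload; } Node;

        /* Latency tolerance by software prefetching: request the next node's
         * cache line while summing the current node's payload. */
        static double sum_list(Node *head)
        {
            double sum = 0.0;
            for (Node *n = head; n != NULL; n = n->next) {
                if (n->next)
                    __builtin_prefetch(n->next, 0 /* read */, 1 /* low locality */);
                sum += n->payload;
            }
            return sum;
        }

        int main(void)
        {
            enum { COUNT = 1000 };
            Node *nodes = malloc(COUNT * sizeof *nodes);
            if (!nodes) return 1;
            for (int i = 0; i < COUNT; i++) {
                nodes[i].next = (i + 1 < COUNT) ? &nodes[i + 1] : NULL;
                nodes[i].payload = 1.0;
            }
            printf("sum = %.1f\n", sum_list(&nodes[0]));
            free(nodes);
            return 0;
        }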

  2. A simple modern correctness condition for a space-based high-performance multiprocessor

    NASA Technical Reports Server (NTRS)

    Probst, David K.; Li, Hon F.

    1992-01-01

    A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see the design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and to alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.

  3. Concurrent Image Processing Executive (CIPE). Volume 1: Design overview

    NASA Technical Reports Server (NTRS)

    Lee, Meemong; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.

    1990-01-01

    The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation, are described. The target machine for this software is a JPL/Caltech Mark 3fp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3 or Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local-memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: user interface, host-resident executive, hypercube-resident executive, and application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube, a data management method which distributes, redistributes, and tracks data set information was implemented. The data management also allows data sharing among application programs. The CIPE software architecture provides a flexible environment for scientific analysis of complex remote sensing image data, such as planetary data and imaging spectrometry, utilizing state-of-the-art concurrent computation capabilities.

  4. Receptor Signaling Directs Global Recruitment of Pre-existing Transcription Factors to Inducible Elements.

    PubMed

    Cockerill, Peter N

    2016-12-01

    Gene expression programs are largely regulated by the tissue-specific expression of lineage-defining transcription factors or by the inducible expression of transcription factors in response to specific stimuli. Here I will review our own work over the last 20 years to show how specific activation signals also lead to the widespread redistribution of pre-existing constitutive transcription factors to sites undergoing chromatin reorganization. I will summarize studies showing that activation of kinase signaling pathways creates open chromatin regions that recruit pre-existing factors which were previously unable to bind to closed chromatin. As models I will draw upon genes activated or primed by receptor signaling in memory T cells, and genes activated by cytokine receptor mutations in acute myeloid leukemia. I also summarize a hit-and-run model of stable epigenetic reprogramming in memory T cells, mediated by transient Activator Protein 1 (AP-1) binding, which enables the accelerated activation of inducible enhancers.

  5. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, P; Song, Y T; Chao, Y

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free-surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community for a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
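
    MPI parallelizations of grid-based ocean models such as ROMS typically rest on domain decomposition with halo (ghost-cell) exchange between neighboring subdomains. The C sketch below shows a generic 1-D decomposition; it is our illustration of the pattern, not ROMS code.

        #include <stdio.h>
        #include <mpi.h>

        #define NLOCAL 8  /* interior points per rank */

        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Local strip with one ghost cell on each side: u[0], u[NLOCAL+1]. */
            double u[NLOCAL + 2];
            for (int i = 1; i <= NLOCAL; i++)
                u[i] = rank * NLOCAL + (i - 1);

            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            /* Exchange halos with both neighbors; MPI_PROC_NULL turns the
             * boundary ranks' sends/receives into no-ops. */
            MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                         &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[NLOCAL],     1, MPI_DOUBLE, right, 1,
                         &u[0],          1, MPI_DOUBLE, left,  1,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            printf("rank %d: ghosts %.1f | %.1f\n", rank, u[0], u[NLOCAL + 1]);
            MPI_Finalize();
            return 0;
        }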

  6. Quantitative histogram analysis of images

    NASA Astrophysics Data System (ADS)

    Holub, Oliver; Ferreira, Sérgio T.

    2006-11-01

    A routine for histogram analysis of images has been written in the object-oriented, graphical development environment LabVIEW. The program converts an RGB bitmap image into an intensity-linear greyscale image according to selectable conversion coefficients. This greyscale image is subsequently analysed by plots of the intensity histogram and probability distribution of brightness, and by calculation of various parameters, including average brightness, standard deviation, variance, minimal and maximal brightness, mode, skewness and kurtosis of the histogram and the median of the probability distribution. The program allows interactive selection of specific regions of interest (ROI) in the image and definition of lower and upper threshold levels (e.g., to permit the removal of a constant background signal). The results of the analysis of multiple images can be conveniently saved and exported for plotting in other programs, which allows fast analysis of relatively large sets of image data. The program file accompanies this manuscript together with a detailed description of two application examples: the analysis of fluorescence microscopy images, specifically of tau-immunofluorescence in primary cultures of rat cortical and hippocampal neurons, and the quantification of protein bands by Western blot. The possibilities and limitations of this kind of analysis are discussed. Program summary: Title of program: HAWGC. Catalogue identifier: ADXG_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADXG_v1_0. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Computers: Mobile Intel Pentium III, AMD Duron. Installations: No installation necessary; executable file together with necessary files for LabVIEW Run-time engine. Operating systems or monitors under which the program has been tested: Windows ME/2000/XP. Programming language used: LabVIEW 7.0. Memory required to execute with typical data: ~16 MB for starting and ~160 MB used for loading of an image. No. of bits in a word: 32. No. of processors used: 1. Has the code been vectorized or parallelized?: No. No. of lines in distributed program, including test data, etc.: 138 946. No. of bytes in distributed program, including test data, etc.: 15 166 675. Distribution format: tar.gz. Nature of physical problem: Quantification of image data (e.g., for discrimination of molecular species in gels or fluorescent molecular probes in cell cultures) requires proprietary or complex software packages, which might not include the relevant statistical parameters or make the analysis of multiple images a tedious procedure for the general user. Method of solution: Tool for conversion of RGB bitmap image into luminance-linear image and extraction of luminance histogram, probability distribution, and statistical parameters (average brightness, standard deviation, variance, minimal and maximal brightness, mode, skewness and kurtosis of histogram and median of probability distribution) with possible selection of region of interest (ROI) and lower and upper threshold levels. Restrictions on the complexity of the problem: Does not incorporate application-specific functions (e.g., morphometric analysis). Typical running time: Seconds (depending on image size and processor speed). Unusual features of the program: None.
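
    The histogram statistics the program reports follow directly from the moments of the brightness histogram. A compact C sketch of that computation (our illustration; the distributed program itself is written in LabVIEW) is:

        #include <stdio.h>
        #include <math.h>

        /* Compute mean, variance, skewness and kurtosis of an 8-bit intensity
         * histogram, mirroring the parameters the LabVIEW routine reports. */
        static void histogram_stats(const unsigned long hist[256])
        {
            double n = 0.0, mean = 0.0;
            for (int v = 0; v < 256; v++) { n += hist[v]; mean += (double)v * hist[v]; }
            mean /= n;

            double m2 = 0.0, m3 = 0.0, m4 = 0.0;  /* central moments */
            for (int v = 0; v < 256; v++) {
                double d = v - mean;
                m2 += hist[v] * d * d;
                m3 += hist[v] * d * d * d;
                m4 += hist[v] * d * d * d * d;
            }
            m2 /= n; m3 /= n; m4 /= n;

            printf("mean %.2f  variance %.2f  skewness %.3f  kurtosis %.3f\n",
                   mean, m2, m3 / pow(m2, 1.5), m4 / (m2 * m2));
        }

        int main(void)
        {
            unsigned long hist[256] = {0};
            /* Toy bimodal histogram standing in for a real greyscale image. */
            hist[50] = 1000; hist[60] = 800; hist[200] = 300;
            histogram_stats(hist);
            return 0;
        }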

  7. A distributed microcomputer-controlled system for data acquisition and power spectral analysis of EEG.

    PubMed

    Vo, T D; Dwyer, G; Szeto, H H

    1986-04-01

    A relatively powerful and inexpensive microcomputer-based system for the spectral analysis of the EEG is presented. High resolution and speed are achieved with the use of recently available large-scale integrated circuit technology with enhanced functionality (INTEL 8087 math co-processor), which can perform transcendental functions rapidly. The versatility of the system is achieved with a hardware organization that has distributed data-acquisition capability, performed by a microprocessor-based analog-to-digital converter with large resident memory (Cyborg ISAAC-2000). Compiled BASIC programs and assembly language subroutines perform the fast Fourier transform and spectral analysis of the EEG on-line or off-line, with results stored as both soft and hard copy. Some results obtained from test application of the entire system in animal studies are presented.

  8. Detection of weak signals in memory thermal baths.

    PubMed

    Jiménez-Aquino, J I; Velasco, R M; Romero-Bastida, M

    2014-11-01

    The nonlinear relaxation time and the statistics of the first passage time distribution in connection with the quasideterministic approach are used to detect weak signals in the decay process of the unstable state of a Brownian particle embedded in memory thermal baths. The study is performed in the overdamped approximation of a generalized Langevin equation characterized by an exponential decay in the friction memory kernel. A detection criterion for each time scale is studied: The first one is referred to as the receiver output, which is given as a function of the nonlinear relaxation time, and the second one is related to the statistics of the first passage time distribution.
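
    For reference, a standard form of the generalized Langevin equation with an exponentially decaying friction memory kernel (notation ours; the paper's conventions may differ) is

        m\,\dot{v}(t) = -\int_0^{t} \gamma(t - t')\,v(t')\,dt' + \xi(t), \qquad
        \gamma(t) = \frac{\gamma_0}{\tau}\, e^{-t/\tau},

    with the noise obeying the fluctuation-dissipation relation \langle \xi(t)\,\xi(t') \rangle = k_B T\, \gamma(|t - t'|). The overdamped approximation used in the study drops the inertial term m\,\dot{v}, and a weak signal enters as an additional deterministic force whose presence is to be detected in the decay of the unstable state.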

  9. Projected phase-change memory devices.

    PubMed

    Koelmans, Wabe W; Sebastian, Abu; Jonnalagadda, Vara Prasad; Krebs, Daniel; Dellmann, Laurent; Eleftheriou, Evangelos

    2015-09-03

    Nanoscale memory devices, whose resistance depends on the history of the electric signals applied, could become critical building blocks in new computing paradigms, such as brain-inspired computing and memcomputing. However, there are key challenges to overcome, such as the high programming power required, noise and resistance drift. Here, to address these, we present the concept of a projected memory device, whose distinguishing feature is that the physical mechanism of resistance storage is decoupled from the information-retrieval process. We designed and fabricated projected memory devices based on the phase-change storage mechanism and convincingly demonstrate the concept through detailed experimentation, supported by extensive modelling and finite-element simulations. The projected memory devices exhibit remarkably low drift and excellent noise performance. We also demonstrate active control and customization of the programming characteristics of the device that reliably realize a multitude of resistance states.

  10. Mental Fitness for Life: Assessing the Impact of an 8-Week Mental Fitness Program on Healthy Aging.

    ERIC Educational Resources Information Center

    Cusack, Sandra A.; Thompson, Wendy J. A.; Rogers, Mary E.

    2003-01-01

    A mental fitness program taught goal setting, critical thinking, creativity, positive attitudes, learning, memory, and self-expression to adults over 50 (n=22). Pre/posttests of depression and cognition revealed significant impacts on mental fitness, cognitive confidence, goal setting, optimism, creativity, flexibility, and memory. Not significant…

  11. A Balancing Act: Interpreting Tragedy at the 9/11 Memorial Museum

    ERIC Educational Resources Information Center

    Rauch, Noah

    2018-01-01

    The 9/11 Memorial Museum's docent program offers visitors artifact-based entry-points into a difficult, emotional history. The program's launch raised a host of questions, many centered on how to balance and convey strongly held, often traumatic, and sometimes conflicting experiences with a newly constructed institutional narrative. This article…

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Seyong; Vetter, Jeffrey S

    Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems, including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVM's intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.
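
    The abstract does not show syntax, so the fragment below is only a sketch modeled on the features it describes (a pointer qualifier targeting NVM main memory and a directive for NVM transactions); the actual NVL-C syntax may differ.

        /* Illustrative sketch only: the 'nvl' qualifier and '#pragma nvl atomic'
         * mirror the features described in the abstract; the real NVL-C syntax
         * may differ. This is NVL-C-style extended C, not standard C. */
        #include <stddef.h>

        struct counter { size_t hits; size_t misses; };

        void record_hit(nvl struct counter *c)  /* pointer into NVM main memory */
        {
            /* An NVM transaction: after an application or system failure,
             * recovery sees either the old or the new state, never a torn one. */
            #pragma nvl atomic
            {
                c->hits += 1;
            }
        }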

  13. Fast Initialization of Bubble-Memory Systems

    NASA Technical Reports Server (NTRS)

    Looney, K. T.; Nichols, C. D.; Hayes, P. J.

    1986-01-01

    Improved scheme several orders of magnitude faster than normal initialization scheme. State-of-the-art commercial bubble-memory device used. Hardware interface designed to connect controlling microprocessor to bubble-memory circuitry. System software written to exercise various functions of bubble-memory system, and comparison made between normal and fast techniques. Future implementations of approach would utilize E2PROM (electrically erasable programmable read-only memory) to provide greater system flexibility. Fast-initialization technique applicable to all bubble-memory devices.

  14. Understanding human dynamics in microblog posting activities

    NASA Astrophysics Data System (ADS)

    Jiang, Zhihong; Zhang, Yubao; Wang, Hui; Li, Pei

    2013-02-01

    Human activity patterns are an important issue in behavior dynamics research. Empirical evidence indicates that human activity patterns can be characterized by a heavy-tailed inter-event time distribution. However, most research models only the power-law feature of the inter-event time distribution, and the overlooked non-power-law features are likely to be nontrivial. In this work, we propose a behavior dynamics model, called the finite memory model, in which humans adaptively change their activity rates based on a finite memory of recent activities, driven by inherent individual interest. Theoretical analysis shows that a finite memory model can properly explain various heavy-tailed inter-event time distributions, including a regular power law and some non-power-law deviations. To validate the model, we carry out an empirical study based on microblogging activity from thousands of microbloggers in the Celebrity Hall of the Sina microblog. The results further show that the model is reasonably effective. We conclude that finite memory is an effective dynamics element for describing the heavy-tailed human activity pattern.
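
    The paper's exact equations are not reproduced here; the toy C simulation below only illustrates the qualitative mechanism under our own assumptions (posting probability = baseline plus a term proportional to the number of events remembered in a sliding window). Positive feedback through the finite memory produces bursts of activity separated by long gaps, i.e., a heavy-tailed inter-event time distribution.

        #include <stdio.h>
        #include <stdlib.h>

        #define WINDOW 20     /* finite memory: only the last WINDOW steps matter */
        #define STEPS  100000

        int main(void)
        {
            int recent[WINDOW] = {0}, nrecent = 0, last_event = 0;
            srand(42);
            for (int t = 1; t <= STEPS; t++) {
                /* Forget the step that falls out of the memory window. */
                nrecent -= recent[t % WINDOW];

                /* Baseline rate plus a memory-driven term (toy parameters). */
                double p = 0.001 + 0.04 * nrecent;
                int fire = ((double)rand() / RAND_MAX) < p;

                recent[t % WINDOW] = fire;
                nrecent += fire;
                if (fire) {
                    if (t - last_event > 1000)
                        printf("gap of %d steps before event at t=%d\n",
                               t - last_event, t);
                    last_event = t;
                }
            }
            return 0;
        }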

  15. Quantitative autoradiographic analysis of muscarinic receptor subtypes and their role in representational memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Messer, W.S.

    1986-01-01

    Autoradiographic techniques were used to examine the distribution of muscarinic receptors in rat brain slices. Agonist and selective antagonist binding were examined by measuring the ability of unlabeled ligands to inhibit [3H]-1-QNB labeling of muscarinic receptors. The distribution of high-affinity pirenzepine binding sites (M1 subtype) was distinct from the distribution of high-affinity carbamylcholine sites, which corresponded to the M2 subtype. In a separate assay, the binding profile for pirenzepine was shown to differ from the profile for scopolamine, a classical muscarinic antagonist. Muscarinic antagonists, when injected into the hippocampus, impaired performance of a representational memory task. Pirenzepine, the M1-selective antagonist, produced representational memory deficits. Scopolamine, a less selective muscarinic antagonist, caused increases in running times in some animals, which prevented a definitive interpretation of the nature of the impairment. Pirenzepine displayed a higher affinity for the hippocampus and was more effective in producing a selective impairment of representational memory than scopolamine. The data indicated that cholinergic activity in the hippocampus is necessary for representational memory function.

  16. Implications of the Turing machine model of computation for processor and programming language design

    NASA Astrophysics Data System (ADS)

    Hunter, Geoffrey

    2004-01-01

    A computational process is classified according to the theoretical model that is capable of executing it; computational processes that require a non-predeterminable amount of intermediate storage for their execution are Turing-machine (TM) processes, while those whose storage is predeterminable are Finite Automaton (FA) processes. Simple processes (such as a traffic light controller) are executable by a Finite Automaton, whereas the most general kind of computation requires a Turing Machine for its execution. This implies that a TM process must have a non-predeterminable amount of memory allocated to it at intermediate instants of its execution; i.e., dynamic memory allocation. Many processes encountered in practice are TM processes. The implication for computational practice is that the hardware (CPU) architecture and its operating system must facilitate dynamic memory allocation, and that the programming language used to specify TM processes must have statements with the semantic attribute of dynamic memory allocation, for in Alan Turing's thesis on computation (1936) the "standard description" of a process is invariant over the most general data that the process is designed to process; i.e., the program describing the process should never have to be modified to allow for differences in the data that is to be processed in different instantiations; i.e., data-invariant programming. Any non-trivial program is partitioned into sub-programs (procedures, subroutines, functions, modules, etc.). Examination of the calls/returns between the subprograms reveals that they are nodes in a tree-structure; this tree-structure is independent of the programming language used to encode (define) the process. Each sub-program typically needs some memory for its own use (to store values intermediate between its received data and its computed results); this locally required memory is not needed before the subprogram commences execution, and it is not needed after its execution terminates; it may be allocated as its execution commences and deallocated as its execution terminates, and if the amount of this local memory is not known until just before execution commencement, then it is essential that it be allocated dynamically as the first action of its execution. This dynamically allocated/deallocated storage of each subprogram's intermediate values conforms with the stack discipline (last allocated = first deallocated), an incidental benefit of which is automatic overlaying of variables. This stack-based dynamic memory allocation was a semantic implication of the nested block structure that originated in the ALGOL-60 programming language. ALGOL-60 was a TM language, because the amount of memory allocated on subprogram (block/procedure) entry (for arrays, etc.) was computable at execution time. A more general requirement of a Turing machine process is code generation at run time; this mandates access to the source-language processor (compiler/interpreter) during execution of the process. This fundamental aspect of computer science is important to the future of system design, because it has been overlooked throughout the 55 years since modern computing began in 1948. The popular computer systems of this first half-century of computing were constrained by compile-time (or even operating-system boot-time) memory allocation, and were thus limited to executing FA processes. The practical effect was that the distinction between the data-invariant program and its variable data was blurred; programmers had to make trial-and-error executions, modifying the program's compile-time constants (array dimensions) to iterate towards the values required at run time by the data being processed. This era of trial-and-error computing still persists; it pervades the culture of current (2003) computing practice.
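
    The distinction is visible in any C program whose storage requirement depends on its input. An FA process can be compiled with fixed-size arrays; a TM process, as in the minimal illustration below (ours, not from the paper), must allocate dynamically once the data has arrived:

        #include <stdio.h>
        #include <stdlib.h>

        /* A TM process in the paper's sense: the amount of intermediate
         * storage is not known until the data arrives, so memory must be
         * allocated dynamically rather than fixed at compile time. */
        int main(void)
        {
            size_t n;
            if (scanf("%zu", &n) != 1) return 1;

            double *work = malloc(n * sizeof *work);  /* size chosen by the data */
            if (!work) return 1;

            for (size_t i = 0; i < n; i++)
                work[i] = (double)i * i;
            printf("last value: %.0f\n", n ? work[n - 1] : 0.0);

            free(work);  /* deallocated when the subprogram's work is done */
            return 0;
        }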

  17. A class of designs for a sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Jaeckel, Louis A.

    1989-01-01

    A general class of designs for a sparse distributed memory (SDM) is described. The author shows that Kanerva's original design and the selected-coordinate design are related, and that there is a series of possible intermediate designs between those two designs. In each such design, the set of addresses that activate a memory location is a sphere in the address space. We can also have hybrid designs, in which the memory locations may be a mixture of those found in the other designs. In some applications, the bits of the read and write addresses that will actually be used might be mostly zeros; that is, the addresses might lie on or near a hyperplane in the address space. The author describes a hyperplane design which is adapted to this situation and compares it to an adaptation of Kanerva's design. To study the performance of these designs, he computes the expected number of memory locations activated by both of two addresses.
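
    In Kanerva's design, a hard location with address h is activated by a read/write address x exactly when the Hamming distance satisfies d(h, x) <= r. The quantity the author computes, the expected number of locations activated by both of two addresses, can also be estimated by Monte Carlo; the C sketch below uses deliberately small toy parameters (our choices, not the paper's).

        #include <stdio.h>
        #include <stdlib.h>

        #define ADDR_BITS 64  /* toy address width (Kanerva's design uses ~1000) */
        #define RADIUS    24  /* activation radius in Hamming distance */
        #define NLOC      100000

        static int popcount64(unsigned long long v)
        {
            int c = 0;
            while (v) { v &= v - 1; c++; }  /* clear lowest set bit */
            return c;
        }

        static unsigned long long rand64(void)
        {
            unsigned long long v = 0;
            for (int i = 0; i < 8; i++)  /* assemble 64 random bits from bytes */
                v = (v << 8) | (unsigned)(rand() & 0xFF);
            return v;
        }

        /* Monte Carlo estimate of how many random hard locations fall inside
         * the activation spheres of BOTH of two given addresses. */
        int main(void)
        {
            srand(7);
            unsigned long long x = rand64(), y = rand64();
            long both = 0;
            for (long i = 0; i < NLOC; i++) {
                unsigned long long h = rand64();
                if (popcount64(h ^ x) <= RADIUS && popcount64(h ^ y) <= RADIUS)
                    both++;
            }
            printf("d(x,y) = %d; locations activated by both: %ld of %d\n",
                   popcount64(x ^ y), both, NLOC);
            return 0;
        }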

  18. A molecular dynamics implementation of the 3D Mercedes-Benz water model

    NASA Astrophysics Data System (ADS)

    Hynninen, T.; Dias, C. L.; Mkrtchyan, A.; Heinonen, V.; Karttunen, M.; Foster, A. S.; Ala-Nissila, T.

    2012-02-01

    The three-dimensional Mercedes-Benz model was recently introduced to account for the structural and thermodynamic properties of water. It treats water molecules as point-like particles with four dangling bonds in tetrahedral coordination, representing H-bonds of water. Its conceptual simplicity renders the model attractive in studies where complex behaviors emerge from H-bond interactions in water, e.g., the hydrophobic effect. A molecular dynamics (MD) implementation of the model is non-trivial and we outline here the mathematical framework of its force-field. Useful routines written in modern Fortran are also provided. This open source code is free and can easily be modified to account for different physical contexts. The provided code allows both serial and MPI-parallelized execution. Program summary: Program title: CASHEW (Coarse Approach Simulator for Hydrogen-bonding Effects in Water). Catalogue identifier: AEKM_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEKM_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 20 501. No. of bytes in distributed program, including test data, etc.: 551 044. Distribution format: tar.gz. Programming language: Fortran 90. Computer: Program has been tested on desktop workstations and a Cray XT4/XT5 supercomputer. Operating system: Linux, Unix, OS X. Has the code been vectorized or parallelized?: The code has been parallelized using MPI. RAM: Depends on size of system, about 5 MB for 1500 molecules. Classification: 7.7. External routines: A random number generator, Mersenne Twister (http://www.math.sci.hiroshima-u.ac.jp/m-mat/MT/VERSIONS/FORTRAN/mt95.f90), is used. A copy of the code is included in the distribution. Nature of problem: Molecular dynamics simulation of a new geometric water model. Solution method: New force-field for water molecules, velocity-Verlet integration, representation of molecules as rigid particles with rotations described using quaternion algebra. Restrictions: Memory and cpu time limit the size of simulations. Additional comments: Software web site: https://gitorious.org/cashew/. Running time: Depends on the size of system. The sample tests provided only take a few seconds.

  19. Quantum cryptography: individual eavesdropping with the knowledge of the error-correcting protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horoshko, D B

    2007-12-31

    The quantum key distribution protocol BB84, combined with the repetition protocol for error correction, is analysed from the point of view of its security against individual eavesdropping relying on quantum memory. It is shown that the mere knowledge of the error-correcting protocol changes the optimal attack and provides the eavesdropper with additional information on the distributed key. (Fifth seminar in memory of D.N. Klyshko.)

  20. Weather prediction using a genetic memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1990-01-01

    Kanerva's sparse distributed memory (SDM) is an associative memory model based on the mathematical properties of high-dimensional binary address spaces. Holland's genetic algorithms are a search technique for high-dimensional spaces inspired by the evolutionary processes of DNA. Genetic Memory is a hybrid of the above two systems, in which the memory uses a genetic algorithm to dynamically reconfigure its physical storage locations to reflect correlations between the stored addresses and data. This architecture is designed to maximize the ability of the system to scale up to handle real-world problems.

  1. Modeling the Coupled Chemo-Thermo-Mechanical Behavior of Amorphous Polymer Networks.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zimmerman, Jonathan A.; Nguyen, Thao D.; Xiao, Rui

    2015-02-01

    Amorphous polymers exhibit a rich landscape of time-dependent behavior including viscoelasticity, structural relaxation, and viscoplasticity. These time-dependent mechanisms can be exploited to achieve shape-memory behavior, which allows the material to store a programmed deformed shape indefinitely and to recover entirely the undeformed shape in response to a specific environmental stimulus. The shape-memory performance of amorphous polymers depends on the coordination of multiple physical mechanisms, and considerable opportunities exist to tailor the polymer structure and shape-memory programming procedure to achieve the desired performance. The goal of this project was to use a combination of theoretical, numerical and experimental methods to investigate the effect of shape memory programming, thermo-mechanical properties, and physical and environmental aging on the shape memory performance. Physical and environmental aging occurs during storage and through exposure to solvents, such as water, and can significantly alter the viscoelastic behavior and shape memory behavior of amorphous polymers. This project, executed primarily by Professor Thao Nguyen and Graduate Student Rui Xiao at Johns Hopkins University in support of a DOE/NNSA Presidential Early Career Award in Science and Engineering (PECASE), developed a theoretical framework for the chemo-thermo-mechanical behavior of amorphous polymers to model the effects of physical aging and solvent-induced environmental factors on their thermoviscoelastic behavior.

  2. Low latency messages on distributed memory multiprocessors

    NASA Technical Reports Server (NTRS)

    Rosing, Matthew; Saltz, Joel

    1993-01-01

    Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedures provides flexibility for minimizing message latency, depending on the type of communication being performed. The tests performed and the proposed interface are described.
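    As an illustration of the data-or-procedure messaging idea, here is a small Python sketch in which a communication thread stands in for the communication processor and dispatches incoming messages either as raw data or as named handlers to execute; the handler registry and message format are assumptions for illustration, not the proposed iPSC/860 interface.

        import queue
        import threading

        HANDLERS = {}

        def handler(fn):
            # Register a procedure that remote nodes may invoke by name.
            HANDLERS[fn.__name__] = fn
            return fn

        @handler
        def accumulate(state, payload):
            state["total"] = state.get("total", 0) + payload

        def communication_processor(inbox, state, stop):
            # Drain the buffered network: run procedure messages, store data messages.
            while not stop.is_set() or not inbox.empty():
                try:
                    kind, name, payload = inbox.get(timeout=0.1)
                except queue.Empty:
                    continue
                if kind == "proc":
                    HANDLERS[name](state, payload)
                else:
                    state.setdefault("data", []).append(payload)

        inbox, state, stop = queue.Queue(), {}, threading.Event()
        t = threading.Thread(target=communication_processor, args=(inbox, state, stop))
        t.start()
        inbox.put(("proc", "accumulate", 5))    # procedure message
        inbox.put(("data", None, b"block-42"))  # data message
        stop.set()
        t.join()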

  3. Distributed shared memory for roaming large volumes.

    PubMed

    Castanié, Laurent; Mion, Christophe; Cavin, Xavier; Lévy, Bruno

    2006-01-01

    We present a cluster-based volume rendering system for roaming very large volumes. This system makes it possible to move a gigabyte-sized probe inside a total volume of several tens or hundreds of gigabytes in real time. While the size of the probe is limited by the total amount of texture memory on the cluster, the size of the total data set has no theoretical limit. The cluster is used as a distributed graphics processing unit that aggregates both graphics power and graphics memory. A hardware-accelerated volume renderer runs in parallel on the cluster nodes and the final image compositing is implemented using a pipelined sort-last rendering algorithm. Meanwhile, volume bricking and volume paging allow efficient data caching. On each rendering node, a distributed hierarchical cache system implements a global software-based distributed shared memory on the cluster. In case of a cache miss, this system first checks page residency on the other cluster nodes instead of directly accessing local disks. Using two Gigabit Ethernet network interfaces per node, we accelerate data fetching by a factor of 4 compared to directly accessing local disks. The system also implements asynchronous disk access and texture loading, which makes it possible to overlap data loading, volume slicing and rendering for optimal volume roaming.
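    The cache-miss policy described above (check remote node memories before touching local disks) can be sketched in a few lines of Python; the peer and disk interfaces here are assumed placeholders, not the system's actual API.

        def fetch_brick(brick_id, local_cache, peers, disk):
            # Hierarchical lookup: local cache, then remote node memories
            # (faster over Gigabit Ethernet than local disk), then local disk.
            if brick_id in local_cache:
                return local_cache[brick_id]
            for peer in peers:
                data = peer.lookup(brick_id)   # assumed RPC; returns None on miss
                if data is not None:
                    local_cache[brick_id] = data
                    return data
            data = disk.read(brick_id)         # assumed blocking disk read
            local_cache[brick_id] = data
            return data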

  4. A compositional reservoir simulator on distributed memory parallel computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rame, M.; Delshad, M.

    1995-12-31

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
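    A minimal mpi4py sketch of the domain-decomposition pattern described above, assuming a 1-D decomposition with one ghost (overlap) cell per side so neighboring processors can share data for stencil computation; the real simulator's decomposition and load-balancing logic are not reproduced.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_local = 8
        u = np.full(n_local + 2, float(rank))  # interior cells plus two ghost cells

        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        # Swap ghost cells with both neighbors; PROC_NULL makes the edges no-ops.
        comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
        comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)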

  5. Stretching-induced nanostructures on shape memory polyurethane films and their regulation to osteoblasts morphology.

    PubMed

    Xing, Juan; Ma, Yufei; Lin, Manping; Wang, Yuanliang; Pan, Haobo; Ruan, Changshun; Luo, Yanfeng

    2016-10-01

    Programming such as stretching, compression and bending is indispensable to endow polyurethanes with shape memory effects. Despite extensive investigations on the contributions of programming processes to the shape memory effects of polyurethane, less attention has been paid to the nanostructures of shape memory polyurethane surfaces during the programming process. Here we found that stretching could induce the reassembly of hard domains and thereby change the nanostructures on the film surfaces with dependence on the stretching ratios (0%, 50%, 100%, and 200%). In as-cast polyurethane films, hard segments sequentially assembled into nano-scale hard domains, round or fibrillar islands, and fibrillar apophyses. Upon stretching, the islands packed along the stretching axis to form reoriented fibrillar apophyses along the stretching direction. Stretching only changed the chemical patterns on polyurethane films without significantly altering surface roughness, with the primary composition of fibrillar apophyses being hydrophilic hard domains. Further analysis of osteoblast morphology revealed that the focal adhesion formation and osteoblast orientation were in accordance with the chemical patterns of the underlying stretched films, which corroborates the vital roles of stretching-induced nanostructures in regulating osteoblast morphology. These novel findings suggest that programming might hold great potential for patterning polyurethane surfaces so as to direct cellular behavior. In addition, this work lays the groundwork for guiding the programming of shape memory polyurethanes to produce appropriate nanostructures for predetermined medical applications. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Cognitive stimulation intervention for elders with mild cognitive impairment compared with normal aged subjects: preliminary results.

    PubMed

    Wenisch, Emilie; Cantegreil-Kallen, Inge; De Rotrou, Jocelyne; Garrigue, Pia; Moulin, Florence; Batouche, Fériel; Richard, Aurore; De Sant'Anna, Martha; Rigaud, Anne Sophie

    2007-08-01

    Cognitive training programs have been developed for Alzheimer's disease patients and the healthy elderly population. Collective cognitive stimulation programs have been shown to be efficient for subjects with memory complaint. The aim of this study was to evaluate the benefit of such cognitive programs in populations with Mild Cognitive Impairment (MCI). Twelve patients with MCI and twelve cognitively normal elders were administered a cognitive stimulation program. Cognitive performance measures (Logical Memory, word paired-associate learning task, Trail Making Test, verbal fluency test) were collected before and after the intervention. A gain score [(post-score - pre-score)/pre-score] was calculated for each variable and compared between groups. The analysis revealed a larger intervention effect size in MCI than in normal elders' performance on the associative learning task (immediate recall: p<0.05, delayed recall: p<0.01). The intervention was more beneficial in improving associative memory abilities in MCI than in normal subjects. At the end of the intervention, the MCI group had lower results than the normal group only for the delayed recall of Logical Memory. Although further studies are needed for more details on the impact of cognitive stimulation programs on MCI patients, this intervention is effective in compensating associative memory difficulties of these patients. Among non-pharmacological interventions, cognitive stimulation therapy is a repeatable and inexpensive collective method that can easily be provided to various populations with the aim of slowing down the rate of decline in elderly persons with cognitive impairment.
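    For clarity, the gain score used to compare groups is simply the relative change from baseline; a one-line Python illustration with made-up numbers (not data from the study):

        def gain_score(pre, post):
            # Relative gain from baseline: (post - pre) / pre.
            return (post - pre) / pre

        print(gain_score(pre=10.0, post=13.0))  # 0.3, i.e. a 30% improvement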

  7. A Cache Design to Exploit Structural Locality

    DTIC Science & Technology

    1991-12-01

    memory and secondary storage. Main memory was used to store the instructions and data of an executing program, while secondary storage held programs ... efficiency of the CPU and faster turnaround of executing programs. In addition to the well-known spatial and temporal aspects of locality, Hobart has ... identified a third aspect, which he has called structural locality (9). This type of locality is defined as the tendency of an executing program to ...

  8. The FORCE - A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
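    The two-level idea (machine-dependent primitives wrapped once, portable constructs built on top) can be sketched in Python, with threads standing in for FORCE processes; the construct names below are illustrative, and the real FORCE is a macro preprocessor over Fortran rather than a library.

        import threading

        # Low-level layer: the only part that would change across machines.
        _barrier = None

        def init_force(nprocs):
            global _barrier
            _barrier = threading.Barrier(nprocs)

        def barrier():
            _barrier.wait()

        # High-level, machine-independent construct built on the primitive:
        # a self-scheduled parallel loop, independent of the process count.
        def forall(n, body, rank, nprocs):
            for i in range(rank, n, nprocs):
                body(i)
            barrier()

        results = [0] * 16
        def work(i):
            results[i] = i * i

        nprocs = 4
        init_force(nprocs)
        threads = [threading.Thread(target=forall, args=(16, work, r, nprocs))
                   for r in range(nprocs)]
        for t in threads: t.start()
        for t in threads: t.join()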

  9. The FORCE: A highly portable parallel programming language

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  10. Multimodal retrieval of autobiographical memories: sensory information contributes differently to the recollection of events.

    PubMed

    Willander, Johan; Sikström, Sverker; Karlsson, Kristina

    2015-01-01

    Previous studies on autobiographical memory have focused on unimodal retrieval cues (i.e., cues pertaining to one modality). However, from an ecological perspective multimodal cues (i.e., cues pertaining to several modalities) are highly important to investigate. In the present study we investigated age distributions and experiential ratings of autobiographical memories retrieved with unimodal and multimodal cues. Sixty-two participants were randomized to one of four cue conditions: visual, olfactory, auditory, or multimodal. The results showed that the peak of the distributions depends on the modality of the retrieval cue, and indicated that multimodal retrieval was driven largely by visual and auditory information and to a lesser extent by olfactory information. Finally, no differences were observed in the number of retrieved memories or experiential ratings across the four cue conditions.

  11. Discrete-Slots Models of Visual Working-Memory Response Times

    PubMed Central

    Donkin, Christopher; Nosofsky, Robert M.; Gold, Jason M.; Shiffrin, Richard M.

    2014-01-01

    Much recent research has aimed to establish whether visual working memory (WM) is better characterized by a limited number of discrete all-or-none slots or by a continuous sharing of memory resources. To date, however, researchers have not considered the response-time (RT) predictions of discrete-slots versus shared-resources models. To complement the past research in this field, we formalize a family of mixed-state, discrete-slots models for explaining choice and RTs in tasks of visual WM change detection. In the tasks under investigation, a small set of visual items is presented, followed by a test item in 1 of the studied positions for which a change judgment must be made. According to the models, if the studied item in that position is retained in 1 of the discrete slots, then a memory-based evidence-accumulation process determines the choice and the RT; if the studied item in that position is missing, then a guessing-based accumulation process operates. Observed RT distributions are therefore theorized to arise as probabilistic mixtures of the memory-based and guessing distributions. We formalize an analogous set of continuous shared-resources models. The model classes are tested on individual subjects with both qualitative contrasts and quantitative fits to RT-distribution data. The discrete-slots models provide much better qualitative and quantitative accounts of the RT and choice data than do the shared-resources models, although there is some evidence for “slots plus resources” when memory set size is very small. PMID:24015956
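    A toy Python simulation of the mixed-state idea described above: on each trial the tested item either occupies one of the discrete slots (memory-based accumulation) or does not (guessing-based accumulation), so simulated RTs form a probabilistic mixture of the two processes. All parameter values are illustrative, not fitted values from the paper.

        import numpy as np

        rng = np.random.default_rng(1)

        def trial_rt(set_size, k_slots=3, drift_mem=2.0, drift_guess=0.4,
                     threshold=1.0, noise=1.0, dt=0.001, t0=0.3):
            # With probability min(1, k/N) the probe is in a slot; the drift
            # rate then reflects memory evidence, otherwise only guessing.
            in_slot = rng.random() < min(1.0, k_slots / set_size)
            drift = drift_mem if in_slot else drift_guess
            evidence, t = 0.0, 0.0
            while evidence < threshold:       # single-boundary accumulator
                evidence += drift * dt + noise * np.sqrt(dt) * rng.normal()
                t += dt
            return t0 + t

        rts = np.array([trial_rt(set_size=4) for _ in range(1000)])
        print(rts.mean(), np.percentile(rts, [25, 50, 75]))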

  12. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella, and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
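    A simplified, single-node Python sketch of the sorted k-mer list data structure mentioned above: k-mers paired with their positions, kept sorted so that seed matches between two genomes can be found with a linear merge. The distributed BG/P layout and progressiveMauve's actual seeding rules are not reproduced.

        def sorted_kmer_list(seq, k=15):
            # All (k-mer, position) pairs, lexicographically sorted for merging.
            return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))

        def shared_seeds(list_a, list_b):
            # Merge-join two sorted k-mer lists, yielding position pairs of
            # matching k-mers (simplified: duplicates are paired greedily).
            i = j = 0
            while i < len(list_a) and j < len(list_b):
                ka, kb = list_a[i][0], list_b[j][0]
                if ka == kb:
                    yield list_a[i][1], list_b[j][1]
                    i += 1
                elif ka < kb:
                    i += 1
                else:
                    j += 1

        a = sorted_kmer_list("ACGTACGTTGCA", k=4)
        b = sorted_kmer_list("TTACGTACGAAT", k=4)
        print(list(shared_seeds(a, b)))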

  13. Hydrodynamic immunization leads to poor CD8 T-cell expansion, low frequency of memory CTLs and ineffective antiviral protection.

    PubMed

    Obeng-Adjei, N; Choo, D K; Weiner, D B

    2013-10-01

    Hepatotropic pathogens, such as hepatitis B (HBV) and hepatitis C (HCV), often escape cellular immune clearance, resulting in chronic infection. As HBV and HCV infections are the most common causes of hepatocellular carcinoma (HCC), prevention of these infections is believed to be key to the prevention of HCC. It is believed that an effective immune therapy must induce strong cytotoxic T lymphocytes (CTLs) that can migrate into the liver, where they can clear infected hepatocytes. Here, we compared the induction of CD8 T cells by two different DNA immunization methods for T-cell differentiation, function, memory programming and their distribution within relevant tissues in a highly controlled fashion. We used hydrodynamic tail vein injection of plasmid to establish liver-specific LCMV-gp antigen (Ag) transient expression, and studied CD8 T cells induced using the P14 transgenic mouse model. CD8 T cells from this group exhibited unique and limited expansion, memory differentiation, polyfunctionality and cytotoxicity compared with T cells generated in intramuscularly immunized mice. This difference in liver-generated expansion resulted in lower memory CD8 T-cell frequency, leading to reduced protection against lethal viral challenge. These data show that an unusual induction of naive CD8 T cells contributed to the lower frequency of Ag-specific CTLs observed after immunization in the liver, suggesting that limited priming in liver compared with peripheral tissues is responsible for this outcome.

  14. Virtual memory

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Virtual memory was conceived as a way to automate overlaying of program segments. Modern computers have very large main memories, but need automatic solutions to the relocation and protection problems. Virtual memory serves this need as well and is thus useful in computers of all sizes. The history of the idea is traced, showing how it has become a widespread, little noticed feature of computers today.

  15. The effects of drumming on working memory in older adults.

    PubMed

    Degé, Franziska; Kerkovius, Katharina

    2018-05-04

    Our study investigated the effect of a music training program on working memory (verbal memory, visual memory, and central executive processing) in older adults. The experimental group was musically trained (drumming and singing), whereas one control group received a literature training program and a second control group was untrained. We randomly assigned 24 participants (all females; M = 77 years and 3 months) to the music group, the literature group, and the untrained group. The training groups were trained for 15 weeks. The three groups did not differ significantly in age, socioeconomic status, music education, musical aptitude, cognitive abilities, or depressive symptoms. We did not find differences in the music group in central executive function. However, we found a potential effect of music training on verbal memory and an impact of music training on visual memory. Musically trained participants remembered more words from a word list than both control groups, and they were able to remember more symbol sequences correctly than the control groups. Our findings show a possible effect of music training on verbal and visual memory in older people. © 2018 New York Academy of Sciences.

  16. FOXO1 opposition of CD8+ T cell effector programming confers early memory properties and phenotypic diversity.

    PubMed

    Delpoux, Arnaud; Lai, Chen-Yen; Hedrick, Stephen M; Doedens, Andrew L

    2017-10-17

    The factors and steps controlling postinfection CD8+ T cell terminal effector versus memory differentiation are incompletely understood. Whereas we found that naive TCF7 (alias "Tcf-1") expression is FOXO1 independent, early postinfection we report bimodal, FOXO1-dependent expression of the memory-essential transcription factor TCF7 in pathogen-specific CD8+ T cells. We determined that the early postinfection TCF7-high population is marked by low TIM3 expression and bears memory signature hallmarks before the appearance of the established memory precursor marker CD127 (IL-7R). These cells exhibit diminished TBET, GZMB, mTOR signaling, and cell cycle progression. Day 5 postinfection, TCF7-high cells express higher memory-associated BCL2 and EOMES, as well as increased accumulation potential and capacity to differentiate into memory phenotype cells. TCF7 retroviral transduction opposes GZMB expression and the formation of KLRG1-positive phenotype cells, demonstrating an active role for TCF7 in extinguishing the effector program and forestalling terminal differentiation. Past the peak of the cellular immune response, we report a gradient of FOXO1 and TCF7 expression, which functions to oppose TBET and orchestrate a continuum of effector-to-memory phenotypes.

  17. Phonotactic Probability Effect in Nonword Recall and Its Relationship with Vocabulary in Monolingual and Bilingual Preschoolers

    ERIC Educational Resources Information Center

    Messer, Marielle H.; Leseman, Paul P. M.; Boom, Jan; Mayo, Aziza Y.

    2010-01-01

    The current study examined to what extent information in long-term memory concerning the distribution of phoneme clusters in a language, so-called long-term phonotactic knowledge, increased the capacity of verbal short-term memory in young language learners and, through increased verbal short-term memory capacity, supported these children's first…

  18. Contrasting single and multi-component working-memory systems in dual tasking.

    PubMed

    Nijboer, Menno; Borst, Jelmer; van Rijn, Hedderik; Taatgen, Niels

    2016-05-01

    Working memory can be a major source of interference in dual tasking. However, there is no consensus on whether this interference is the result of a single working memory bottleneck, or of interactions between different working memory components that together form a complete working-memory system. We report a behavioral and an fMRI dataset in which working memory requirements are manipulated during multitasking. We show that a computational cognitive model that assumes a distributed version of working memory accounts for both behavioral and neuroimaging data better than a model that takes a more centralized approach. The model's working memory consists of an attentional focus, declarative memory, and a subvocalized rehearsal mechanism. Thus, the data and model favor an account where working memory interference in dual tasking is the result of interactions between different resources that together form a working-memory system. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. System and method for programmable bank selection for banked memory subsystems

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

    2010-09-07

    A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signals for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment to access memory storage distributed across the one or more memory storage structures.
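    The select logic amounts to matching predetermined bit values at selected physical-address bit positions; here is a small Python sketch with illustrative mask/pattern pairs (the bit positions and rule format are assumptions for illustration, not values from the patent).

        def make_bank_selector(rules):
            # rules: list of (mask, pattern, bank) triples; the first rule whose
            # masked address bits equal the programmed pattern selects the bank.
            def select(phys_addr):
                for mask, pattern, bank in rules:
                    if phys_addr & mask == pattern:
                        return bank
                return 0  # default bank when no rule matches
            return select

        # Example: physical-address bit 12 steers accesses between two banks.
        selector = make_bank_selector([(1 << 12, 0, 0), (1 << 12, 1 << 12, 1)])
        assert selector(0x0ABC) == 0
        assert selector(0x1ABC) == 1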

  20. Memory consolidation reconfigures neural pathways involved in the suppression of emotional memories

    PubMed Central

    Liu, Yunzhe; Lin, Wanjun; Liu, Chao; Luo, Yuejia; Wu, Jianhui; Bayley, Peter J.; Qin, Shaozheng

    2016-01-01

    The ability to suppress unwanted emotional memories is crucial for human mental health. Through consolidation over time, emotional memories often become resistant to change. However, how consolidation impacts the effectiveness of emotional memory suppression is still unknown. Using event-related fMRI while concurrently recording skin conductance, we investigated the neurobiological processes underlying the suppression of aversive memories before and after overnight consolidation. Here we report that consolidated aversive memories retain their emotional reactivity and become more resistant to suppression. Suppression of consolidated memories involves higher prefrontal engagement, and less concomitant hippocampal and amygdala disengagement. In parallel, we show a shift away from hippocampal-dependent representational patterns to distributed neocortical representational patterns in the suppression of aversive memories after consolidation. These findings demonstrate rapid changes in emotional memory organization with overnight consolidation, and suggest possible neurobiological bases underlying the resistance to suppression of emotional memories in affective disorders. PMID:27898050
