A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
A general model for memory interference in a multiprocessor system with memory hierarchy
NASA Technical Reports Server (NTRS)
Taha, Badie A.; Standley, Hilda M.
1989-01-01
The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
Preliminary basic performance analysis of the Cedar multiprocessor memory system
NASA Technical Reports Server (NTRS)
Gallivan, K.; Jalby, W.; Turner, S.; Veidenbaum, A.; Wijshoff, H.
1991-01-01
Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator which is then used to discuss the scalability of the system.
Multiprocessor architectural study
NASA Technical Reports Server (NTRS)
Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.
1972-01-01
An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics, previous experience, and accumulated knowledge of the multiprocessor field is used to generate a baseline philosophy for the design of a future SUMC* multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance is discussed extensively with particular attention to the design approach which utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented either through segmentation or paging. Addressing is discussed in terms of various register design adopted by current computers and those of advanced design.
A fault-tolerant information processing concept for space vehicles.
NASA Technical Reports Server (NTRS)
Hopkins, A. L., Jr.
1971-01-01
A distributed fault-tolerant information processing system is proposed, comprising a central multiprocessor, dedicated local processors, and multiplexed input-output buses connecting them together. The processors in the multiprocessor are duplicated for error detection, which is felt to be less expensive than using coded redundancy of comparable effectiveness. Error recovery is made possible by a triplicated scratchpad memory in each processor. The main multiprocessor memory uses replicated memory for error detection and correction. Local processors use any of three conventional redundancy techniques: voting, duplex pairs with backup, and duplex pairs in independent subsystems.
NASA Astrophysics Data System (ADS)
Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.
2014-03-01
The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
A cache-aided multiprocessor rollback recovery scheme
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent
1989-01-01
This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
C-MOS array design techniques: SUMC multiprocessor system study
NASA Technical Reports Server (NTRS)
Clapp, W. A.; Helbig, W. A.; Merriam, A. S.
1972-01-01
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
NASA Technical Reports Server (NTRS)
Bradley, D. B.; Irwin, J. D.
1974-01-01
A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant and SIMPLEX (non redundant) jobs.
Ohmacht, Martin
2017-08-15
In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Ohmacht, Martin
2014-09-09
In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Experimental evaluation of multiprocessor cache-based error recovery
NASA Technical Reports Server (NTRS)
Janssens, Bob; Fuchs, W. K.
1991-01-01
Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol are evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
Cache-based error recovery for shared memory multiprocessor systems
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1989-01-01
A multiprocessor cache-based checkpointing and recovery scheme for of recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
Cache as point of coherence in multiprocessor system
Blumrich, Matthias A.; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Phlip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Zhuang, Xiaotong
2016-11-29
In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Low latency memory access and synchronization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Scheduling for Locality in Shared-Memory Multiprocessors
1993-05-01
Submitted in Partial Fulfillment of the Requirements for the Degree ’)iIC Q(JALfryT INSPECTED 5 DOCTOR OF PHILOSOPHY I Accesion For Supervised by NTIS CRAM... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to 0...decomoosition and scheduling algorithms. I. SUIUECT TERMS IS. NUMBER OF PAGES shared-memory multiprocessors; architecture trends; loop 110 scheduling
Blumrich, Matthias A.; Salapura, Valentina
2010-11-02
An apparatus and method are disclosed for single-stepping coherence events in a multiprocessor system under software control in order to monitor the behavior of a memory coherence mechanism. Single-stepping coherence events in a multiprocessor system is made possible by adding one or more step registers. By accessing these step registers, one or more coherence requests are processed by the multiprocessor system. The step registers determine if the snoop unit will operate by proceeding in a normal execution mode, or operate in a single-step mode.
Method for prefetching non-contiguous data structures
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2009-05-05
A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Programmable partitioning for high-performance coherence domains in a multiprocessor system
Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY
2011-01-25
A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.
NASA Technical Reports Server (NTRS)
Smith, T. B., Jr.; Lala, J. H.
1983-01-01
The basic organization of the fault tolerant multiprocessor, (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting is employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
Compiler-directed cache management in multiprocessors
NASA Technical Reports Server (NTRS)
Cheong, Hoichi; Veidenbaum, Alexander V.
1990-01-01
The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
Multiprocessor and memory architecture of the neurocomputer SYNAPSE-1.
Ramacher, U; Raab, W; Anlauf, J; Hachmann, U; Beichter, J; Brüls, N; Wesseling, M; Sicheneder, E; Männer, R; Glass, J
1993-12-01
A general purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude--including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors. Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C(+2) has been defined for the user to cope with the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Parallelising a molecular dynamics algorithm on a multi-processor workstation
NASA Astrophysics Data System (ADS)
Müller-Plathe, Florian
1990-12-01
The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Multiprocessor shared-memory information exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santoline, L.L.; Bowers, M.D.; Crew, A.W.
1989-02-01
In distributed microprocessor-based instrumentation and control systems, the inter-and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, ismore » designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.« less
The force on the flex: Global parallelism and portability
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1986-01-01
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Neural networks and MIMD-multiprocessors
NASA Technical Reports Server (NTRS)
Vanhala, Jukka; Kaski, Kimmo
1990-01-01
Two artificial neural network models are compared. They are the Hopfield Neural Network Model and the Sparse Distributed Memory model. Distributed algorithms for both of them are designed and implemented. The run time characteristics of the algorithms are analyzed theoretically and tested in practice. The storage capacities of the networks are compared. Implementations are done using a distributed multiprocessor system.
Shared Versus Distributed Memory Multiprocessors
1991-01-01
multiprocessors should hawe shared or dis.trimuted meieo-% ha~ trr ~ g ’’~ de~i c4~accio;, S Cm teaicners argue S trongly tor Outiding (li15 tri huted...Applications, MIT Press (1985). 161 D. Gajski et el., "Cedar," Proc. Compcon, pp. 306-309 (Spring 19S9). 171 S. Ahuja, N. Carriero and D. Gelernter, "Linda
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Automatic Data Partitioning on Distributed Memory Multiprocessors
1990-10-01
DISTRIBUTED MEMORY MULTIPROCESSORS D 1"’ 1 C . Manish Gupta NOV,1 41990.NOV 1 41990m Prithviraj Banerjee D Coordinated Science Laboratory College of...developed on the partitioning of arrays can as well be applied to other programming languages, such as C . 3 The rest of this paper is organized as follows...value 1, as in Fortran. a) N= 4, N 2 = 1: f(i) = J,(j) = 03 b) =Ni 1, N 2 =4: fA(i) =, f() - c ) NI 2, X) 2: f()=[., f2(j) = [L-.j d) N 1 , N 2 =4: fA(i
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Adaptive Backoff Synchronization Techniques
1989-07-01
The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6] F. Darems-Rogers, D. A. George, V. A. Norton, and G . F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Com- puters, C-31(4):296-304, April 1982. [18] G
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Simon, Horst D.
1996-01-01
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports
1991-06-01
Report RC 12936 (#58037). IBM T. J. Wartson Reiearch Center. July 1987. Alan Jay Smith. Cache memories. Coniputing Sitrry., 1.1(3): I.3-5:30...basic-shared is an instrument for ashared memory design. The components panels are processor- qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blanford, M.
1997-12-31
Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
1991-06-01
Symposium on Compiler Construction, June 1986. [14] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In...Directory Methods. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [31] G . M. Papadopoulos and D.E. Culler...Monsoon: An Explicit Token-Store Ar- chitecture. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [32] G . F
Adaptive Backoff Synchronization Techniques
1989-06-01
The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6J F. Darems-Rogers, D. A. George, V. A. Norton, and G . F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International Conference...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Com- puters, C-31(4):296-304, April 1982. [18] G
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Event parallelism: Distributed memory parallel computing for high energy physics experiments
NASA Astrophysics Data System (ADS)
Nash, Thomas
1989-12-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
A multiarchitecture parallel-processing development environment
NASA Technical Reports Server (NTRS)
Townsend, Scott; Blech, Richard; Cole, Gary
1993-01-01
A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Evaluation of the Cedar memory system: Configuration of 16 by 16
NASA Technical Reports Server (NTRS)
Gallivan, K.; Jalby, W.; Wijshoff, H.
1991-01-01
Some basic results on the performance of the Cedar multiprocessor system are presented. Empirical results on the 16 processor 16 memory bank system configuration, which show the behavior of the Cedar system under different modes of operation are presented.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1996-01-01
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1997-01-01
We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Dynamic programming on a shared-memory multiprocessor
NASA Technical Reports Server (NTRS)
Edmonds, Phil; Chu, Eleanor; George, Alan
1993-01-01
Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n-cubed/6p) and that the synchronization cost is O(absolute value of log sub C n) if p much less than n, where C = (2p-1)/(2p + 1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Bolosky, William Joseph
1993-01-01
Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Cache directory look-up re-use as conflict check mechanism for speculative memory requests
Ohmacht, Martin
2013-09-10
In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks
NASA Technical Reports Server (NTRS)
Lee, Y. H.; Shin, K. G.
1982-01-01
A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes
NASA Technical Reports Server (NTRS)
Wissink, Andrew; Allen, Edwin (Technical Monitor)
1998-01-01
Viscous calculations about geometrically complex bodies in which there is relative motion between component parts is one of the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first pan of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multi-processors. The second part of the presentation will cover some recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed- memory multiprocessor computer architectures will be reported.
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Parallel Navier-Stokes computations on shared and distributed memory architectures
NASA Technical Reports Server (NTRS)
Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar
1995-01-01
We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SPI). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Proceedings of the second SISAL users` conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
Multiprocessor system with multiple concurrent modes of execution
Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin
2013-12-31
A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
Multiprocessor system with multiple concurrent modes of execution
Ahn, Daniel; Ceze, Luis H.; Chen, Dong Chen; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin
2016-11-22
A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
A manual for PARTI runtime primitives
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel
1990-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Numerical methods on some structured matrix algebra problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1996-06-01
This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was tomore » translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.« less
1988-02-29
by memory copyin g will degrade system performance on shared-memory multiprocessors. Virtual memor y (VM) remapping, as opposed to memory copying...Bershad, G.D. Giuseppe Facchetti, Kevin Fall, G . Scott Graham, Ellen Nelson , P. Venkat Rangan, Bruno Sartirana, Shin-Yuan Tzou, Raj Vaswani, and Robert...Remote Execution in NEST", IEEE Trans. on Software Eng. 13, 8 (August 1987), 905-912. 3. G . T. Almes, A. P. Black, E. Lazowska and J. Noe, "The Eden
A manual for PARTI runtime primitives, revision 1
NASA Technical Reports Server (NTRS)
Das, Raja; Saltz, Joel; Berryman, Harry
1991-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Memory management and compiler support for rapid recovery from failures in computer systems
NASA Technical Reports Server (NTRS)
Fuchs, W. K.
1991-01-01
This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.
Programming parallel architectures: The BLAZE family of languages
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Power-Aware Compiler Controllable Chip Multiprocessor
NASA Astrophysics Data System (ADS)
Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori
A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.
Gara, Alan; Ohmacht, Martin
2014-09-16
In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
Testing and operating a multiprocessor chip with processor redundancy
Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J
2014-10-21
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory, and which communicates with a common global data memory. A new graph theoretic model called ATAMM which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
A robot arm simulation with a shared memory multiprocessor machine
NASA Technical Reports Server (NTRS)
Kim, Sung-Soo; Chuang, Li-Ping
1989-01-01
A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.
Scheduler-Conscious Synchronization.
1994-12-01
SPONSORING I MONITORING Office of Naval Research ARPA AGENCY REPORT NUMBER Information Systems 3701 N. Fairfax Drive TR 550 Arlington VA 22217 Arlington VA...Broughton. A New Approach to Exclusive Data Access in Shared Memory Multiprocessors. Technical Report UCRL -97663, Lawrence Livermore National Laboratory
NASA Astrophysics Data System (ADS)
Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry
2003-08-01
Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories
1989-02-01
frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Multi-processor including data flow accelerator module
Davidson, George S.; Pierce, Paul E.
1990-01-01
An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer
NASA Astrophysics Data System (ADS)
Xiao, Hao; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki; Nakase, Yuko; Kimura, Sadahiro
Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises a lot of facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large amount of computational load, which challenges the traditional uniprocessor architecture based implementation method to provide the required performance. However, the constrained cost and power budget, on the other hand, makes using commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, but takes 15% less area than the uniprocessor implementation.
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1991-01-01
Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
Reader set encoding for directory of shared cache memory in multiprocessor system
Ahn, Dnaiel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang
2014-06-10
In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Hybrid Memory Management for Parallel Execution of Prolog on Shared Memory Multiprocessors
1990-06-01
organizing data to increase locality. The stack structure exhibits greater locality than the heap structure. Tradeoff decisions can also be made on...PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES...University of California at Berkeley,Department of Electrical Engineering and Computer Sciences,Berkeley,CA,94720 8. PERFORMING ORGANIZATION REPORT
Multiprocessor switch with selective pairing
Gara, Alan; Gschwind, Michael K; Salapura, Valentina
2014-03-11
System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory "nest" (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus
Applications considerations in the system design of highly concurrent multiprocessors
NASA Technical Reports Server (NTRS)
Lundstrom, Stephen F.
1987-01-01
A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
NASA Technical Reports Server (NTRS)
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
Performances of multiprocessor multidisk architectures for continuous media storage
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.
1996-03-01
Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
Integrating Software Modules For Robot Control
NASA Technical Reports Server (NTRS)
Volpe, Richard A.; Khosla, Pradeep; Stewart, David B.
1993-01-01
Reconfigurable, sensor-based control system uses state variables in systematic integration of reusable control modules. Designed for open-architecture hardware including many general-purpose microprocessors, each having own local memory plus access to global shared memory. Implemented in software as extension of Chimera II real-time operating system. Provides transparent computing mechanism for intertask communication between control modules and generic process-module architecture for multiprocessor realtime computation. Used to control robot arm. Proves useful in variety of other control and robotic applications.
Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor
2007-06-01
requires a significant deviation from previous work. For instance, we find that using the relaxed input replication model from Reunion incurs a...Circuit Width Delay Count CRC-16 16 6.65 754 CRC- SDLC -16 16 6.10 888 CRC-32 16 7.28 2260 CRC-32 32 8.60 4240 Table 1. FO4 delay and transistor count for...the operation of our proposed system is the same in all other respects. 4.4 Compatibility Across Memory Consis- tency Models The memory consistency
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.
2013-01-01
The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance-load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTB architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches
NASA Astrophysics Data System (ADS)
Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur
2018-03-01
Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
Solutions and debugging for data consistency in multiprocessors with noncoherent caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.
1995-02-01
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on softwaremore » methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.« less
Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing
NASA Technical Reports Server (NTRS)
Dobbs, Carl, Sr.
2012-01-01
A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, for implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory, and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software
Estimating Performance of Single Bus, Shared Memory Multiprocessors
1987-05-01
Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol10 , no 3...Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models", Computing Sur- veys, vol10 , no 3, September 1978, pp 225-261
Framework for analysis of guaranteed QOS systems
NASA Astrophysics Data System (ADS)
Chaudhry, Shailender; Choudhary, Alok
1997-01-01
Multimedia data is isochronous in nature and entails managing and delivering high volumes of data. Multiprocessors with their large processing power, vast memory, and fast interconnects, are an ideal candidate for the implementation of multimedia applications. Initially, multiprocessors were designed to execute scientific programs and thus their architecture was optimized to provide low message latency and efficiently support regular communication patterns. Hence, they have a regular network topology and most use wormhole routing. The design offers the benefits of a simple router, small buffer size, and network latency that is almost independent of path length. Among the various multimedia applications, video on demand (VOD) server is well-suited for implementation using parallel multiprocessors. Logical models for VOD servers are presently mapped onto multiprocessors. Our paper provides a framework for calculating bounds on utilization of system resources with which QoS parameters for each isochronous stream can be guaranteed. Effects of the architecture of multiprocessors, and efficiency of various local models and mapping on particular architectures can be investigated within our framework. Our framework is based on rigorous proofs and provides tight bounds. The results obtained may be used as the basis for admission control tests. To illustrate the versatility of our framework, we provide bounds on utilization for various logical models applied to mesh connected architectures for a video on demand server. Our results show that worm hole routing can lead to packets waiting for transmission of other packets that apparently share no common resources. This situation is analogous to head-of-the-line blocking. We find that the provision of multiple VCs per link and multiple flit buffers improves utilization (even under guaranteed QoS parameters). This analogous to parallel iterative matching.
Low Latency Messages on Distributed Memory Multiprocessors
Rosing, Matt; Saltz, Joel
1995-01-01
This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
Memory interface simulator: A computer design aid
NASA Technical Reports Server (NTRS)
Taylor, D. S.; Williams, T.; Weatherbee, J. E.
1972-01-01
Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites
NASA Technical Reports Server (NTRS)
Sues, R. H.; Lua, Y. J.; Smith, M. D.
1994-01-01
The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Supinski, B.; Caliga, D.
2017-09-28
The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
A class Hierarchical, object-oriented approach to virtual memory management
NASA Technical Reports Server (NTRS)
Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.
1989-01-01
The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Lesoinne, Michel
1993-01-01
Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
2015-05-01
LLC and DRAM banks. For each µB task and isolation configuration, we ran experiments with all 256 possible LLC area sizes (given by 1 to 16 ways and 1...isolation on multicoore platforms. In RTAS ’14. [29] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha . Memory access control in multiprocessor
Cache directory lookup reader set encoding for partial cache line speculation support
Gara, Alan; Ohmacht, Martin
2014-10-21
In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.
Job-mix modeling and system analysis of an aerospace multiprocessor.
NASA Technical Reports Server (NTRS)
Mallach, E. G.
1972-01-01
An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.
1997-04-01
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less
Using parallel banded linear system solvers in generalized eigenvalue problems
NASA Technical Reports Server (NTRS)
Zhang, Hong; Moss, William F.
1993-01-01
Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, T. B., III
1983-01-01
The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gala, Alan; Ohmacht, Martin
A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memorymore » access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to a different physical addresses for value disambiguation for different threads.« less
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
On the impact of communication complexity in the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Apparatus for multiprocessor-based control of a multiagent robot
NASA Technical Reports Server (NTRS)
Peters, II, Richard Alan (Inventor)
2009-01-01
An architecture for robot intelligence enables a robot to learn new behaviors and create new behavior sequences autonomously and interact with a dynamically changing environment. Sensory information is mapped onto a Sensory Ego-Sphere (SES) that rapidly identifies important changes in the environment and functions much like short term memory. Behaviors are stored in a DBAM that creates an active map from the robot's current state to a goal state and functions much like long term memory. A dream state converts recent activities stored in the SES and creates or modifies behaviors in the DBAM.
On the impact of communication complexity on the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D. B.; Van Rosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical alorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Fault-Tolerant Multiprocessor and VLSI-Based Systems.
1987-03-15
54590 170 Table 1: Statistics for the Benchmark Programs pages are distributed amongst the groups of the reconfigured memory in proportion to the...distances are proportional to only the logarithm of the sure that possesses relevance to a system which consists of alare nmbe ofhomgenouseleent...and comn.unication overhead resulting from faults communicating with all of the other elements in the system the network to degrade proportionately to
A divide and conquer approach to the nonsymmetric eigenvalue problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1991-01-01
Serial computation combined with high communication costs on distributed-memory multiprocessors make parallel implementations of the QR method for the nonsymmetric eigenvalue problem inefficient. This paper introduces an alternative algorithm for the nonsymmetric tridiagonal eigenvalue problem based on rank two tearing and updating of the matrix. The parallelism of this divide and conquer approach stems from independent solution of the updating problems. 11 refs.
Efficient partitioning and assignment on programs for multiprocessor execution
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1993-01-01
The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
Support for non-locking parallel reception of packets belonging to a single memory reception FIFO
Chen, Dong [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Salapura, Valentina [Yorktown Heights, NY; Senger, Robert M [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugawara, Yutaka [Yorktown Heights, NY
2011-01-27
A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.
Optoelectronic-cache memory system architecture.
Chiarulli, D M; Levitan, S P
1996-05-10
We present an investigation of the architecture of an optoelectronic cache that can integrate terabit optical memories with the electronic caches associated with high-performance uniprocessors and multiprocessors. The use of optoelectronic-cache memories enables these terabit technologies to provide transparently low-latency secondary memory with frame sizes comparable with disk pages but with latencies that approach those of electronic secondary-cache memories. This enables the implementation of terabit memories with effective access times comparable with the cycle times of current microprocessors. The cache design is based on the use of a smart-pixel array and combines parallel free-space optical input-output to-and-from optical memory with conventional electronic communication to the processor caches. This cache and the optical memory system to which it will interface provide a large random-access memory space that has a lower overall latency than that of magnetic disks and disk arrays. In addition, as a consequence of the high-bandwidth parallel input-output capabilities of optical memories, fault service times for the optoelectronic cache are substantially less than those currently achievable with any rotational media.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
CELLFS: TAKING THE "DMA" OUT OF CELL PROGRAMMING
DOE Office of Scientific and Technical Information (OSTI.GOV)
IONKOV, LATCHESAR A.; MIRTCHOVSKI, ANDREY A.; NYRHINEN, AKI M.
In this paper we present a new programming model for the Cell BE architecture of scalar multiprocessors. They call this programming model CellFS. CellFS aims at simplifying the task of managing I/O between the local store of the processing units and main memory. The CellFS support library provides the means for transferring data via simple file I/O operations between the PPU and the SPU.
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.
1987-01-01
A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Efficient Approximation Algorithms for Weighted $b$-Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khan, Arif; Pothen, Alex; Mostofa Ali Patwary, Md.
2016-01-01
We describe a half-approximation algorithm, b-Suitor, for computing a b-Matching of maximum weight in a graph with weights on the edges. b-Matching is a generalization of the well-known Matching problem in graphs, where the objective is to choose a subset of M edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction we maximize the sum of the weights of the edges in M. We prove that the b-Suitor algorithm computes the same b-Matching as the one obtained by the greedy algorithm for themore » problem. We implement the algorithm on serial and shared-memory parallel processors, and compare its performance against a collection of approximation algorithms that have been proposed for the Matching problem. Our results show that the b-Suitor algorithm outperforms the Greedy and Locally Dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-Suitor algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared memory multiprocessor. The b-Suitor algorithm outperforms the Locally Dominant edge algorithm by a factor of fourteen on 16 cores of an Intel Xeon multiprocessor.« less
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations
Mitchell, William F.
1998-01-01
Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.
Mitchell, William F
1998-01-01
Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
DARPA Status Report - November 1988
1988-11-01
style used in the applic4#ons reference to that block was by processor j. where j It. We was influenced by it. MACH is a multiprocessor operating S call...it can be order they occurred. However. the exact time at which the treated specially in memory management , and so most of the reference wa, made is...on cache consistency performance, sophisti- peak can be explained as clinging references that occur when cated cache management schemes that take
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
PANDA: A distributed multiprocessor operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chubb, P.
1989-01-01
PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Ultra-fast fluence optimization for beam angle selection algorithms
NASA Astrophysics Data System (ADS)
Bangert, M.; Ziegenhein, P.; Oelfke, U.
2014-03-01
Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ~ 4 s. BAS runs including FOs for 1000 different beam ensembles take ~ 15-70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, P.A.
1988-02-01
The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user.more » However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.« less
NASA Astrophysics Data System (ADS)
Kutt, P. H.; Balamuth, D. P.
1989-10-01
Summary form only given, as follows. A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPUs and 14 Mbyte of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over three times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing of gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC (reduced instruction set computer) processors to obtain an order of magnitude increase in computing power per node.
USC orthogonal multiprocessor for image processing with neural networks
NASA Astrophysics Data System (ADS)
Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid
1990-07-01
This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.
OPAD-EDIFIS Real-Time Processing
NASA Technical Reports Server (NTRS)
Katsinis, Constantine
1997-01-01
The Optical Plume Anomaly Detection (OPAD) detects engine hardware degradation of flight vehicles through identification and quantification of elemental species found in the plume by analyzing the plume emission spectra in a real-time mode. Real-time performance of OPAD relies on extensive software which must report metal amounts in the plume faster than once every 0.5 sec. OPAD software previously written by NASA scientists performed most necessary functions at speeds which were far below what is needed for real-time operation. The research presented in this report improved the execution speed of the software by optimizing the code without changing the algorithms and converting it into a parallelized form which is executed in a shared-memory multiprocessor system. The resulting code was subjected to extensive timing analysis. The report also provides suggestions for further performance improvement by (1) identifying areas of algorithm optimization, (2) recommending commercially available multiprocessor architectures and operating systems to support real-time execution and (3) presenting an initial study of fault-tolerance requirements.
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
Optical backplane interconnect switch for data processors and computers
NASA Technical Reports Server (NTRS)
Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.
1989-01-01
An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
Error recovery in shared memory multiprocessors using private caches
NASA Technical Reports Server (NTRS)
Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.
1990-01-01
The problem of recovering from processor transient faults in shared memory multiprocesses systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
Enabling Next-Generation Multicore Platforms in Embedded Applications
2014-04-01
mapping to sets 129 − 256 ) to the second page in memory, color 2 (sets 257 − 384) to the third page, and so on. Then, after the 32nd page, all 212 sets...the Real-Time Nested Locking Protocol (RNLP) [56], a recently developed multiprocessor real-time locking protocol that optimally supports the...RELEASE; DISTRIBUTION UNLIMITED 15 In general, the problems of optimally assigning tasks to processors and colors to tasks are both NP-hard in the
Domain Decomposition: A Bridge between Nature and Parallel Computers
1992-09-01
B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," S"IAM Journal of S; cientific and Statistical (’omputing, Vol. 13, 1992, pp...AD-A256 575 NASA Contractor Report 189709 ICASE Report No. 92-44 ICASE DOMAIN DECOMPOSITION: A BRIDGE BETWEEN NATURE AND PARALLEL COMPUTERS DTIC dE...effectively implemented on dis- tributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algo- rithm), a 103,201-unknown 2D elliptic
Software-Controlled Caches in the VMP Multiprocessor
1986-03-01
programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig
Cache Coherence Protocols for Large-Scale Multiprocessors
1990-09-01
and is compared with the other protocols for large-scale machines. In later analysis, this coherence method is designated by the acronym OCPD , which...private read misses 2 6 6 ( OCPD ) private write misses 2 6 6 Table 4.2: Transaction Types and Costs. the performance of the memory system. These...methodologies. Figure 4-2 shows the processor utiliza- tions of the Weather program, with special code in the dyn-nic post-mortem sched- 94 OCPD DlrINB
1980-08-31
DEPARTMENT OF THE ARMY POSITION, UNLESS SO DESIGNATED BY OTHER AUTHORIZED DOCUMENTS. -I , ! unclassified SECURITY CLASSIICATION Of THIS PAGE ’Whm bate...ADORE.SE4I dFfemtar CIatblftaE Office) IS. SECURITY CLASS. (of ftio #sNt) unclassified IS. DOCL ASSI FICATION/DOWNGRADING SCHEDULENA INA IS. DISTRIBUTION...In this section, we describe the characteristics of the access sequence of a pipelined processor. A pipelined organization in the most general sense
Parallelization of a Monte Carlo particle transport simulation code
NASA Astrophysics Data System (ADS)
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Simulation Analysis of Data Sharing in Shared Memory Multiprocessors
1989-02-24
LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 178 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b . ABSTRACT unclassified...work. Andrea Casotto (CELL), Steve McGrogan (SPICE), Srinivas Devadas (TOPOP1) and Hi-Keung Tony Ma (VERIFY) donated the parallel programs and a con...Effect of Block Size on B us Utilization 120 5-14 Ratio of Sharing Bus Cyc les to Total Bus Cycles 120 5-15 Oassification of Bus Cyc les for
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring
NASA Technical Reports Server (NTRS)
Padovan, Joe; Kwang, Abel
1994-01-01
This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
The Effects of Block Size on the Performance of Coherent Caches in Shared-Memory Multiprocessors
1993-05-01
increase with the bandwidth and latency. For those applications with poor spatial locality, the best choice of cache line size is determined by the...observation was used in the design of two schemes: LimitLESS di- rectories and Tag caches. LimitLESS directories [15] were designed for the ALEWIFE...small packets may be used to avoid network congestion. The most important factor influencing the choice of cache line size for a multipro- cessor is the
A model for the distributed storage and processing of large arrays
NASA Technical Reports Server (NTRS)
Mehrota, P.; Pratt, T. W.
1983-01-01
A conceptual model for parallel computations on large arrays is developed. The model provides a set of language concepts appropriate for processing arrays which are generally too large to fit in the primary memories of a multiprocessor system. The semantic model is used to represent arrays on a concurrent architecture in such a way that the performance realities inherent in the distributed storage and processing can be adequately represented. An implementation of the large array concept as an Ada package is also described.
Embedded Multiprocessor Technology for VHSIC Insertion
NASA Technical Reports Server (NTRS)
Hayes, Paul J.
1990-01-01
Viewgraphs on embedded multiprocessor technology for VHSIC insertion are presented. The objective was to develop multiprocessor system technology providing user-selectable fault tolerance, increased throughput, and ease of application representation for concurrent operation. The approach was to develop graph management mapping theory for proper performance, model multiprocessor performance, and demonstrate performance in selected hardware systems.
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high- performances computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high performance general-purpose microprocessor technology have produced an abundance of low cost components suitable for use in high-performance computing systems. A common pitfall in the design of high performance imaging system, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The symptom characteristic of this problem is failure to the performance of the system to scale as more processors are added. The problem becomes exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high performance external memory interfaces makes it practical to design high performance imaging system with balanced computational and memory bandwidth. Real word examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
NASA Astrophysics Data System (ADS)
Chase, Patrick; Vondran, Gary
2011-01-01
Tetrahedral interpolation is commonly used to implement continuous color space conversions from sparse 3D and 4D lookup tables. We investigate the implementation and optimization of tetrahedral interpolation algorithms for GPUs, and compare to the best known CPU implementations as well as to a well known GPU-based trilinear implementation. We show that a 500 NVIDIA GTX-580 GPU is 3x faster than a 1000 Intel Core i7 980X CPU for 3D interpolation, and 9x faster for 4D interpolation. Performance-relevant GPU attributes are explored including thread scheduling, local memory characteristics, global memory hierarchy, and cache behaviors. We consider existing tetrahedral interpolation algorithms and tune based on the structure and branching capabilities of current GPUs. Global memory performance is improved by reordering and expanding the lookup table to ensure optimal access behaviors. Per multiprocessor local memory is exploited to implement optimally coalesced global memory accesses, and local memory addressing is optimized to minimize bank conflicts. We explore the impacts of lookup table density upon computation and memory access costs. Also presented are CPU-based 3D and 4D interpolators, using SSE vector operations that are faster than any previously published solution.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2011-01-11
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-02-21
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Compiler analysis for irregular problems in FORTRAN D
NASA Technical Reports Server (NTRS)
Vonhanxleden, Reinhard; Kennedy, Ken; Koelbel, Charles; Das, Raja; Saltz, Joel
1992-01-01
We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
Validation of multiprocessor systems
NASA Technical Reports Server (NTRS)
Siewiorek, D. P.; Segall, Z.; Kong, T.
1982-01-01
Experiments that can be used to validate fault free performance of multiprocessor systems in aerospace systems integrating flight controls and avionics are discussed. Engineering prototypes for two fault tolerant multiprocessors are tested.
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1989-01-01
Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed the data traffic associated with factoring an n x n dense matrix using n to the alpha power (alpha less than or equal 2) processors is omega(n to the 2 + alpha/2 power). For n x n sparse matrices representing a square root of n x square root of n regular grid graph the data traffic is shown to be omega(n to the 1 + alpha/2 power), alpha less than or equal 1. Partitioning schemes that are variations of block assignment scheme are described and it is shown that the data traffic generated by these schemes are asymptotically optimal. The schemes allow efficient use of up to O(n to the 2nd power) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum value of O(n to the 3rd power) and O(n to the 3/2 power), respectively. It is shown that the block based partitioning schemes allow a better utilization of the data accessed from shared memory and thus reduce the data traffic than those based on column-wise wrap around assignment schemes.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Alaghband, Gita; Jordan, Harry F.
1989-01-01
It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
NASA Astrophysics Data System (ADS)
Li, Hao; Xie, Lunguo
2013-03-01
The design of cache system for Chip Multiprocessor (CMP) face many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two base design schemes about L2 cache: private scheme in which each L2 slice is treated as a private L2 cache and shared scheme in which all L2 slices are treated as a large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data is on a remote tile. This paper present a new Controlled Replication (CR) policy to reduce the capacities occupied by redundant shared replicas. the new CR policy increases the effective capacity than victim replication scheme and has lower hit latency than shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of shared scheme by an average of 13%, providing better overall performance than victim replication and shared schemes.
Queueing analysis of a canonical model of real-time multiprocessors
NASA Technical Reports Server (NTRS)
Krishna, C. M.; Shin, K. G.
1983-01-01
A logical classification of multiprocessor structures from the point of view of control applications is presented. A computation of the response time distribution for a canonical model of a real time multiprocessor is presented. The multiprocessor is approximated by a blocking model. Two separate models are derived: one created from the system's point of view, and the other from the point of view of an incoming task.
High speed quantitative digital microscopy
NASA Technical Reports Server (NTRS)
Castleman, K. R.; Price, K. H.; Eskenazi, R.; Ovadya, M. M.; Navon, M. A.
1984-01-01
Modern digital image processing hardware makes possible quantitative analysis of microscope images at high speed. This paper describes an application to automatic screening for cervical cancer. The system uses twelve MC6809 microprocessors arranged in a pipeline multiprocessor configuration. Each processor executes one part of the algorithm on each cell image as it passes through the pipeline. Each processor communicates with its upstream and downstream neighbors via shared two-port memory. Thus no time is devoted to input-output operations as such. This configuration is expected to be at least ten times faster than previous systems.
1990-06-01
RAM and ROM output enable signals. Figure C.7 shows the logic for the interrupt priority level (IPLO* through IPL2 *) and the interrupt acknowledge...IACK681* signal is sent to the DUART when a level one interrupt acknowledge is output by the CPU. The logic for the IACK681* and the IPLO* through IPL2 ...signals are actually implemented with an EPLD. Listing D.4 in Appendix D presents the Abel description of the IACK681* and IPLO* through IPL2
Programmable architecture for pixel level processing tasks in lightweight strapdown IR seekers
NASA Astrophysics Data System (ADS)
Coates, James L.
1993-06-01
Typical processing tasks associated with missile IR seeker applications are described, and a straw man suite of algorithms is presented. A fully programmable multiprocessor architecture is realized on a multimedia video processor (MVP) developed by Texas Instruments. The MVP combines the elements of RISC, floating point, advanced DSPs, graphics processors, display and acquisition control, RAM, and external memory. Front end pixel level tasks typical of missile interceptor applications, operating on 256 x 256 sensor imagery, can be processed at frame rates exceeding 100 Hz in a single MVP chip.
Laboratory for Computer Science Progress Report 19, 1 July 1981-30 June 1982.
1984-05-01
Multiprocessor Architectures 202 4. TRIX Operating System 209 5. VLSI Tools 212 ’SYSTEMATIC PROGRAM DEVELOPMENT, 221 1. Introduction 222 2. Specification...exploring distributed operating systems and the architecture of single-user powerful computers that are interconnected by communication networks. The...to now. In particular, we expect to experiment with languages, operating systems , and applications that establish the feasibility of distributed
Iterative algorithms for tridiagonal matrices on a WSI-multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.D.; Sameh, A.H.; Wisniewski, J.A.
1982-01-01
With the rapid advances in semiconductor technology, the construction of Wafer Scale Integration (WSI)-multiprocessors consisting of a large number of processors is now feasible. We illustrate the implementation of some basic linear algebra algorithms on such multiprocessors.
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 Risc for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada- implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Simplifying and speeding the management of intra-node cache coherence
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-04-17
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A; Chen, Dong; Coteus, Paul W
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G; Salapura, Valentina
2014-12-02
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration
NASA Technical Reports Server (NTRS)
Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.
1978-01-01
A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics process large amount of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph based representation provides several benefits, such as the possibility to perform in memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which requires more than basicmore » graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.« less
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
State recovery and lockstep execution restart in a system with multiprocessor pairing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gara, Alan; Gschwind, Michael K; Salapura, Valentina
System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory "nest" (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switchmore » or a bus. Each selectively paired processor core is includes a transactional execution facility, whereing the system is configured to enable processor rollback to a previous state and reinitialize lockstep execution in order to recover from an incorrect execution when an incorrect execution has been detected by the selective pairing facility.« less
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storagemore » of size O(7 n ) for multiplying 2 n × 2 n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4 n ). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.« less
Two Fundamental Issues in Multiprocessing.
1987-10-01
Structural Model of a Multiprocessor 6 Figure 5: Operational Model of a Multiprocessor 7 Figure 6: The von Neumann Processor (from Gajski and Peir [201) 10...Computer Society, June, 1983. 20. Gajski , D. D. & J-K. Peir. "Essential Issues in Multiprocessor Systems". Computer 18, 6 (June 1985), 9-27. 21. Gurd
Insertion of coherence requests for debugging a multiprocessor
Blumrich, Matthias A.; Salapura, Valentina
2010-02-23
A method and system are disclosed to insert coherence events in a multiprocessor computer system, and to present those coherence events to the processors of the multiprocessor computer system for analysis and debugging purposes. The coherence events are inserted in the computer system by adding one or more special insert registers. By writing into the insert registers, coherence events are inserted in the multiprocessor system as if they were generated by the normal coherence protocol. Once these coherence events are processed, the processing of coherence events can continue in the normal operation mode.
Komarov, Ivan; D'Souza, Roshan M
2012-01-01
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.
2011-01-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.
Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W
2012-06-01
As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Energy-efficient fault tolerance in multiprocessor real-time systems
NASA Astrophysics Data System (ADS)
Guo, Yifeng
The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated, where tasks' main copies are executed ASAP while backup copies ALAP to reduce the overlapped execution of main and backup copies of the same task and thus reduce energy consumption. All proposed techniques are evaluated through extensive simulations and compared with other state-of-the-art approaches. The simulation results confirm that the proposed schemes can preserve the system reliability while still achieving substantial energy savings. Finally, for both SS and POED based Energy-Efficient Fault-Tolerant (EEFT) schemes, a series of recovery strategies are designed when more than one (transient and permanent) faults need to be tolerated.
Multiprocessor computer overset grid method and apparatus
Barnette, Daniel W.; Ober, Curtis C.
2003-01-01
A multiprocessor computer overset grid method and apparatus comprises associating points in each overset grid with processors and using mapped interpolation transformations to communicate intermediate values between processors assigned base and target points of the interpolation transformations. The method allows a multiprocessor computer to operate with effective load balance on overset grid applications.
A large-grain mapping approach for multiprocessor systems through data flow model. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Kim, Hwa-Soo
1991-01-01
A large-grain level mapping method is presented of numerical oriented applications onto multiprocessor systems. The method is based on the large-grain data flow representation of the input application and it assumes a general interconnection topology of the multiprocessor system. The large-grain data flow model was used because such representation best exhibits inherited parallelism in many important applications, e.g., CFD models based on partial differential equations can be presented in large-grain data flow format, very effectively. A generalized interconnection topology of the multiprocessor architecture is considered, including such architectural issues as interprocessor communication cost, with the aim to identify the 'best matching' between the application and the multiprocessor structure. The objective is to minimize the total execution time of the input algorithm running on the target system. The mapping strategy consists of the following: (1) large-grain data flow graph generation from the input application using compilation techniques; (2) data flow graph partitioning into basic computation blocks; and (3) physical mapping onto the target multiprocessor using a priority allocation scheme for the computation blocks.
High Performance, Dependable Multiprocessor
NASA Technical Reports Server (NTRS)
Ramos, Jeremy; Samson, John R.; Troxel, Ian; Subramaniyan, Rajagopal; Jacobs, Adam; Greco, James; Cieslewski, Grzegorz; Curreri, John; Fischer, Michael; Grobelny, Eric;
2006-01-01
With the ever increasing demand for higher bandwidth and processing capacity of today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing is now a critical need. In response to this need, NASA's New Millennium Program office has commissioned the development of Dependable Multiprocessor (DM) technology for use in payload and robotic missions. The Dependable Multiprocessor technology is a COTS-based, power efficient, high performance, highly dependable, fault tolerant cluster computer. To date, Honeywell has successfully demonstrated a TRL4 prototype of the Dependable Multiprocessor [I], and is now working on the development of a TRLS prototype. For the present effort Honeywell has teamed up with the University of Florida's High-performance Computing and Simulation (HCS) Lab, and together the team has demonstrated major elements of the Dependable Multiprocessor TRLS system.
VME rollback hardware for time warp multiprocessor systems
NASA Technical Reports Server (NTRS)
Robb, Michael J.; Buzzell, Calvin A.
1992-01-01
The purpose of the research effort is to develop and demonstrate innovative hardware to implement specific rollback and timing functions required for efficient queue management and precision timekeeping in multiprocessor discrete event simulations. The previously completed phase 1 effort demonstrated the technical feasibility of building hardware modules which eliminate the state saving overhead of the Time Warp paradigm used in distributed simulations on multiprocessor systems. The current phase 2 effort will build multiple pre-production rollback hardware modules integrated with a network of Sun workstations, and the integrated system will be tested by executing a Time Warp simulation. The rollback hardware will be designed to interface with the greatest number of multiprocessor systems possible. The authors believe that the rollback hardware will provide for significant speedup of large scale discrete event simulation problems and allow multiprocessors using Time Warp to dramatically increase performance.
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1989-01-01
The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.
Kirk, Andrew G; Plant, David V; Szymanski, Ted H; Vranesic, Zvonko G; Tooley, Frank A P; Rolston, David R; Ayliffe, Michael H; Lacroix, Frederic K; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F
2003-05-10
Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
NASA Astrophysics Data System (ADS)
Kirk, Andrew G.; Plant, David V.; Szymanski, Ted H.; Vranesic, Zvonko G.; Tooley, Frank A. P.; Rolston, David R.; Ayliffe, Michael H.; Lacroix, Frederic K.; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F.
2003-05-01
Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
MIT Laboratory for Computer Science Progress Report, July 1984-June 1985
1985-06-01
larger (up to several thousand machines) multiprocessor systems. This facility, funded by the newly formed Strategic Computing Program of the Defense...Szolovits, Group Leader R. Patil Collaborating Investigators M. Criscitiello, M.D., Tufts-New England Medical Center Hospital J. Dzierzanowski, Ph.D., Dept...COMPUTATION STRUCTURES Academic Staff J. B. Dennis, Group Leader Research Staff W. B. Ackerman G. A. Boughton W. Y-P. Lim Graduate Students T-A. Chu S
Laboratory for Computer Science Progress Report 21, July 1983-June 1984.
1984-06-01
Systems 269 4. Distributed Consensus 270 5. Election of a Leader in a Distributed Ring of Processors 273 6. Distributed Network Algorithms 274 7. Diagnosis...multiprocessor systems. This facility, funded by the new!y formed Strategic Computing Program of the Defense Advanced Research Projects Agency, will enable...Academic Staff P. Szo)ovits, Group Leader R. Patil Collaborating Investigators M. Criscitiello, M.D., Tufts-New England Medical Center Hospital R
Operating system for a real-time multiprocessor propulsion system simulator. User's manual
NASA Technical Reports Server (NTRS)
Cole, G. L.
1985-01-01
The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.
Resource Management for Distributed Parallel Systems
NASA Technical Reports Server (NTRS)
Neuman, B. Clifford; Rao, Santosh
1993-01-01
Multiprocessor systems should exist in the the larger context of distributed systems, allowing multiprocessor resources to be shared by those that need them. Unfortunately, typical multiprocessor resource management techniques do not scale to large networks. The Prospero Resource Manager (PRM) is a scalable resource allocation system that supports the allocation of processing resources in large networks and multiprocessor systems. To manage resources in such distributed parallel systems, PRM employs three types of managers: system managers, job managers, and node managers. There exist multiple independent instances of each type of manager, reducing bottlenecks. The complexity of each manager is further reduced because each is designed to utilize information at an appropriate level of abstraction.
Mapping of H.264 decoding on a multiprocessor architecture
NASA Astrophysics Data System (ADS)
van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.
2003-05-01
Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result from this, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs, that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of e.g. a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system, executing the same software. Experimental results show that the data communication considerably reduces up to 65% directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
File-System Workload on a Scientific Multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Nieuwejaar, Nils
1995-01-01
Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors a their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize 1/0 in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests-in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominates the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even moremore » complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.« less
A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations
NASA Technical Reports Server (NTRS)
Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos
2009-01-01
A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Saini, Subbash; Ciotti, Robert
2006-01-01
We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different number of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray XI shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our, results show the impact of the network bandwidth and topology on the overall performance of each interconnect.
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines
NASA Technical Reports Server (NTRS)
1999-01-01
Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG rants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64- processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
G.A. Pope; K. Sephernoori; D.C. McKinney
1996-03-15
This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less
A CRAY-Class Multiprocessor Simulator.
1983-09-01
instruction will appear in this column. 38 ! ?m L < ¢ < . - ?. .., ..i.-..--. .°.- ’’ , ... .,. ... 18. BCG -The parcel buffer change flag. When the next...program is first started up, no instruction parcels are in the parcel buffers so a fetch is required. The asterisk in the BCG field indicates a buffer...an in-buffer target address. While the buffer change is in progress, the instruction causing the change will place its tag in the BCG column. Once the
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G.; Salapura, Valentina
2012-07-24
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Multiprogramming performance degradation - Case study on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Dimpsey, R. T.; Iyer, R. K.
1989-01-01
The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent
A mechanism for efficient debugging of parallel programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, B.P.; Choi, J.D.
1988-01-01
This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. The extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions ofmore » the co-operating processes.« less
Parallel computation with the force
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1985-01-01
A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
A comparison of multiprocessor scheduling methods for iterative data flow architectures
NASA Technical Reports Server (NTRS)
Storch, Matthew
1993-01-01
A comparative study is made between the Algorithm to Architecture Mapping Model (ATAMM) and three other related multiprocessing models from the published literature. The primary focus of all four models is the non-preemptive scheduling of large-grain iterative data flow graphs as required in real-time systems, control applications, signal processing, and pipelined computations. Important characteristics of the models such as injection control, dynamic assignment, multiple node instantiations, static optimum unfolding, range-chart guided scheduling, and mathematical optimization are identified. The models from the literature are compared with the ATAMM for performance, scheduling methods, memory requirements, and complexity of scheduling and design procedures.
Principles for problem aggregation and assignment in medium scale multiprocessors
NASA Technical Reports Server (NTRS)
Nicol, David M.; Saltz, Joel H.
1987-01-01
One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
Parallel solution of the symmetric tridiagonal eigenproblem. Research report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1989-10-01
This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel solution of the symmetric tridiagonal eigenproblem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1989-01-01
This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Realtime multiprocessor for mobile ad hoc networks
NASA Astrophysics Data System (ADS)
Jungeblut, T.; Grünewald, M.; Porrmann, M.; Rückert, U.
2008-05-01
This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.
Low latency messages on distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Saltz, Joel
1993-01-01
Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedure is very flexible for minimizing message latency, based on the type of communication being performed. The test performed and the proposed interface are described.
ATAMM enhancement and multiprocessor performance evaluation
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.; Som, Sukhamoy; Obando, Rodrigo; Malekpour, Mahyar R.; Jones, Robert L., III; Mandala, Brij Mohan V.
1991-01-01
ATAMM (Algorithm To Architecture Mapping Model) enhancement and multiprocessor performance evaluation is discussed. The following topics are included: the ATAMM model; ATAMM enhancement; ADM (Advanced Development Model) implementation of ATAMM; and ATAMM support tools.
Multiprocessor smalltalk: Implementation, performance, and analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pallas, J.I.
1990-01-01
Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
Load Balancing Unstructured Adaptive Grids for CFD Problems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid
1996-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor
NASA Technical Reports Server (NTRS)
Gilliland, M. C.; Smith, B. J.; Calvert, W.
1976-01-01
The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
An effective write policy for software coherence schemes
NASA Technical Reports Server (NTRS)
Chen, Yung-Chin; Veidenbaum, Alexander V.
1992-01-01
The authors study the write behavior and evaluate the performance of various write strategies and buffering techniques for a MIN-based multiprocessor system using the simple software coherence scheme. Hit ratios, memory latencies, total execution time, and total write traffic are used as the performance indices. The write-through write-allocate no-fetch cache using a write-back write buffer is shown to have a better performance than both write-through and write-back caches. This type of write buffer is effective in reducing the volume as well as bursts of write traffic. On average, the use of a write-back cache reduces by 60 percent the total write traffic generated by a write-through cache.
Software Mechanisms for Multiprocessor TLB Consistency
1989-12-01
thanks. Raj V aswani implemented the DASH message-passing system. Ramesh Govindan implemented part of the DASH virtual memory system. G . Scott...1 Latency (ma) 5 ---·-r··-·r···r··-T·-·-r··-·r····r····-r··-·r··- . g ~~~R=r 18 -----1·--·-r··-T·----1·--··r·---1·----1·--··r·---T·----1 16...model development. Synchronizing TLBs is similar to updating replicated data in a distributed environment. Lee and Garcia-Molina both used an M/ G /1
Method and apparatus for efficiently tracking queue entries relative to a timestamp
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Ohmacht, Martin; Salapura, Velentina; Vranas, Pavlos
2014-06-17
An apparatus and method for tracking coherence event signals transmitted in a multiprocessor system. The apparatus comprises a coherence logic unit, each unit having a plurality of queue structures with each queue structure associated with a respective sender of event signals transmitted in the system. A timing circuit associated with a queue structure controls enqueuing and dequeuing of received coherence event signals, and, a counter tracks a number of coherence event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. A counter mechanism generates an output signal indicating that all of the coherence event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued. In one embodiment, the timestamp signal is asserted at the start of a memory synchronization operation and, the output signal indicates that all coherence events present when the timestamp signal was asserted have completed. This signal can then be used as part of the completion condition for the memory synchronization operation.
Flexible language constructs for large parallel programs
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
Data traffic reduction schemes for sparse Cholesky factorizations
NASA Technical Reports Server (NTRS)
Naik, Vijay K.; Patrick, Merrell L.
1988-01-01
Load distribution schemes are presented which minimize the total data traffic in the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems with local and shared memory. The total data traffic in factoring an n x n sparse, symmetric, positive definite matrix representing an n-vertex regular 2-D grid graph using n (sup alpha), alpha is equal to or less than 1, processors are shown to be O(n(sup 1 + alpha/2)). It is O(n(sup 3/2)), when n (sup alpha), alpha is equal to or greater than 1, processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal. The schemes allow efficient use of up to O(n) processors before the total data traffic reaches the maximum value of O(n(sup 3/2)). The partitioning employed within the scheme, allows a better utilization of the data accessed from shared memory than those of previously published methods.
S-Band POSIX Device Drivers for RTEMS
NASA Technical Reports Server (NTRS)
Lux, James P.; Lang, Minh; Peters, Kenneth J.; Taylor, Gregory H.
2011-01-01
This is a set of POSIX device driver level abstractions in the RTEMS RTOS (Real-Time Executive for Multiprocessor Systems real-time operating system) to SBand radio hardware devices that have been instantiated in an FPGA (field-programmable gate array). These include A/D (analog-to-digital) sample capture, D/A (digital-to-analog) sample playback, PLL (phase-locked-loop) tuning, and PWM (pulse-width-modulation)-controlled gain. This software interfaces to Sband radio hardware in an attached Xilinx Virtex-2 FPGA. It uses plug-and-play device discovery to map memory to device IDs. Instead of interacting with hardware devices directly, using direct-memory mapped access at the application level, this driver provides an application programming interface (API) offering that easily uses standard POSIX function calls. This simplifies application programming, enables portability, and offers an additional level of protection to the hardware. There are three separate device drivers included in this package: sband_device (ADC capture and DAC playback), pll_device (RF front end PLL tuning), and pwm_device (RF front end AGC control).
The Galley Parallel File System
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Design and implementation of highly parallel pipelined VLSI systems
NASA Astrophysics Data System (ADS)
Delange, Alphonsus Anthonius Jozef
A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J.T.
1993-10-01
This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Improvement of multiprocessing performance by using optical centralized shared bus
NASA Astrophysics Data System (ADS)
Han, Xuliang; Chen, Ray T.
2004-06-01
With the ever-increasing need to solve larger and more complex problems, multiprocessing is attracting more and more research efforts. One of the challenges facing the multiprocessor designers is to fulfill in an effective manner the communications among the processes running in parallel on multiple multiprocessors. The conventional electrical backplane bus provides narrow bandwidth as restricted by the physical limitations of electrical interconnects. In the electrical domain, in order to operate at high frequency, the backplane topology has been changed from the simple shared bus to the complicated switched medium. However, the switched medium is an indirect network. It cannot support multicast/broadcast as effectively as the shared bus. Besides the additional latency of going through the intermediate switching nodes, signal routing introduces substantial delay and considerable system complexity. Alternatively, optics has been well known for its interconnect capability. Therefore, it has become imperative to investigate how to improve multiprocessing performance by utilizing optical interconnects. From the implementation standpoint, the existing optical technologies still cannot fulfill the intelligent functions that a switch fabric should provide as effectively as their electronic counterparts. Thus, an innovative optical technology that can provide sufficient bandwidth capacity, while at the same time, retaining the essential merits of the shared bus topology, is highly desirable for the multiprocessing performance improvement. In this paper, the optical centralized shared bus is proposed for use in the multiprocessing systems. This novel optical interconnect architecture not only utilizes the beneficial characteristics of optics, but also retains the desirable properties of the shared bus topology. Meanwhile, from the architecture standpoint, it fits well in the centralized shared-memory multiprocessing scheme. Therefore, a smooth migration with substantial multiprocessing performance improvement is expected. To prove the technical feasibility from the architecture standpoint, a conceptual emulation of the centralized shared-memory multiprocessing scheme is demonstrated on a generic PCI subsystem with an optical centralized shared bus.
Real-Time Multiprocessor Programming Language (RTMPL) user's manual
NASA Technical Reports Server (NTRS)
Arpasi, D. J.
1985-01-01
A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.
NASA Technical Reports Server (NTRS)
Clune, E.; Segall, Z.; Siewiorek, D.
1984-01-01
A program of experiments has been conducted at NASA-Langley to test the fault-free performance of a Fault-Tolerant Multiprocessor (FTMP) avionics system for next-generation aircraft. Baseline measurements of an operating FTMP system were obtained with respect to the following parameters: instruction execution time, frame size, and the variation of clock ticks. The mechanisms of frame stretching were also investigated. The experimental results are summarized in a table. Areas of interest for future tests are identified, with emphasis given to the implementation of a synthetic workload generation mechanism on FTMP.
Techniques for video compression
NASA Technical Reports Server (NTRS)
Wu, Chwan-Hwa
1995-01-01
In this report, we present our study on multiprocessor implementation of a MPEG2 encoding algorithm. First, we compare two approaches to implementing video standards, VLSI technology and multiprocessor processing, in terms of design complexity, applications, and cost. Then we evaluate the functional modules of MPEG2 encoding process in terms of their computation time. Two crucial modules are identified based on this evaluation. Then we present our experimental study on the multiprocessor implementation of the two crucial modules. Data partitioning is used for job assignment. Experimental results show that high speedup ratio and good scalability can be achieved by using this kind of job assignment strategy.
Spaceborne VHSIC multiprocessor system for AI applications
NASA Technical Reports Server (NTRS)
Lum, Henry, Jr.; Shrobe, Howard E.; Aspinall, John G.
1988-01-01
A multiprocessor system, under design for space-station applications, makes use of the latest generation symbolic processor and packaging technology. The result will be a compact, space-qualified system two to three orders of magnitude more powerful than present-day symbolic processing systems.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis
NASA Technical Reports Server (NTRS)
Lim, Chieng-Fai
1991-01-01
The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
Local concurrent error detection and correction in data structures using virtual backpointers
NASA Technical Reports Server (NTRS)
Li, C. C.; Chen, P. P.; Fuchs, W. K.
1987-01-01
A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data structures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in 0(1) time and errors detected during forward moves can be corrected in 0(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared databased of Virtual Double Linked Lists.
Local concurrent error detection and correction in data structures using virtual backpointers
NASA Technical Reports Server (NTRS)
Li, Chung-Chi Jim; Chen, Paul Peichuan; Fuchs, W. Kent
1989-01-01
A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data strutures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in 0(1) time and errors detected during forward moves can be corrected in 0(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared database of Virtual Double Linked Lists.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chin, George; Marquez, Andres; Choudhury, Sutanay
2012-09-01
Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis ofmore » large-scale graphs, we developed and optimized a triad census algorithm to efficiently execute on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code’s data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performances of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and AMD multi-core NUMA machine. These three systems have shared memory architectures but with markedly different hardware capabilities to manage parallelism.« less
Modelling parallel programs and multiprocessor architectures with AXE
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.
1991-01-01
AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
NASA Astrophysics Data System (ADS)
Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki
This paper presentss a Multiprocessor System-on-Chips (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, an efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core to the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as a processing element for MPSoC architectures designed using our framework.
Polymorphous Computing Architectures
2007-12-12
provide a multiprocessor implementation. In this work, we introduce the Atomos transactional programming language, which is the first to include...implicit transactions, strong atomicity, and a scalable multiprocessor implementation [47]. Atomos is derived from Java, but replaces its synchronization...and conditional waiting constructs with transactional alternatives. The Atomos conditional waiting proposal is tailored to allow efficient
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Lau, Sonie; Yan, Jerry C.
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Flexible Language Constructs for Large Parallel Programs
Rosing, Matt; Schnabel, Robert
1994-01-01
The goal of the research described in this article is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (multiple instruction multiple data [MIMD]) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include single instruction multiple data (SIMD), single program multiple data (SPMD), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression ofmore » the variety of algorithms that occur in large scientific computations. In this article, we give an overview of a new language that combines many of these programming models in a clean manner. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. In this article, we give an overview of the language and discuss some of the critical implementation details.« less
Static Scheduler for Hard Real-Time Tasks on Multiprocessor Systems
1992-09-01
Foundation of Computer Science, 1980 . [SIM83] Simons, B., "Multiprocessor Scheduling of Unit-Time Jobs with Arbitrary Release Times and Deadlines", SIAM...Research Office Attn: Dr. David Hislop P. O. Box 12211 Research Triangle Park, NC 27709-2211 31. Persistent Data Systems 75 W. Chapel Ridge Road Attn: Dr
Fault tree models for fault tolerant hypercube multiprocessors
NASA Technical Reports Server (NTRS)
Boyd, Mark A.; Tuazon, Jezus O.
1991-01-01
Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.
Modeling and measurement of fault-tolerant multiprocessors
NASA Technical Reports Server (NTRS)
Shin, K. G.; Woodbury, M. H.; Lee, Y. H.
1985-01-01
The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studing these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.
Parallel reduced-instruction-set-computer architecture for real-time symbolic pattern matching
NASA Astrophysics Data System (ADS)
Parson, Dale E.
1991-03-01
This report discusses ongoing work on a parallel reduced-instruction- set-computer (RISC) architecture for automatic production matching. The PRIOPS compiler takes advantage of the memoryless character of automatic processing by translating a program's collection of automatic production tests into an equivalent combinational circuit-a digital circuit without memory, whose outputs are immediate functions of its inputs. The circuit provides a highly parallel, fine-grain model of automatic matching. The compiler then maps the combinational circuit onto RISC hardware. The heart of the processor is an array of comparators capable of testing production conditions in parallel, Each comparator attaches to private memory that contains virtual circuit nodes-records of the current state of nodes and busses in the combinational circuit. All comparator memories hold identical information, allowing simultaneous update for a single changing circuit node and simultaneous retrieval of different circuit nodes by different comparators. Along with the comparator-based logic unit is a sequencer that determines the current combination of production-derived comparisons to try, based on the combined success and failure of previous combinations of comparisons. The memoryless nature of automatic matching allows the compiler to designate invariant memory addresses for virtual circuit nodes, and to generate the most effective sequences of comparison test combinations. The result is maximal utilization of parallel hardware, indicating speed increases and scalability beyond that found for course-grain, multiprocessor approaches to concurrent Rete matching. Future work will consider application of this RISC architecture to the standard (controlled) Rete algorithm, where search through memory dominates portions of matching.
NASA Technical Reports Server (NTRS)
Arenstorf, Norbert S.; Jordan, Harry F.
1987-01-01
A barrier is a method for synchronizing a large number of concurrent computer processes. After considering some basic synchronization mechanisms, a collection of barrier algorithms with either linear or logarithmic depth are presented. A graphical model is described that profiles the execution of the barriers and other parallel programming constructs. This model shows how the interaction between the barrier algorithms and the work that they synchronize can impact their performance. One result is that logarithmic tree structured barriers show good performance when synchronizing fixed length work, while linear self-scheduled barriers show better performance when synchronizing fixed length work with an imbedded critical section. The linear barriers are better able to exploit the process skew associated with critical sections. Timing experiments, performed on an eighteen processor Flex/32 shared memory multiprocessor, that support these conclusions are detailed.
Asynchronous Data Retrieval from an Object-Oriented Database
NASA Astrophysics Data System (ADS)
Gilbert, Jonathan P.; Bic, Lubomir
We present an object-oriented semantic database model which, similar to other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.
Multiprocessor programming environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, M.B.; Fornaro, R.
Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.
The Minerva Multi-Microprocessor.
A multiprocessor system is described which is an experiment in low cost, extensible, multiprocessor architectures. Global issues such as inclusion of a central bus, design of the bus arbiter, and methods of interrupt handling are considered. The system initially includes two processor types, based on microprocessors, and these are discussed. Methods for reducing processor demand for the central bus are described.
FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft
NASA Technical Reports Server (NTRS)
Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.
1978-01-01
The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.
Benchmarking GNU Radio Kernels and Multi-Processor Scheduling
2013-01-14
AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
A Real-Time Linux for Multicore Platforms
2013-12-20
under ARO support) to obtain a fully-functional OS for supporting real-time workloads on multicore platforms. This system, called LITMUS -RT...to be specified as plugin components. LITMUS -RT is open-source software (available at The views, opinions and/or findings contained in this report... LITMUS -RT (LInux Testbed for MUltiprocessor Scheduling in Real-Time systems), allows different multiprocessor real-time scheduling and
Considerations for Multiprocessor Topologies
NASA Technical Reports Server (NTRS)
Byrd, Gregory T.; Delagi, Bruce A.
1987-01-01
Choosing a multiprocessor interconnection topology may depend on high-level considerations, such as the intended application domain and the expected number of processors. It certainly depends on low-level implementation details, such as packaging and communications protocols. The authors first use rough measures of cost and performance to characterize several topologies. They then examine how implementation details can affect the realizable performance of a topology.
Dataflow computing approach in high-speed digital simulation
NASA Technical Reports Server (NTRS)
Ercegovac, M. D.; Karplus, W. J.
1984-01-01
New computational tools and methodologies for the digital simulation of continuous systems were explored. Programmability, and cost effective performance in multiprocessor organizations for real time simulation was investigated. Approach is based on functional style languages and data flow computing principles, which allow for the natural representation of parallelism in algorithms and provides a suitable basis for the design of cost effective high performance distributed systems. The objectives of this research are to: (1) perform comparative evaluation of several existing data flow languages and develop an experimental data flow language suitable for real time simulation using multiprocessor systems; (2) investigate the main issues that arise in the architecture and organization of data flow multiprocessors for real time simulation; and (3) develop and apply performance evaluation models in typical applications.
Reproducibility in a multiprocessor system
Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka
2013-11-26
Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed; a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.
Bibliography On Multiprocessors And Distributed Processing
NASA Technical Reports Server (NTRS)
Miya, Eugene N.
1988-01-01
Multiprocessor and Distributed Processing Bibliography package consists of large machine-readable bibliographic data base, which in addition to usual keyword searches, used for producing citations, indexes, and cross-references. Data base contains UNIX(R) "refer" -formatted ASCII data and implemented on any computer running under UNIX(R) operating system. Easily convertible to other operating systems. Requires approximately one megabyte of secondary storage. Bibliography compiled in 1985.
Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications
2009-05-01
Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications Gilbert Hendry†, Shoaib Kamil‡?, Aleksandr Biberman†, Johnnie...electronic networks -on-chip warrants investigating real application traces on functionally compa- rable photonic and electronic network designs. We... network can achieve 75× improvement in energy ef- ficiency for synthetic benchmarks and up to 37× improve- ment for real scientific applications
Method for wiring allocation and switch configuration in a multiprocessor environment
Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA
2008-07-15
A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.
Generic Software for Emulating Multiprocessor Architectures.
1985-05-01
RD-A157 662 GENERIC SOFTWARE FOR EMULATING MULTIPROCESSOR 1/2 AlRCHITECTURES(J) MASSACHUSETTS INST OF TECH CAMBRIDGE U LRS LAB FOR COMPUTER SCIENCE R...AREA & WORK UNIT NUMBERS MIT Laboratory for Computer Science 545 Technology Square Cambridge, MA 02139 ____________ I I. CONTROLLING OFFICE NAME AND...aide If neceeasy end Identify by block number) Computer architecture, emulation, simulation, dataf low 20. ABSTRACT (Continue an reverse slde It
Two-dimensional systolic-array architecture for pixel-level vision tasks
NASA Astrophysics Data System (ADS)
Vijverberg, Julien A.; de With, Peter H. N.
2010-05-01
This paper presents ongoing work on the design of a two-dimensional (2D) systolic array for image processing. This component is designed to operate on a multi-processor system-on-chip. In contrast with other 2D systolic-array architectures and many other hardware accelerators, we investigate the applicability of executing multiple tasks in a time-interleaved fashion on the Systolic Array (SA). This leads to a lower external memory bandwidth and better load balancing of the tasks on the different processing tiles. To enable the interleaving of tasks, we add a shadow-state register for fast task switching. To reduce the number of accesses to the external memory, we propose to share the communication assist between consecutive tasks. A preliminary, non-functional version of the SA has been synthesized for an XV4S25 FPGA device and yields a maximum clock frequency of 150 MHz requiring 1,447 slices and 5 memory blocks. Mapping tasks from video content-analysis applications from literature on the SA yields reductions in the execution time of 1-2 orders of magnitude compared to the software implementation. We conclude that the choice for an SA architecture is useful, but a scaled version of the SA featuring less logic with fewer processing and pipeline stages yielding a lower clock frequency, would be sufficient for a video analysis system-on-chip.
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Recent Improvements in Aerodynamic Design Optimization on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Nielsen, Eric J.; Anderson, W. Kyle
2000-01-01
Recent improvements in an unstructured-grid method for large-scale aerodynamic design are presented. Previous work had shown such computations to be prohibitively long in a sequential processing environment. Also, robust adjoint solutions and mesh movement procedures were difficult to realize, particularly for viscous flows. To overcome these limiting factors, a set of design codes based on a discrete adjoint method is extended to a multiprocessor environment using a shared memory approach. A nearly linear speedup is demonstrated, and the consistency of the linearizations is shown to remain valid. The full linearization of the residual is used to precondition the adjoint system, and a significantly improved convergence rate is obtained. A new mesh movement algorithm is implemented and several advantages over an existing technique are presented. Several design cases are shown for turbulent flows in two and three dimensions.
Fast neural net simulation with a DSP processor array.
Muller, U A; Gunzinger, A; Guggenbuhl, W
1995-01-01
This paper describes the implementation of a fast neural net simulator on a novel parallel distributed-memory computer. A 60-processor system, named MUSIC (multiprocessor system with intelligent communication), is operational and runs the backpropagation algorithm at a speed of 330 million connection updates per second (continuous weight update) using 32-b floating-point precision. This is equal to 1.4 Gflops sustained performance. The complete system with 3.8 Gflops peak performance consumes less than 800 W of electrical power and fits into a 19-in rack. While reaching the speed of modern supercomputers, MUSIC still can be used as a personal desktop computer at a researcher's own disposal. In neural net simulation, this gives a computing performance to a single user which was unthinkable before. The system's real-time interfaces make it especially useful for embedded applications.
Early MIMD experience on the CRAY X-MP
NASA Astrophysics Data System (ADS)
Rhoades, Clifford E.; Stevens, K. G.
1985-07-01
This paper describes some early experience with converting four physics simulation programs to the CRAY X-MP, a current Multiple Instruction, Multiple Data (MIMD) computer consisting of two processors each with an architecture similar to that of the CRAY-1. As a multi-processor, the CRAY X-MP together with the high speed Solid-state Storage Device (SSD) in an ideal machine upon which to study MIMD algorithms for solving the equations of mathematical physics because it is fast enough to run real problems. The computer programs used in this study are all FORTRAN versions of original production codes. They range in sophistication from a one-dimensional numerical simulation of collisionless plasma to a two-dimensional hydrodynamics code with heat flow to a couple of three-dimensional fluid dynamics codes with varying degrees of viscous modeling. Early research with a dual processor configuration has shown speed-ups ranging from 1.55 to 1.98. It has been observed that a few simple extensions to FORTRAN allow a typical programmer to achieve a remarkable level of efficiency. These extensions involve the concept of memory local to a concurrent subprogram and memory common to all concurrent subprograms.
NASA Astrophysics Data System (ADS)
Leamy, Michael J.; Springer, Adam C.
In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
Operating system for a real-time multiprocessor propulsion system simulator
NASA Technical Reports Server (NTRS)
Cole, G. L.
1984-01-01
The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real time interactive simulation of air breathing propulsion systems was evaluated. The Real Time Multiprocessor Operating System (RTMPOS) provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer supplied disk operating system that provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, the RTMPOS provides for engineering level, run time operations such as loading, modifying and specifying computation flow of programs, simulator mode control, data handling and run time monitoring. Run time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.
Reducing Response Time Bounds for DAG-Based Task Systems on Heterogeneous Multicore Platforms
2016-01-01
synchronous parallel tasks on multicore platforms. In 25th ECRTS, 2013. [10] U. Devi. Soft Real - Time Scheduling on Multiprocessors. PhD thesis...report, Washington University in St Louis, 2014. [18] C. Liu and J. Anderson. Supporting soft real - time DAG-based sys- tems on multiprocessors with...analysis for DAG-based real - time task systems im- plemented on heterogeneous multicore platforms. The spe- cific analysis problem that is considered was
Graphics applications utilizing parallel processing
NASA Technical Reports Server (NTRS)
Rice, John R.
1990-01-01
The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
Dynamic file-access characteristics of a production parallel scientific workload
NASA Technical Reports Server (NTRS)
Kotz, David; Nieuwejaar, Nils
1994-01-01
Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
Safe and Efficient Support for Embeded Multi-Processors in ADA
NASA Astrophysics Data System (ADS)
Ruiz, Jose F.
2010-08-01
New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
A Multiprocessor Operating System Simulator
NASA Technical Reports Server (NTRS)
Johnston, Gary M.; Campbell, Roy H.
1988-01-01
This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.
Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design
NASA Astrophysics Data System (ADS)
Debes, Eric; Kaine, Greg
2002-11-01
In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using Intel® VTuneTM performance analyzer. The experimental measurements are compared to theoretical results.
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation specific behavior of the operating system (OS) or architectural issues. Rewriting-a large scientific application in order to employ a new programming paradigms is usually a time consuming and error prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP which was developed by Taft. The approach is similar to MPi/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As it is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner formore » scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.« less
NASA Astrophysics Data System (ADS)
Qiang, Ji
2017-10-01
A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(Nu(logNmode)) , where Nu is the total number of unknowns and Nmode is the maximum number of longitudinal or azimuthal modes. This saves both the computational time and the memory usage of using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Multiprocessor Real-Time Locking Protocols for Replicated Resources
2016-07-01
circular buffer of slots, each representing a discrete segment of time . For example, if the maintenance of a timing wheel occurs af- ter an interrupt ...Experimental Evaluation To evaluate Algs. 2, 3, and 4, we conducted a series of ex- periments in which we measured relevant overheads and blocking times . We...Multiprocessor Real- Time Locking Protocols for Replicated Resources ∗ Catherine E. Jarrett1, Kecheng Yang1, Ming Yang1, Pontus Ekberg2, and James H
Cedar-a large scale multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.; Kuck, D.; Lawrie, D.
1983-01-01
This paper presents an overview of Cedar, a large scale multiprocessor being designed at the University of Illinois. This machine is designed to accommodate several thousand high performance processors which are capable of working together on a single job, or they can be partitioned into groups of processors where each group of one or more processors can work on separate jobs. Various aspects of the machine are described including the control methodology, communication network, optimizing compiler and plans for construction. 13 references.
Thimmaiah, Tim; Voje, William E; Carothers, James M
2015-01-01
With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.
An Adaptive Flow Solver for Air-Borne Vehicles Undergoing Time-Dependent Motions/Deformations
NASA Technical Reports Server (NTRS)
Singh, Jatinder; Taylor, Stephen
1997-01-01
This report describes a concurrent Euler flow solver for flows around complex 3-D bodies. The solver is based on a cell-centered finite volume methodology on 3-D unstructured tetrahedral grids. In this algorithm, spatial discretization for the inviscid convective term is accomplished using an upwind scheme. A localized reconstruction is done for flow variables which is second order accurate. Evolution in time is accomplished using an explicit three-stage Runge-Kutta method which has second order temporal accuracy. This is adapted for concurrent execution using another proven methodology based on concurrent graph abstraction. This solver operates on heterogeneous network architectures. These architectures may include a broad variety of UNIX workstations and PCs running Windows NT, symmetric multiprocessors and distributed-memory multi-computers. The unstructured grid is generated using commercial grid generation tools. The grid is automatically partitioned using a concurrent algorithm based on heat diffusion. This results in memory requirements that are inversely proportional to the number of processors. The solver uses automatic granularity control and resource management techniques both to balance load and communication requirements, and deal with differing memory constraints. These ideas are again based on heat diffusion. Results are subsequently combined for visualization and analysis using commercial CFD tools. Flow simulation results are demonstrated for a constant section wing at subsonic, transonic, and a supersonic case. These results are compared with experimental data and numerical results of other researchers. Performance results are under way for a variety of network topologies.
Backend Control Processor for a Multi-Processor Relational Database Computer System.
1984-12-01
SCHOOL OF ENGI. UNCRSIFID MPONTIFF DEC 84 AFXT/GCS/ENG/84D-22 F/O 9/2 L ommhhhhmhhml mhhhommhhhhhm i-2 8 -- U0. 11111= Q. 2 111.8IIII- 1111111..6...THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology Air University In Partial Fulfillment of the...development of a Backend Multi-Processor Relational Database Computer System. This thesis addresses a single component of this system, the Backend Control
Visual monitoring of autonomous life sciences experimentation
NASA Technical Reports Server (NTRS)
Blank, G. E.; Martin, W. N.
1987-01-01
The design and implementation of a computerized visual monitoring system to aid in the monitoring and control of life sciences experiments on board a space station was investigated. A likely multiprocessor design was chosen, a plausible life science experiment with which to work was defined, the theoretical issues involved in the programming of a visual monitoring system for the experiment was considered on the multiprocessor, a system for monitoring the experiment was designed, and simulations of such a system was implemented on a network of Apollo workstations.
A CAMAC-VME-Macintosh data acquisition system for nuclear experiments
NASA Astrophysics Data System (ADS)
Anzalone, A.; Giustolisi, F.
1989-10-01
A multiprocessor system for data acquisition and analysis in low-energy nuclear physics has been realized. The system is built around CAMAC, the VMEbus, and the Macintosh PC. Multiprocessor software has been developed, using RTF, MACsys, and CERN cross-software. The execution of several programs that run on several VME CPUs and on an external PC is coordinated by a mailbox protocol. No operating system is used on the VME CPUs. The hardware, software, and system performance are described.
Performance Evaluation of Parallel Algorithms and Architectures in Concurrent Multiprocessor Systems
1988-09-01
HEP and Other Parallel processors, Report no. ANL-83-97, Argonne National Laboratory, Argonne, Ill. 1983. [19] Davidson, G . S. A Practical Paradigm for...IEEE Comp. Soc., 1986. [241 Peir, Jih-kwon, and D. Gajski , "CAMP: A Programming Aide For Multiprocessors," Proc. 1986 ICPP, IEEE Comp. Soc., pp475...482. [251 Pfister, G . F., and V. A. Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks,"IEEE Trans. Comp., C-34, Oct
NASA Technical Reports Server (NTRS)
Murman, E. M. (Editor); Abarbanel, S. S. (Editor)
1985-01-01
Current developments and future trends in the application of supercomputers to computational fluid dynamics are discussed in reviews and reports. Topics examined include algorithm development for personal-size supercomputers, a multiblock three-dimensional Euler code for out-of-core and multiprocessor calculations, simulation of compressible inviscid and viscous flow, high-resolution solutions of the Euler equations for vortex flows, algorithms for the Navier-Stokes equations, and viscous-flow simulation by FEM and related techniques. Consideration is given to marching iterative methods for the parabolized and thin-layer Navier-Stokes equations, multigrid solutions to quasi-elliptic schemes, secondary instability of free shear flows, simulation of turbulent flow, and problems connected with weather prediction.
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1991-01-01
An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
A multiprocessor operating system simulator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnston, G.M.; Campbell, R.H.
1988-01-01
This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT and T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows thatmore » of the Choices family of operating systems for loosely and tightly coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.« less
Analysis of a Multiprocessor Guidance Computer. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Maltach, E. G.
1969-01-01
The design of the next generation of spaceborne digital computers is described. It analyzes a possible multiprocessor computer configuration. For the analysis, a set of representative space computing tasks was abstracted from the Lunar Module Guidance Computer programs as executed during the lunar landing, from the Apollo program. This computer performs at this time about 24 concurrent functions, with iteration rates from 10 times per second to once every two seconds. These jobs were tabulated in a machine-independent form, and statistics of the overall job set were obtained. It was concluded, based on a comparison of simulation and Markov results, that the Markov process analysis is accurate in predicting overall trends and in configuration comparisons, but does not provide useful detailed information in specific situations. Using both types of analysis, it was determined that the job scheduling function is a critical one for efficiency of the multiprocessor. It is recommended that research into the area of automatic job scheduling be performed.
Three-Dimensional Nacelle Aeroacoustics Code With Application to Impedance Education
NASA Technical Reports Server (NTRS)
Watson, Willie R.
2000-01-01
A three-dimensional nacelle acoustics code that accounts for uniform mean flow and variable surface impedance liners is developed. The code is linked to a commercial version of the NASA-developed General Purpose Solver (for solution of linear systems of equations) in order to obtain the capability to study high frequency waves that may require millions of grid points for resolution. Detailed, single-processor statistics for the performance of the solver in rigid and soft-wall ducts are presented. Over the range of frequencies of current interest in nacelle liner research, noise attenuation levels predicted from the code were in excellent agreement with those predicted from mode theory. The equation solver is memory efficient, requiring only a small fraction of the memory available on modern computers. As an application, the code is combined with an optimization algorithm and used to reduce the impedance spectrum of a ceramic liner. The primary problem with using the code to perform optimization studies at frequencies above I1kHz is the excessive CPU time (a major portion of which is matrix assembly). The research recommends that research be directed toward development of a rapid sparse assembler and exploitation of the multiprocessor capability of the solver to further reduce CPU time.
A language comparison for scientific computing on MIMD architectures
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.
1989-01-01
Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
The MSFC UNIVAC 1108 EXEC 8 simulation model
NASA Technical Reports Server (NTRS)
Williams, T. G.; Richards, F. M.; Weatherbee, J. E.; Paul, L. K.
1972-01-01
A model is presented which simulates the MSFC Univac 1108 multiprocessor system. The hardware/operating system is described to enable a good statistical measurement of the system behavior. The performance of the 1108 is evaluated by performing twenty-four different experiments designed to locate system bottlenecks and also to test the sensitivity of system throughput with respect to perturbation of the various Exec 8 scheduling algorithms. The model is implemented in the general purpose system simulation language and the techniques described can be used to assist in the design, development, and evaluation of multiprocessor systems.
RTEMS Centre - Support and Maintenance Centre to RTEMS Operating System
NASA Astrophysics Data System (ADS)
Silva, H.; Constantino, A.; Freitas, D.; Coutinho, M.; Faustino, S.; Mota, M.; Colaço, P.; Sousa, J.; Dias, L.; Damjanovic, B.; Zulianello, M.; Rufino, J.
2009-05-01
RTEMS CENTRE - Support and Maintenance Centre to RTEMS Operating System is a joint ESA/Portuguese Task Force initiative to develop a support and maintenance centre to the Real-Time Executive for Multiprocessor Systems (RTEMS). This paper gives a high level visibility of the progress, the results obtained and the future work in the RTEMS CENTRE [6] and in the RTEMS Improvement [7] projects. RTEMS CENTRE started officially in November 2006, with the RTEMS 4.6.99.2 version. A full analysis of RTEMS operating system was produced. The architecture was analysed in terms of conceptual, organizational and operational concepts. The original objectives [1] of the centre were primarily to create and maintain technical expertise and competences in this RTOS, to develop a website to provide the European Space Community an entry point for obtaining support (http://rtemscentre.edisoft.pt), to design, develop, maintain and integrate some RTEMS support tools (Timeline Tool, Configuration and Management Tools), to maintain flight libraries and Board Support Packages, to develop a strong relationship with the World RTEMS Community and finally to produce some considerations in ARINC-653, DO-178B and ECSS E-40 standards. RTEMS Improvement is the continuation of the RTEMS CENTRE. Currently the RTEMS, version 4.8.0, is being facilitated for a future qualification. In this work, the validation material is being produced following the Galileo Software Standards Development Assurance Level B [5]. RTEMS is being completely tested, errors analysed, dead and deactivated code removed and tests produced to achieve 100% statement and decision coverage of source code [2]. The SW to exploit the LEON Memory Management Unit (MMU) hardware will be also added. A brief description of the expected implementations will be given.
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.
NASA Technical Reports Server (NTRS)
Pordes, Ruth (Editor)
1989-01-01
Papers on real-time computer applications in nuclear, particle, and plasma physics are presented, covering topics such as expert systems tactics in testing FASTBUS segment interconnect modules, trigger control in a high energy physcis experiment, the FASTBUS read-out system for the Aleph time projection chamber, a multiprocessor data acquisition systems, DAQ software architecture for Aleph, a VME multiprocessor system for plasma control at the JT-60 upgrade, and a multiasking, multisinked, multiprocessor data acquisition front end. Other topics include real-time data reduction using a microVAX processor, a transputer based coprocessor for VEDAS, simulation of a macropipelined multi-CPU event processor for use in FASTBUS, a distributed VME control system for the LISA superconducting Linac, a distributed system for laboratory process automation, and a distributed system for laboratory process automation. Additional topics include a structure macro assembler for the event handler, a data acquisition and control system for Thomson scattering on ATF, remote procedure execution software for distributed systems, and a PC-based graphic display real-time particle beam uniformity.
A multiprocessor airborne lidar data system
NASA Technical Reports Server (NTRS)
Wright, C. W.; Bailey, S. A.; Heath, G. E.; Piazza, C. R.
1988-01-01
A new multiprocessor data acquisition system was developed for the existing Airborne Oceanographic Lidar (AOL). This implementation simultaneously utilizes five single board 68010 microcomputers, the UNIX system V operating system, and the real time executive VRTX. The original data acquisition system was implemented on a Hewlett Packard HP 21-MX 16 bit minicomputer using a multi-tasking real time operating system and a mixture of assembly and FORTRAN languages. The present collection of data sources produce data at widely varied rates and require varied amounts of burdensome real time processing and formatting. It was decided to replace the aging HP 21-MX minicomputer with a multiprocessor system. A new and flexible recording format was devised and implemented to accommodate the constantly changing sensor configuration. A central feature of this data system is the minimization of non-remote sensing bus traffic. Therefore, it is highly desirable that each micro be capable of functioning as much as possible on-card or via private peripherals. The bus is used primarily for the transfer of remote sensing data to or from the buffer queue.
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer
NASA Technical Reports Server (NTRS)
Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw
2000-01-01
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Multitasking kernel for the C and Fortran programming languages
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brooks, E.D. III
1984-09-01
A multitasking kernel for the C and Fortran programming languages which runs on the Unix operating system is presented. The kernel provides a multitasking environment which serves two purposes. The first is to provide an efficient portable environment for the coding, debugging and execution of production multiprocessor programs. The second is to provide a means of evaluating the performance of a multitasking program on model multiprocessors. The performance evaluation features require no changes in the source code of the application and are implemented as a set of compile and run time options in the kernel.
The Simulation of Read-time Scalable Coherent Interface
NASA Technical Reports Server (NTRS)
Li, Qiang; Grant, Terry; Grover, Radhika S.
1997-01-01
Scalable Coherent Interface (SCI, IEEE/ANSI Std 1596-1992) (SCI1, SCI2) is a high performance interconnect for shared memory multiprocessor systems. In this project we investigate an SCI Real Time Protocols (RTSCI1) using Directed Flow Control Symbols. We studied the issues of efficient generation of control symbols, and created a simulation model of the protocol on a ring-based SCI system. This report presents the results of the study. The project has been implemented using SES/Workbench. The details that follow encompass aspects of both SCI and Flow Control Protocols, as well as the effect of realistic client/server processing delay. The report is organized as follows. Section 2 provides a description of the simulation model. Section 3 describes the protocol implementation details. The next three sections of the report elaborate on the workload, results and conclusions. Appended to the report is a description of the tool, SES/Workbench, used in our simulation, and internal details of our implementation of the protocol.
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*†
Khan, Md. Ashfaquzzaman; Herbordt, Martin C.
2011-01-01
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment.
Khan, Md Ashfaquzzaman; Herbordt, Martin C
2011-07-20
Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6× on an 8-core and over 9× on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations.
An efficient parallel-processing method for transposing large matrices in place.
Portnoff, M R
1999-01-01
We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Gate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.
The cost of conservative synchronization in parallel discrete event simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1990-01-01
The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.
Ye, Byoung Seok; Chin, Juhee; Kim, Seong Yoon; Lee, Jung-Sun; Kim, Eun-Joo; Lee, Yunhwan; Hong, Chang Hyung; Choi, Seong Hye; Park, Kyung Won; Ku, Bon D; Moon, So Young; Kim, SangYun; Han, Seol-Hee; Lee, Jae-Hong; Cheong, Hae-Kwan; Park, Sun Ah; Jeong, Jee Hyang; Na, Duk L; Seo, Sang Won
2015-01-01
We evaluate the longitudinal outcomes of amnestic mild cognitive impairment (aMCI) according to the modality of memory impairment involved. We recruited 788 aMCI patients and followed them up. aMCI patients were categorized into three groups according to the modality of memory impairment: Visual-aMCI, only visual memory impaired; Verbal-aMCI, only verbal memory impaired; and Both-aMCI, both visual and verbal memory impaired. Each aMCI group was further categorized according to the presence or absence of recognition failure. Risk of progression to dementia was compared with pooled logistic regression analyses while controlling for age, gender, education, and interval from baseline. Of the sample, 219 (27.8%) aMCI patients progressed to dementia. Compared to the Visual-aMCI group, Verbal-aMCI (OR = 1.98, 95% CI = 1.19-3.28, p = 0.009) and Both-aMCI (OR = 3.05, 95% CI = 1.97-4.71, p < 0.001) groups exhibited higher risks of progression to dementia. Memory recognition failure was associated with increased risk of progression to dementia only in the Visual-aMCI group, but not in the Verbal-aMCI and Both-aMCI groups. The Visual-aMCI without recognition failure group were subcategorized into aMCI with depression, small vessel disease, or accelerated aging, and these subgroups showed a variety of progression rates. Our findings underlined the importance of heterogeneous longitudinal outcomes of aMCI, especially Visual-aMCI, for designing and interpreting future treatment trials in aMCI.
NASA Astrophysics Data System (ADS)
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Smith, E N; Ghia, E M; DeBoever, C M; Rassenti, L Z; Jepsen, K; Yoon, K-A; Matsui, H; Rozenzhak, S; Alakus, H; Shepard, P J; Dai, Y; Khosroheidari, M; Bina, M; Gunderson, K L; Messer, K; Muthuswamy, L; Hudson, T J; Harismendy, O; Barrett, C L; Jamieson, C H M; Carson, D A; Kipps, T J; Frazer, K A
2015-04-10
We examined genetic and epigenetic changes that occur during disease progression from indolent to aggressive forms of chronic lymphocytic leukemia (CLL) using serial samples from 27 patients. Analysis of DNA mutations grouped the leukemia cases into three categories: evolving (26%), expanding (26%) and static (47%). Thus, approximately three-quarters of the CLL cases had little to no genetic subclonal evolution. However, we identified significant recurrent DNA methylation changes during progression at 4752 CpGs enriched for regions near Polycomb 2 repressive complex (PRC2) targets. Progression-associated CpGs near the PRC2 targets undergo methylation changes in the same direction during disease progression as during normal development from naive to memory B cells. Our study shows that CLL progression does not typically occur via subclonal evolution, but that certain CpG sites undergo recurrent methylation changes. Our results suggest CLL progression may involve developmental processes shared in common with the generation of normal memory B cells.
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
Scalable Multiprocessor for High-Speed Computing in Space
NASA Technical Reports Server (NTRS)
Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard
2004-01-01
A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard realtime applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at "hundreds" of pulses per second, each pulse "requiring" millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
NASA Technical Reports Server (NTRS)
Russo, Vincent; Johnston, Gary; Campbell, Roy
1988-01-01
The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor require both efficient code in order to support the needs of high- performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.
A multiresolution halftoning algorithm for progressive display
NASA Astrophysics Data System (ADS)
Mukherjee, Mithun; Sharma, Gaurav
2005-01-01
We describe and implement an algorithmic framework for memory efficient, 'on-the-fly' halftoning in a progressive transmission environment. Instead of a conventional approach which repeatedly recalls the continuous tone image from memory and subsequently halftones it for display, the proposed method achieves significant memory efficiency by storing only the halftoned image and updating it in response to additional information received through progressive transmission. Thus the method requires only a single frame-buffer of bits for storage of the displayed binary image and no additional storage is required for the contone data. The additional image data received through progressive transmission is accommodated through in-place updates of the buffer. The method is thus particularly advantageous for high resolution bi-level displays where it can result in significant savings in memory. The proposed framework is implemented using a suitable multi-resolution, multi-level modification of error diffusion that is motivated by the presence of a single binary frame-buffer. Aggregates of individual display bits constitute the multiple output levels at a given resolution. This creates a natural progression of increasing resolution with decreasing bit-depth.
A parallel solver for huge dense linear systems
NASA Astrophysics Data System (ADS)
Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.
2011-11-01
HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI ( http://www.mpi-forum.org/), BLAS ( http://www.netlib.org/blas/), PLAPACK ( http://www.cs.utexas.edu/~plapack/), POOCLAPACK ( ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic. Summary of revisions: Version 1.1 Can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wasserman, H.J.
1996-02-01
The second generation of the Digital Equipment Corp. (DEC) DECchip Alpha AXP microprocessor is referred to as the 21164. From the viewpoint of numerically-intensive computing, the primary difference between it and its predecessor, the 21064, is that the 21164 has twice the multiply/add throughput per clock period (CP), a maximum of two floating point operations (FLOPS) per CP vs. one for 21064. The AlphaServer 8400 is a shared-memory multiprocessor server system that can accommodate up to 12 CPUs and up to 14 GB of memory. In this report we will compare single processor performance of the 8400 system with thatmore » of the International Business Machines Corp. (IBM) RISC System/6000 POWER-2 microprocessor running at 66 MHz, the Silicon Graphics, Inc. (SGI) MIPS R8000 microprocessor running at 75 MHz, and the Cray Research, Inc. CRAY J90. The performance comparison is based on a set of Fortran benchmark codes that represent a portion of the Los Alamos National Laboratory supercomputer workload. The advantage of using these codes, is that the codes also span a wide range of computational characteristics, such as vectorizability, problem size, and memory access pattern. The primary disadvantage of using them is that detailed, quantitative analysis of performance behavior of all codes on all machines is difficult. One important addition to the benchmark set appears for the first time in this report. Whereas the older version was written for a vector processor, the newer version is more optimized for microprocessor architectures. Therefore, we have for the first time, an opportunity to measure performance on a single application using implementations that expose the respective strengths of vector and superscalar architecture. All results in this report are from single processors. A subsequent article will explore shared-memory multiprocessing performance of the 8400 system.« less
Characterization of naïve, memory and effector T cells in progressive multiple sclerosis.
Nielsen, Birgitte Romme; Ratzer, Rikke; Börnsen, Lars; von Essen, Marina Rode; Christensen, Jeppe Romme; Sellebjerg, Finn
2017-09-15
We characterized naïve, central memory (CM), effector memory (EM) and terminally differentiated effector memory (TEMRA) CD4 + and CD8 + T cells and their expression of CD49d and CD26 in peripheral blood in patients with multiple sclerosis (MS) and healthy controls. CD26 + CD28 + CD4 + TEMRA T cells were increased in all subtypes of MS, and CD26 + CD28 + CD8 + TEMRA T cells were increased in relapsing-remitting and secondary progressive MS. Conversely, in progressive MS, CD49d + CM T cells were decreased and natalizumab increased the circulating number of all six subsets but reduced the frequency of most subsets expressing CD49d and CD26. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Lohn, Stefan B.; Dong, Xin; Carminati, Federico
2012-12-01
Chip-Multiprocessors are going to support massive parallelism by many additional physical and logical cores. Improving performance can no longer be obtained by increasing clock-frequency because the technical limits are almost reached. Instead, parallel execution must be used to gain performance. Resources like main memory, the cache hierarchy, bandwidth of the memory bus or links between cores and sockets are not going to be improved as fast. Hence, parallelism can only result into performance gains if the memory usage is optimized and the communication between threads is minimized. Besides concurrent programming has become a domain for experts. Implementing multi-threading is error prone and labor-intensive. A full reimplementation of the whole AliRoot source-code is unaffordable. This paper describes the effort to evaluate the adaption of AliRoot to the needs of multi-threading and to provide the capability of parallel processing by using a semi-automatic source-to-source transformation to address the problems as described before and to provide a straight-forward way of parallelization with almost no interference between threads. This makes the approach simple and reduces the required manual changes in the code. In a first step, unconditional thread-safety will be introduced to bring the original sequential and thread unaware source-code into the position of utilizing multi-threading. Afterwards further investigations have to be performed to point out candidates of classes that are useful to share amongst threads. Then in a second step, the transformation has to change the code to share these classes and finally to verify if there are anymore invalid interferences between threads.
Tan, Rachel H; Wong, Stephanie; Kril, Jillian J; Piguet, Olivier; Hornberger, Michael; Hodges, John R; Halliday, Glenda M
2014-07-01
Despite accruing evidence for relative preservation of episodic memory in the semantic variant of primary progressive aphasia (previously semantic dementia), the neural basis for this remains unclear, particularly in light of their well-established hippocampal involvement. We recently investigated the Papez network of memory structures across pathological subtypes of behavioural variant frontotemporal dementia and demonstrated severe degeneration of all relay nodes, with the anterior thalamus in particular emerging as crucial for intact episodic memory. The present study investigated the status of key components of Papez circuit (hippocampus, mammillary bodies, anterior thalamus, cingulate cortex) and anterior temporal cortex using volumetric and quantitative cell counting methods in pathologically-confirmed cases with semantic variant of primary progressive aphasia (n = 8; 61-83 years; three males), behavioural variant frontotemporal dementia with TDP pathology (n = 9; 53-82 years; six males) and healthy controls (n = 8, 50-86 years; four males). Behavioural variant frontotemporal dementia cases with TDP pathology were selected because of the association between the semantic variant of primary progressive aphasia and TDP pathology. Our findings revealed that the semantic variant of primary progressive aphasia and behavioural variant frontotemporal dementia show similar degrees of anterior thalamic atrophy. The mammillary bodies and hippocampal body and tail were preserved in the semantic variant of primary progressive aphasia but were significantly atrophic in behavioural variant frontotemporal dementia. Importantly, atrophy in the anterior thalamus and mild progressive atrophy in the body of the hippocampus emerged as the main memory circuit regions correlated with increasing dementia severity in the semantic variant of primary progressive aphasia. Quantitation of neuronal populations in the cingulate cortices confirmed the selective loss of anterior cingulate von Economo neurons in behavioural variant frontotemporal dementia. We also show that by end-stage these neurons selectively degenerate in the semantic variant of primary progressive aphasia with preservation of neurons in the posterior cingulate cortex. Overall, our findings demonstrate for the first time, severe atrophy, although not necessarily neuronal loss, across all relay nodes of Papez circuit with the exception of the mammillary bodies and hippocampal body and tail in the semantic variant of primary progressive aphasia. Despite the longer disease course in the semantic variant of primary progressive aphasia compared with behavioural variant frontotemporal dementia, we suggest here that the neural preservation of crucial memory relays (hippocampal→mammillary bodies and posterior cingulate→hippocampus) likely reflects the conservation of specific episodic memory components observed in most patients with semantic variant of primary progressive aphasia. © The Author (2014). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
UTD at TREC 2014: Query Expansion for Clinical Decision Support
2014-11-01
Description: A 62-year-old man sees a neurologist for progressive memory loss and jerking movements of the lower ex- tremities. Neurologic examination confirms...infiltration. Summary: 62-year-old man with progressive memory loss and in- voluntary leg movements. Brain MRI reveals cortical atrophy, and cortical...latent topics produced by the Latent Dirichlet Allocation (LDA) on the TREC-CDS corpus of scientific articles. The position of words “ loss ” and “ memory
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
NASA Astrophysics Data System (ADS)
Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori
OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in an METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with OSCAR API can be compiled by the ordinary OpenMP compilers since the OSCAR API is designed on a subset of the OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. From the results of scalability evaluation, it is found that on an average, the OSCAR compiler with the OSCAR API can exploit 5.8 times speedup over the sequential execution on the Power5+ workstation with eight cores and 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
Working memory and semantic involvement in sentence processing: a case of pure progressive amnesia.
Fossard, Marion; Rigalleau, François; Puel, Michèle; Nespoulous, Jean-Luc; Viallard, Gérard; Démonet, Jean-François; Cardebat, Dominique
2006-01-01
ED, a 83-year-old woman, meets the criteria of pure progressive amnesia, with gradual impairment of episodic and autobiographical memory, sparing of semantic processing and strong working memory (WM) deficit. The dissociation between disturbed WM and spared semantic processing permitted testing the role of WM in processing anaphors like pronouns or repeated names. Results showed a globally normal anaphoric behavior in two experiments requiring anaphoric processing in sentence production and comprehension. We suggest that preserved semantic processing in ED would have compensated for working memory deficit in anaphoric processing.
Learning and memory in zebrafish larvae
Roberts, Adam C.; Bill, Brent R.; Glanzman, David L.
2013-01-01
Larval zebrafish possess several experimental advantages for investigating the molecular and neural bases of learning and memory. Despite this, neuroscientists have only recently begun to use these animals to study memory. However, in a relatively short period of time a number of forms of learning have been described in zebrafish larvae, and significant progress has been made toward their understanding. Here we provide a comprehensive review of this progress; we also describe several promising new experimental technologies currently being used in larval zebrafish that are likely to contribute major insights into the processes that underlie learning and memory. PMID:23935566
Meeting the memory challenges of brain-scale network simulation.
Kunkel, Susanne; Potjans, Tobias C; Eppler, Jochen M; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus
2011-01-01
The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10(5) neurons with up to 10(9) synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and less expensive. Applying the model to our freely available Neural Simulation Tool (NEST), we identify the software components dominant at different scales, and develop general strategies for reducing the memory consumption, in particular by using data structures that exploit the sparseness of the local representation of the network. We show that these adaptations enable our simulation software to scale up to the order of 10,000 processors and beyond. As memory consumption issues are likely to be relevant for any software dealing with complex connectome data on such architectures, our approach and our findings should be useful for researchers developing novel neuroinformatics solutions to the challenges posed by the connectome project.
Dodge, Hiroko H; Zhu, Jian; Harvey, Danielle; Saito, Naomi; Silbert, Lisa C; Kaye, Jeffrey A; Koeppe, Robert A; Albin, Roger L
2014-11-01
It is unknown which commonly used Alzheimer disease (AD) biomarker values-baseline or progression-best predict longitudinal cognitive decline. 526 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). ADNI composite memory and executive scores were the primary outcomes. Individual-specific slope of the longitudinal trajectory of each biomarker was first estimated. These estimates and observed baseline biomarker values were used as predictors of cognitive declines. Variability in cognitive declines explained by baseline biomarker values was compared with variability explained by biomarker progression values. About 40% of variability in memory and executive function declines was explained by ventricular volume progression among mild cognitive impairment patients. A total of 84% of memory and 65% of executive function declines were explained by fluorodeoxyglucose positron emission tomography (FDG-PET) score progression and ventricular volume progression, respectively, among AD patients. For most biomarkers, biomarker progressions explained higher variability in cognitive decline than biomarker baseline values. This has important implications for clinical trials targeted to modify AD biomarkers. Copyright © 2014 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.
Performance issues for domain-oriented time-driven distributed simulations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1987-01-01
It has long been recognized that simulations form an interesting and important class of computations that may benefit from distributed or parallel processing. Since the point of parallel processing is improved performance, the recent proliferation of multiprocessors requires that we consider the performance issues that naturally arise when attempting to implement a distributed simulation. Three such issues are: (1) the problem of mapping the simulation onto the architecture, (2) the possibilities for performing redundant computation in order to reduce communication, and (3) the avoidance of deadlock due to distributed contention for message-buffer space. These issues are discussed in the context of a battlefield simulation implemented on a medium-scale multiprocessor message-passing architecture.
Partitioning and packing mathematical simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Arpasi, D. J.; Milner, E. J.
1986-01-01
The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
NASA Technical Reports Server (NTRS)
Smith, T. B., III; Lala, J. H.
1984-01-01
The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
A measurement-based performability model for a multiprocessor system
NASA Technical Reports Server (NTRS)
Ilsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.
1987-01-01
A measurement-based performability model based on real error-data collected on a multiprocessor system is described. Model development from the raw errror-data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simple exponential and that semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.
How preserved is episodic memory in behavioral variant frontotemporal dementia?
Hornberger, M; Piguet, O; Graham, A J; Nestor, P J; Hodges, J R
2010-02-09
Studies have shown variable memory performance in patients with behavioral variant frontotemporal dementia (bvFTD). Our study investigated whether this variability is due to the admixture of patients with true bvFTD and phenocopy patients. We also sought to compare performance of patients with bvFTD and patients with Alzheimer disease (AD). We analyzed neuropsychological memory performance in patients with a clinical diagnosis of bvFTD divided into those who progressed (n = 50) and those who remained stable (n = 39), patients with AD (n = 64), and healthy controls (n = 64). Patients with progressive bvFTD were impaired on most memory tests to a similar level to that of patients with early AD. Findings from a subset of patients with progressive bvFTD with confirmed FTLD pathology (n = 10) corroborated these findings. By contrast, patients with phenocopy bvFTD performed significantly better than progressors and patients with AD. Logistic regression revealed that patients with bvFTD can be distinguished to a high degree (85%) on the immediate recall score of a word list learning test (Rey Auditory Verbal Learning Test). Our results provide evidence for an underlying memory deficit in "real" or progressive behavioral variant frontotemporal dementia (bvFTD) similar to Alzheimer disease, though the groups differ in orientation scores, with patients with bvFTD being intact. Exclusion solely based on impaired neuropsychological memory performance can potentially lead to an underdiagnosis of FTD.
From network heterogeneities to familiarity detection and hippocampal memory management
Wang, Jane X.; Poe, Gina; Zochowski, Michal
2009-01-01
Hippocampal-neocortical interactions are key to the rapid formation of novel associative memories in the hippocampus and consolidation to long term storage sites in the neocortex. We investigated the role of network correlates during information processing in hippocampal-cortical networks. We found that changes in the intrinsic network dynamics due to the formation of structural network heterogeneities alone act as a dynamical and regulatory mechanism for stimulus novelty and familiarity detection, thereby controlling memory management in the context of memory consolidation. This network dynamic, coupled with an anatomically established feedback between the hippocampus and the neocortex, recovered heretofore unexplained properties of neural activity patterns during memory management tasks which we observed during sleep in multiunit recordings from behaving animals. Our simple dynamical mechanism shows an experimentally matched progressive shift of memory activation from the hippocampus to the neocortex and thus provides the means to achieve an autonomous off-line progression of memory consolidation. PMID:18999453
Memory--a century of consolidation.
McGaugh, J L
2000-01-14
The memory consolidation hypothesis proposed 100 years ago by Müller and Pilzecker continues to guide memory research. The hypothesis that new memories consolidate slowly over time has stimulated studies revealing the hormonal and neural influences regulating memory consolidation, as well as molecular and cellular mechanisms. This review examines the progress made over the century in understanding the time-dependent processes that create our lasting memories.
ERIC Educational Resources Information Center
Dunning, Darren L.; Holmes, Joni; Gathercole, Susan E.
2013-01-01
Children with low working memory typically make poor educational progress, and it has been speculated that difficulties in meeting the heavy working memory demands of the classroom may be a contributory factor. Intensive working memory training has been shown to boost performance on untrained memory tasks in a variety of populations. This first…
Müller, S; Saur, R; Greve, B; Melms, A; Hautzinger, M; Fallgatter, A J; Leyhe, T
2013-02-01
Memory disturbance is a common symptom of multiple sclerosis (MS), but little is known about autobiographical memory deficits in the long-term course of different MS subtypes. Inflammatory activity and demyelination is pronounced in relapsing-remitting multiple sclerosis (RRMS) whereas, similar to Alzheimer's disease, neurodegeneration affecting autobiographical memory-associated areas is seen in secondary progressive multiple sclerosis (SPMS). In light of distinct disease mechanisms, we evaluated autobiographical memory in different MS subtypes and hypothesized similarities between elderly patients with SPMS and Alzheimer's disease. We used the Autobiographical Memory Interview to assess episodic and semantic autobiographical memory in 112 education- and gender-matched participants, including healthy controls and patients with RRMS, SPMS, amnesic mild cognitive impairment (aMCI) and early Alzheimer's dementia (AD). Patients with SPMS, AD, and aMCI, but not with RRMS, exhibited a pattern of episodic autobiographical memory impairment that followed Ribot's Law; older memories were better preserved than more recent memories. In contrast to aMCI and AD, neither SPMS nor RRMS was associated with semantic autobiographical memory impairment. Our neuropsychological findings suggest that episodic autobiographical memory is affected in long-term patients with SPMS, possibly due to neurodegenerative processes in functional relevant brain regions.
Instrumentation, performance visualization, and debugging tools for multiprocessors
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.
1991-01-01
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
NASA Technical Reports Server (NTRS)
Patten, William Neff
1989-01-01
There is an evident need to discover a means of establishing reliable, implementable controls for systems that are plagued by nonlinear and, or uncertain, model dynamics. The development of a generic controller design tool for tough-to-control systems is reported. The method utilizes a moving grid, time infinite element based solution of the necessary conditions that describe an optimal controller for a system. The technique produces a discrete feedback controller. Real time laboratory experiments are now being conducted to demonstrate the viability of the method. The algorithm that results is being implemented in a microprocessor environment. Critical computational tasks are accomplished using a low cost, on-board, multiprocessor (INMOS T800 Transputers) and parallel processing. Progress to date validates the methodology presented. Applications of the technique to the control of highly flexible robotic appendages are suggested.
Microlaser-based compact optical neuro-processors (Invited Paper)
NASA Astrophysics Data System (ADS)
Paek, Eung Gi; Chan, Winston K.; Zah, Chung-En; Cheung, Kwok-wai; Curtis, L.; Chang-Hasnain, Constance J.
1992-10-01
This paper reviews the recent progress in the development of holographic neural networks using surface-emitting laser diode arrays (SELDAs). Since the previous work on ultrafast holographic memory readout system and a robust incoherent correlator, progress has been made in several areas: the use of an array of monolithic `neurons' to reconstruct holographic memories; two-dimensional (2-D) wavelength-division multiplexing (WDM) for image transmission through a single-mode fiber; and finally, an associative memory using time- division multiplexing (TDM). Experimental demonstrations on these are presented.
Kirova, Anna-Mariya; Bays, Rebecca B; Lagalwar, Sarita
2015-01-01
Alzheimer's disease (AD) is a progressive neurodegenerative disease marked by deficits in episodic memory, working memory (WM), and executive function. Examples of executive dysfunction in AD include poor selective and divided attention, failed inhibition of interfering stimuli, and poor manipulation skills. Although episodic deficits during disease progression have been widely studied and are the benchmark of a probable AD diagnosis, more recent research has investigated WM and executive function decline during mild cognitive impairment (MCI), also referred to as the preclinical stage of AD. MCI is a critical period during which cognitive restructuring and neuroplasticity such as compensation still occur; therefore, cognitive therapies could have a beneficial effect on decreasing the likelihood of AD progression during MCI. Monitoring performance on working memory and executive function tasks to track cognitive function may signal progression from normal cognition to MCI to AD. The present review tracks WM decline through normal aging, MCI, and AD to highlight the behavioral and neurological differences that distinguish these three stages in an effort to guide future research on MCI diagnosis, cognitive therapy, and AD prevention.
Luckey, Chance John; Bhattacharya, Deepta; Goldrath, Ananda W.; Weissman, Irving L.; Benoist, Christophe; Mathis, Diane
2006-01-01
The only cells of the hematopoietic system that undergo self-renewal for the lifetime of the organism are long-term hematopoietic stem cells and memory T and B cells. To determine whether there is a shared transcriptional program among these self-renewing populations, we first compared the gene-expression profiles of naïve, effector and memory CD8+ T cells with those of long-term hematopoietic stem cells, short-term hematopoietic stem cells, and lineage-committed progenitors. Transcripts augmented in memory CD8+ T cells relative to naïve and effector T cells were selectively enriched in long-term hematopoietic stem cells and were progressively lost in their short-term and lineage-committed counterparts. Furthermore, transcripts selectively decreased in memory CD8+ T cells were selectively down-regulated in long-term hematopoietic stem cells and progressively increased with differentiation. To confirm that this pattern was a general property of immunologic memory, we turned to independently generated gene expression profiles of memory, naïve, germinal center, and plasma B cells. Once again, memory-enriched and -depleted transcripts were also appropriately augmented and diminished in long-term hematopoietic stem cells, and their expression correlated with progressive loss of self-renewal function. Thus, there appears to be a common signature of both up- and down-regulated transcripts shared between memory T cells, memory B cells, and long-term hematopoietic stem cells. This signature was not consistently enriched in neural or embryonic stem cell populations and, therefore, appears to be restricted to the hematopoeitic system. These observations provide evidence that the shared phenotype of self-renewal in the hematopoietic system is linked at the molecular level. PMID:16492737
NASA Technical Reports Server (NTRS)
Keyes, David E.; Smooke, Mitchell D.
1987-01-01
A parallelized finite difference code based on the Newton method for systems of nonlinear elliptic boundary value problems in two dimensions is analyzed in terms of computational complexity and parallel efficiency. An approximate cost function depending on 15 dimensionless parameters is derived for algorithms based on stripwise and boxwise decompositions of the domain and a one-to-one assignment of the strip or box subdomains to processors. The sensitivity of the cost functions to the parameters is explored in regions of parameter space corresponding to model small-order systems with inexpensive function evaluations and also a coupled system of nineteen equations with very expensive function evaluations. The algorithm was implemented on the Intel Hypercube, and some experimental results for the model problems with stripwise decompositions are presented and compared with the theory. In the context of computational combustion problems, multiprocessors of either message-passing or shared-memory type may be employed with stripwise decompositions to realize speedup of O(n), where n is mesh resolution in one direction, for reasonable n.
A Measurement and Simulation Based Methodology for Cache Performance Modeling and Tuning
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
We present a cache performance modeling methodology that facilitates the tuning of uniprocessor cache performance for applications executing on shared memory multiprocessors by accurately predicting the effects of source code level modifications. Measurements on a single processor are initially used for identifying parts of code where cache utilization improvements may significantly impact the overall performance. Cache simulation based on trace-driven techniques can be carried out without gathering detailed address traces. Minimal runtime information for modeling cache performance of a selected code block includes: base virtual addresses of arrays, virtual addresses of variables, and loop bounds for that code block. Rest of the information is obtained from the source code. We show that the cache performance predictions are as reliable as those obtained through trace-driven simulations. This technique is particularly helpful to the exploration of various "what-if' scenarios regarding the cache performance impact for alternative code structures. We explain and validate this methodology using a simple matrix-matrix multiplication program. We then apply this methodology to predict and tune the cache performance of two realistic scientific applications taken from the Computational Fluid Dynamics (CFD) domain.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.
2003-01-01
With the advent of parallel hardware and software technologies users are faced with the challenge to choose a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is the best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we will compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4 and the performance results are presented in section 5. We conclude our study in section 6.
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
Eichenbaum, Howard; Amaral, David G.; Buffalo, Elizabeth A.; Buzsáki, György; Cohen, Neal; Davachi, Lila; Frank, Loren; Heckers, Stephan; Morris, Richard G. M.; Moser, Edvard I.; Nadel, Lynn; O'Keefe, John; Preston, Alison; Ranganath, Charan; Silva, Alcino; Witter, Menno
2017-01-01
The journal Hippocampus has passed the milestone of 25 years of publications on the topic of a highly studied brain structure, and its closely associated brain areas. In a recent celebration of this event, a Boston memory group invited 16 speakers to address the question of progress in understanding the hippocampus that has been achieved. Here we present a summary of these talks organized as progress on four main themes: (1) Understanding the hippocampus in terms of its interactions with multiple cortical areas within the medial temporal lobe memory system, (2) understanding the relationship between memory and spatial information processing functions of the hippocampal region, (3) understanding the role of temporal organization in spatial and memory processing by the hippocampus, and (4) understanding how the hippocampus integrates related events into networks of memories. PMID:27399159
ERIC Educational Resources Information Center
Wiley, Jennifer; Jarosz, Andrew F.; Cushen, Patrick J.; Colflesh, Gregory J. H.
2011-01-01
The correlation between individual differences in working memory capacity and performance on the Raven's Advanced Progressive Matrices (RAPM) is well documented yet poorly understood. The present work proposes a new explanation: that the need to use a new combination of rules on RAPM problems drives the relation between performance and working…
The Advanced Software Development and Commercialization Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallopoulos, E.; Canfield, T.R.; Minkoff, M.
1990-09-01
This is the first of a series of reports pertaining to progress in the Advanced Software Development and Commercialization Project, a joint collaborative effort between the Center for Supercomputing Research and Development of the University of Illinois and the Computing and Telecommunications Division of Argonne National Laboratory. The purpose of this work is to apply techniques of parallel computing that were pioneered by University of Illinois researchers to mature computational fluid dynamics (CFD) and structural dynamics (SD) computer codes developed at Argonne. The collaboration in this project will bring this unique combination of expertise to bear, for the first time,more » on industrially important problems. By so doing, it will expose the strengths and weaknesses of existing techniques for parallelizing programs and will identify those problems that need to be solved in order to enable wide spread production use of parallel computers. Secondly, the increased efficiency of the CFD and SD codes themselves will enable the simulation of larger, more accurate engineering models that involve fluid and structural dynamics. In order to realize the above two goals, we are considering two production codes that have been developed at ANL and are widely used by both industry and Universities. These are COMMIX and WHAMS-3D. The first is a computational fluid dynamics code that is used for both nuclear reactor design and safety and as a design tool for the casting industry. The second is a three-dimensional structural dynamics code used in nuclear reactor safety as well as crashworthiness studies. These codes are currently available for both sequential and vector computers only. Our main goal is to port and optimize these two codes on shared memory multiprocessors. In so doing, we shall establish a process that can be followed in optimizing other sequential or vector engineering codes for parallel processors.« less
Eichenbaum, Howard; Amaral, David G; Buffalo, Elizabeth A; Buzsáki, György; Cohen, Neal; Davachi, Lila; Frank, Loren; Heckers, Stephan; Morris, Richard G M; Moser, Edvard I; Nadel, Lynn; O'Keefe, John; Preston, Alison; Ranganath, Charan; Silva, Alcino; Witter, Menno
2016-10-01
The journal Hippocampus has passed the milestone of 25 years of publications on the topic of a highly studied brain structure, and its closely associated brain areas. In a recent celebration of this event, a Boston memory group invited 16 speakers to address the question of progress in understanding the hippocampus that has been achieved. Here we present a summary of these talks organized as progress on four main themes: (1) Understanding the hippocampus in terms of its interactions with multiple cortical areas within the medial temporal lobe memory system, (2) understanding the relationship between memory and spatial information processing functions of the hippocampal region, (3) understanding the role of temporal organization in spatial and memory processing by the hippocampus, and (4) understanding how the hippocampus integrates related events into networks of memories. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Episodic and working memory function in Primary Progressive Aphasia: A meta-analysis.
Eikelboom, Willem S; Janssen, Nikki; Jiskoot, Lize C; van den Berg, Esther; Roelofs, Ardi; Kessels, Roy P C
2018-06-18
The distinction between Primary Progressive Aphasia (PPA) variants remains challenging for clinicians, especially for the non-fluent (nfv-PPA) and the logopenic variants (lv-PPA). Previous research suggests that memory tests might aid this differentiation. This meta-analysis compares memory function among PPA variants. Effects sizes were extracted from 41 studies (N = 849). Random-effects models were used to compare performance on episodic and working memory tests among PPA patients and healthy controls, and between the PPA variants. Memory deficits were frequently observed in PPA compared to controls, with large effect sizes for lv-PPA (Hedges' g = -2.04 [-2.58 to -1.49]), nfv-PPA (Hedges' g = -1.34 [-1.69 to -1.00]), and the semantic variant (sv-PPA; Hedges' g = -1.23 [-1.50 to -0.97]). Sv-PPA showed primarily verbal memory deficits, whereas lv-PPA showed worse performance than nfv-PPA on both verbal and non-verbal memory tests. Memory deficits were more pronounced in lv-PPA compared to nfv-PPA. This suggests that memory tests may be helpful to distinguish between these PPA variants. Copyright © 2018. Published by Elsevier Ltd.
Nilakantan, Aneesha S; Voss, Joel L; Weintraub, Sandra; Mesulam, M-Marsel; Rogalski, Emily J
2017-06-01
Primary progressive aphasia (PPA) is clinically defined by an initial loss of language function and preservation of other cognitive abilities, including episodic memory. While PPA primarily affects the left-lateralized perisylvian language network, some clinical neuropsychological tests suggest concurrent initial memory loss. The goal of this study was to test recognition memory of objects and words in the visual and auditory modality to separate language-processing impairments from retentive memory in PPA. Individuals with non-semantic PPA had longer reaction times and higher false alarms for auditory word stimuli compared to visual object stimuli. Moreover, false alarms for auditory word recognition memory were related to cortical thickness within the left inferior frontal gyrus and left temporal pole, while false alarms for visual object recognition memory was related to cortical thickness within the right-temporal pole. This pattern of results suggests that specific vulnerability in processing verbal stimuli can hinder episodic memory in PPA, and provides evidence for differential contributions of the left and right temporal poles in word and object recognition memory. Copyright © 2017 Elsevier Ltd. All rights reserved.
Multiprocessor Neural Network in Healthcare.
Godó, Zoltán Attila; Kiss, Gábor; Kocsis, Dénes
2015-01-01
A possible way of creating a multiprocessor artificial neural network is by the use of microcontrollers. The RISC processors' high performance and the large number of I/O ports mean they are greatly suitable for creating such a system. During our research, we wanted to see if it is possible to efficiently create interaction between the artifical neural network and the natural nervous system. To achieve as much analogy to the living nervous system as possible, we created a frequency-modulated analog connection between the units. Our system is connected to the living nervous system through 128 microelectrodes. Two-way communication is provided through A/D transformation, which is even capable of testing psychopharmacons. The microcontroller-based analog artificial neural network can play a great role in medical singal processing, such as ECG, EEG etc.
Characterizing parallel file-access patterns on a large-scale multiprocessor
NASA Technical Reports Server (NTRS)
Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.
1995-01-01
High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
NASA Technical Reports Server (NTRS)
Huynh, Loc C.; Duval, R. W.
1986-01-01
The use of Redundant Asynchronous Multiprocessor System to achieve ultrareliable Fault Tolerant Control Systems shows great promise. The development has been hampered by the inability to determine whether differences in the outputs of redundant CPU's are due to failures or to accrued error built up by slight differences in CPU clock intervals. This study derives an analytical dynamic model of the difference between redundant CPU's due to differences in their clock intervals and uses this model with on-line parameter identification to idenitify the differences in the clock intervals. The ability of this methodology to accurately track errors due to asynchronisity generate an error signal with the effect of asynchronisity removed and this signal may be used to detect and isolate actual system failures.
Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems
NASA Technical Reports Server (NTRS)
Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.
1992-01-01
The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.
Working Memory Training Does Not Improve Intelligence in Healthy Young Adults
ERIC Educational Resources Information Center
Chooi, Weng-Tink; Thompson, Lee A.
2012-01-01
Jaeggi and her colleagues claimed that they were able to improve fluid intelligence by training working memory. Subjects who trained their working memory on a dual n-back task for a period of time showed significant improvements in working memory span tasks and fluid intelligence tests such as the Raven's Progressive Matrices and the Bochumer…
Progress In Optical Memory Technology
NASA Astrophysics Data System (ADS)
Tsunoda, Yoshito
1987-01-01
More than 20 years have passed since the concept of optical memory was first proposed in 1966. Since then considerable progress has been made in this area together with the creation of completely new markets of optical memory in consumer and computer application areas. The first generation of optical memory was mainly developed with holographic recording technology in late 1960s and early 1970s. Considerable number of developments have been done in both analog and digital memory applications. Unfortunately, these technologies did not meet a chance to be a commercial product. The second generation of optical memory started at the beginning of 1970s with bit by bit recording technology. Read-only type optical memories such as video disks and compact audio disks have extensively investigated. Since laser diodes were first applied to optical video disk read out in 1976, there have been extensive developments of laser diode pick-ups for optical disk memory systems. The third generation of optical memory started in 1978 with bit by bit read/write technology using laser diodes. Developments of recording materials including both write-once and erasable have been actively pursued at several research institutes. These technologies are mainly focused on the optical memory systems for computer application. Such practical applications of optical memory technology has resulted in the creation of such new products as compact audio disks and computer file memories.
Enhancement of Immune Memory Responses to Respiratory Infection
2017-08-01
induction of highly specific B and T cell responses against viral infections. Despite recent progress in vaccine development, the molecular mechanisms...highly expressed in memory B cells in mice, and Atg7 is required for maintenance of long-term memory B cells needed to protect against influenza...infection. Human influenza-specific memory B cells also have high levels of autophagy, but whether autophagy protects memory B cell survival in humans
Carrasquillo, Minerva M; Crook, Julia E; Pedraza, Otto; Thomas, Colleen S; Pankratz, V Shane; Allen, Mariet; Nguyen, Thuy; Malphrus, Kimberly G; Ma, Li; Bisceglio, Gina D; Roberts, Rosebud O; Lucas, John A; Smith, Glenn E; Ivnik, Robert J; Machulda, Mary M; Graff-Radford, Neill R; Petersen, Ronald C; Younkin, Steven G; Ertekin-Taner, Nilüfer
2015-01-01
We tested association of nine late-onset Alzheimer's disease (LOAD) risk variants from genome-wide association studies (GWAS) with memory and progression to mild cognitive impairment (MCI) or LOAD (MCI/LOAD) in older Caucasians, cognitively normal at baseline and longitudinally evaluated at Mayo Clinic Rochester and Jacksonville (n>2000). Each variant was tested both individually and collectively using a weighted risk score. APOE-e4 associated with worse baseline memory and increased decline with highly significant overall effect on memory. CLU-rs11136000-G associated with worse baseline memory and incident MCI/LOAD. MS4A6A-rs610932-C associated with increased incident MCI/LOAD and suggestively with lower baseline memory. ABCA7-rs3764650-C and EPHA1-rs11767557-A associated with increased rates of memory decline in subjects with a final diagnosis of MCI/LOAD. PICALM-rs3851179-G had an unexpected protective effect on incident MCI/LOAD. Only APOE-inclusive risk scores associated with worse memory and incident MCI/LOAD. The collective influence of the nine top LOAD GWAS variants on memory decline and progression to MCI/LOAD appears limited. Discovery of biologically functional variants at these loci may uncover stronger effects on memory and incident disease. Copyright © 2015 Elsevier Inc. All rights reserved.
Tabuchi, Katsuhiko; Chen, Guiquan; Südhof, Thomas C; Shen, Jie
2009-06-03
Loss of presenilin function in adult mouse brains causes memory loss and age-related neurodegeneration. Since presenilin possesses gamma-secretase-dependent and -independent activities, it remains unknown which activity is required for presenilin-dependent memory formation and neuronal survival. To address this question, we generated postnatal forebrain-specific nicastrin conditional knock-out (cKO) mice, in which nicastrin, a subunit of gamma-secretase, is inactivated selectively in mature excitatory neurons of the cerebral cortex. nicastrin cKO mice display progressive impairment in learning and memory and exhibit age-dependent cortical neuronal loss, accompanied by astrocytosis, microgliosis, and hyperphosphorylation of the microtubule-associated protein Tau. The neurodegeneration observed in nicastrin cKO mice likely occurs via apoptosis, as evidenced by increased numbers of apoptotic neurons. These findings demonstrate an essential role of nicastrin in the execution of learning and memory and the maintenance of neuronal survival in the brain and suggest that presenilin functions in memory and neuronal survival via its role as a gamma-secretase subunit.
Everyday episodic memory in amnestic mild cognitive impairment: a preliminary investigation.
Irish, Muireann; Lawlor, Brian A; Coen, Robert F; O'Mara, Shane M
2011-08-04
Decline in episodic memory is one of the hallmark features of Alzheimer's disease (AD) and is also a defining feature of amnestic Mild Cognitive Impairment (MCI), which is posited as a potential prodrome of AD. While deficits in episodic memory are well documented in MCI, the nature of this impairment remains relatively under-researched, particularly for those domains with direct relevance and meaning for the patient's daily life. In order to fully explore the impact of disruption to the episodic memory system on everyday memory in MCI, we examined participants' episodic memory capacity using a battery of experimental tasks with real-world relevance. We investigated episodic acquisition and delayed recall (story-memory), associative memory (face-name pairings), spatial memory (route learning and recall), and memory for everyday mundane events in 16 amnestic MCI and 18 control participants. Furthermore, we followed MCI participants longitudinally to gain preliminary evidence regarding the possible predictive efficacy of these real-world episodic memory tasks for subsequent conversion to AD. The most discriminating tests at baseline were measures of acquisition, delayed recall, and associative memory, followed by everyday memory, and spatial memory tasks, with MCI patients scoring significantly lower than controls. At follow-up (mean time elapsed: 22.4 months), 6 MCI cases had progressed to clinically probable AD. Exploratory logistic regression analyses revealed that delayed associative memory performance at baseline was a potential predictor of subsequent conversion to AD. As a preliminary study, our findings suggest that simple associative memory paradigms with real-world relevance represent an important line of enquiry in future longitudinal studies charting MCI progression over time.
Memory retrieval by activating engram cells in mouse models of early Alzheimer's disease.
Roy, Dheeraj S; Arons, Autumn; Mitchell, Teryn I; Pignatelli, Michele; Ryan, Tomás J; Tonegawa, Susumu
2016-03-24
Alzheimer's disease (AD) is a neurodegenerative disorder characterized by progressive memory decline and subsequent loss of broader cognitive functions. Memory decline in the early stages of AD is mostly limited to episodic memory, for which the hippocampus has a crucial role. However, it has been uncertain whether the observed amnesia in the early stages of AD is due to disrupted encoding and consolidation of episodic information, or an impairment in the retrieval of stored memory information. Here we show that in transgenic mouse models of early AD, direct optogenetic activation of hippocampal memory engram cells results in memory retrieval despite the fact that these mice are amnesic in long-term memory tests when natural recall cues are used, revealing a retrieval, rather than a storage impairment. Before amyloid plaque deposition, the amnesia in these mice is age-dependent, which correlates with a progressive reduction in spine density of hippocampal dentate gyrus engram cells. We show that optogenetic induction of long-term potentiation at perforant path synapses of dentate gyrus engram cells restores both spine density and long-term memory. We also demonstrate that an ablation of dentate gyrus engram cells containing restored spine density prevents the rescue of long-term memory. Thus, selective rescue of spine density in engram cells may lead to an effective strategy for treating memory loss in the early stages of AD.
NASA Workshop on Computational Structural Mechanics 1987, part 1
NASA Technical Reports Server (NTRS)
Sykes, Nancy P. (Editor)
1989-01-01
Topics in Computational Structural Mechanics (CSM) are reviewed. CSM parallel structural methods, a transputer finite element solver, architectures for multiprocessor computers, and parallel eigenvalue extraction are among the topics discussed.
Monitoring of computing resource use of active software releases at ATLAS
NASA Astrophysics Data System (ADS)
Limosani, Antonio; ATLAS Collaboration
2017-10-01
The LHC is the world’s most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the TierO at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries and collected into pre-formatted auto-generated Web pages, which allow the ATLAS developer community to track the performance of their algorithms. This information is however preferentially filtered to domain leaders and developers through the use of JIRA and via reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.
NASA Technical Reports Server (NTRS)
Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.
1998-01-01
This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tver-gaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmarks Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by- element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications
NASA Astrophysics Data System (ADS)
Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Márquez, A.; Beléndez, A.
2015-06-01
The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in a stratified media. The potential of the scheme and the relevance of each acceleration strategy for massively computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of the bi-dimensional scheme of the FDTD method using multi-CPU and multi-GPU, respectively. In the first implementation, an open source message passing interface (OMPI) has been included in order to massively exploit the resources of a biprocessor station with two Intel Xeon processors. Moreover, regarding CPU code version, the streaming SIMD extensions (SSE) and also the advanced vectorial extensions (AVX) have been included with shared memory approaches that take advantage of the multi-core platforms. On the other hand, the second implementation called the multi-GPU code version is based on Peer-to-Peer communications available in CUDA on two GPUs (NVIDIA GTX 670). Subsequently, this paper presents an accurate analysis of the influence of the different code versions including shared memory approaches, vector instructions and multi-processors (both CPU and GPU) and compares them in order to delimit the degree of improvement of using distributed solutions based on multi-CPU and multi-GPU. The performance of both approaches was analysed and it has been demonstrated that the addition of shared memory schemes to CPU computing improves substantially the performance of vector instructions enlarging the simulation sizes that use efficiently the cache memory of CPUs. In this case GPU computing is slightly twice times faster than the fine tuned CPU version in both cases one and two nodes. However, for massively computations explicit vector instructions do not worth it since the memory bandwidth is the limiting factor and the performance tends to be the same than the sequential version with auto-vectorisation and also shared memory approach. In this scenario GPU computing is the best option since it provides a homogeneous behaviour. More specifically, the speedup of GPU computing achieves an upper limit of 12 for both one and two GPUs, whereas the performance reaches peak values of 80 GFlops and 146 GFlops for the performance for one GPU and two GPUs respectively. Finally, the method is applied to an earth crust profile in order to demonstrate the potential of our approach and the necessity of applying acceleration strategies in these type of applications.
FMRI of visual working memory in high school football players.
Shenk, Trey E; Robinson, Meghan E; Svaldi, Diana O; Abbas, Kausar; Breedlove, Katherine M; Leverenz, Larry J; Nauman, Eric A; Talavage, Thomas M
2015-01-01
Visual working memory deficits have been observed in at-risk athletes. This study uses a visual N-back working memory functional magnetic resonance imaging task to longitudinally assess asymptomatic football athletes for abnormal activity. Athletes were increasingly "flagged" as the season progressed. Flagging may provide early detection of injury.
A macro-micro robot for precise force applications
NASA Technical Reports Server (NTRS)
Marzwell, Neville I.; Wang, Yulun
1993-01-01
This paper describes an 8 degree-of-freedom macro-micro robot capable of performing tasks which require accurate force control. Applications such as polishing, finishing, grinding, deburring, and cleaning are a few examples of tasks which need this capability. Currently these tasks are either performed manually or with dedicated machinery because of the lack of a flexible and cost effective tool, such as a programmable force-controlled robot. The basic design and control of the macro-micro robot is described in this paper. A modular high-performance multiprocessor control system was designed to provide sufficient compute power for executing advanced control methods. An 8 degree of freedom macro-micro mechanism was constructed to enable accurate tip forces. Control algorithms based on the impedance control method were derived, coded, and load balanced for maximum execution speed on the multiprocessor system.
Closed-form solutions of performability. [modeling of a degradable buffer/multiprocessor system
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1981-01-01
Methods which yield closed form performability solutions for continuous valued variables are developed. The models are similar to those employed in performance modeling (i.e., Markovian queueing models) but are extended so as to account for variations in structure due to faults. In particular, the modeling of a degradable buffer/multiprocessor system is considered whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. To avoid known difficulties associated with exact transient solutions, an approximate decomposition of the model is employed permitting certain submodels to be solved in equilibrium. These solutions are then incorporated in a model with fewer transient states and by solving the latter, a closed form solution of the system's performability is obtained. In conclusion, some applications of this solution are discussed and illustrated, including an example of design optimization.
Parallel algorithm of VLBI software correlator under multiprocessor environment
NASA Astrophysics Data System (ADS)
Zheng, Weimin; Zhang, Dong
2007-11-01
The correlator is the key signal processing equipment of a Very Lone Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used to spacecraft position, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a task of data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts the pipelining and thread-parallel technology, and runs on the SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using the mixed Pthreads and MPI (Massage Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristic of flexible structure, scalability, and with 10-station data correlating abilities.
Everyday episodic memory in amnestic mild cognitive impairment: a preliminary investigation
2011-01-01
Background Decline in episodic memory is one of the hallmark features of Alzheimer's disease (AD) and is also a defining feature of amnestic Mild Cognitive Impairment (MCI), which is posited as a potential prodrome of AD. While deficits in episodic memory are well documented in MCI, the nature of this impairment remains relatively under-researched, particularly for those domains with direct relevance and meaning for the patient's daily life. In order to fully explore the impact of disruption to the episodic memory system on everyday memory in MCI, we examined participants' episodic memory capacity using a battery of experimental tasks with real-world relevance. We investigated episodic acquisition and delayed recall (story-memory), associative memory (face-name pairings), spatial memory (route learning and recall), and memory for everyday mundane events in 16 amnestic MCI and 18 control participants. Furthermore, we followed MCI participants longitudinally to gain preliminary evidence regarding the possible predictive efficacy of these real-world episodic memory tasks for subsequent conversion to AD. Results The most discriminating tests at baseline were measures of acquisition, delayed recall, and associative memory, followed by everyday memory, and spatial memory tasks, with MCI patients scoring significantly lower than controls. At follow-up (mean time elapsed: 22.4 months), 6 MCI cases had progressed to clinically probable AD. Exploratory logistic regression analyses revealed that delayed associative memory performance at baseline was a potential predictor of subsequent conversion to AD. Conclusions As a preliminary study, our findings suggest that simple associative memory paradigms with real-world relevance represent an important line of enquiry in future longitudinal studies charting MCI progression over time. PMID:21816065
Cognitive improvement following repair of a basal encephalocele.
Tulloch, Isabel; Palmer, Siobhan; Scott, Richard; Lozsadi, Dora; Martin, Andrew J
2018-06-01
We report the case of a 55-year-old woman presenting with progressive memory impairment secondary to a transsphenoidal encephalocele involving her dominant medial temporal lobe. Her clinical deterioration was accompanied by radiological progression in the encephalocele's size and associated encephalomalacia. Through a temporal craniotomy, her encephalocele was resected and the defect closed. Baseline neuropsychological assessment indicated global cognitive impairment, but post-operatively, she reported improved memory and concentration. Standardized assessment reflected an improvement in perceptual skills and an associated improved recall of a complex figure. This is the first case report to date of a patient's memory improving following treatment of a basal encephalocele.
Loss of memory B cells impairs maintenance of long-term serologic memory during HIV-1 infection.
Titanji, Kehmia; De Milito, Angelo; Cagigi, Alberto; Thorstensson, Rigmor; Grützmeier, Sven; Atlas, Ann; Hejdeman, Bo; Kroon, Frank P; Lopalco, Lucia; Nilsson, Anna; Chiodi, Francesca
2006-09-01
Circulating memory B cells are severely reduced in the peripheral blood of HIV-1-infected patients. We investigated whether dysfunctional serologic memory to non-HIV antigens is related to disease progression by evaluating the frequency of memory B cells, plasma IgG, plasma levels of antibodies to measles, and Streptococcus pneumoniae, and enumerating measles-specific antibody-secreting cells in patients with primary, chronic, and long-term nonprogressive HIV-1 infection. We also evaluated the in vitro production of IgM and IgG antibodies against measles and S pneumoniae antigens following polyclonal activation of peripheral blood mononuclear cells (PBMCs) from patients. The percentage of memory B cells correlated with CD4+ T-cell counts in patients, thus representing a marker of disease progression. While patients with primary and chronic infection had severe defects in serologic memory, long-term nonprogressors had memory B-cell frequency and levels of antigen-specific antibodies comparable with controls. We also evaluated the effect of antiretroviral therapy on these serologic memory defects and found that antiretroviral therapy did not restore serologic memory in primary or in chronic infection. We suggest that HIV infection impairs maintenance of long-term serologic immunity to HIV-1-unrelated antigens and this defect is initiated early in infection. This may have important consequences for the response of HIV-infected patients to immunizations.
RTEMS CENTRE- RTEMS Improvement
NASA Astrophysics Data System (ADS)
Silva, Helder; Constantino, Alexandre; Freitas, Daniel; Coutinho, Manuel; Faustino, Sergio; Sousa, Jose; Dias, Luis; Zulianello, Marco
2010-08-01
During the last two years, EDISOFT's RTEMS CENTRE team [1], jointly with the European Space Agency and with the support of the worldwide RTEMS community [2], have been developing an activity to facilitate the qualification of the real-time operating system RTEMS (Real-Time Operating System for Multiprocessor Systems). This paper intends to give a high level visibility of the progress and the results obtained in the RTEMS Improvement [3] activity. The primary objective [4] of the project is to improve the RTEMS product, its documentation and to facilitate the qualification of RTEMS for future space missions, taking into consideration the specific operational requirements. The sections below provide a brief overview of the RTEMS operating system and the activities performed in the RTEMS Improvement project, which includes the selection of API managers to be qualified, the tailoring process, the requirements analysis, the reverse engineering and design of the RTEMS, the quality assurance process, the ISVV activities, the test campaign, the results obtained, the criticality analysis and the facilitation of qualification process.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Numerical Analysis of Ginzburg-Landau Models for Superconductivity.
NASA Astrophysics Data System (ADS)
Coskun, Erhan
Thin film conventional, as well as High T _{c} superconductors of various geometric shapes placed under both uniform and variable strength magnetic field are studied using the universially accepted macroscopic Ginzburg-Landau model. A series of new theoretical results concerning the properties of solution is presented using the semi -discrete time-dependent Ginzburg-Landau equations, staggered grid setup and natural boundary conditions. Efficient serial algorithms including a novel adaptive algorithm is developed and successfully implemented for solving the governing highly nonlinear parabolic system of equations. Refinement technique used in the adaptive algorithm is based on modified forward Euler method which was also developed by us to ease the restriction on time step size for stability considerations. Stability and convergence properties of forward and modified forward Euler schemes are studied. Numerical simulations of various recent physical experiments of technological importance such as vortes motion and pinning are performed. The numerical code for solving time-dependent Ginzburg-Landau equations is parallelized using BlockComm -Chameleon and PCN. The parallel code was run on the distributed memory multiprocessors intel iPSC/860, IBM-SP1 and cluster of Sun Sparc workstations, all located at Mathematics and Computer Science Division, Argonne National Laboratory.
Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.
1987-02-15
For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. Themore » usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation.« less
Multilevel Summation of Electrostatic Potentials Using Graphics Processing Units*
Hardy, David J.; Stone, John E.; Schulten, Klaus
2009-01-01
Physical and engineering practicalities involved in microprocessor design have resulted in flat performance growth for traditional single-core microprocessors. The urgent need for continuing increases in the performance of scientific applications requires the use of many-core processors and accelerators such as graphics processing units (GPUs). This paper discusses GPU acceleration of the multilevel summation method for computing electrostatic potentials and forces for a system of charged atoms, which is a problem of paramount importance in biomolecular modeling applications. We present and test a new GPU algorithm for the long-range part of the potentials that computes a cutoff pair potential between lattice points, essentially convolving a fixed 3-D lattice of “weights” over all sub-cubes of a much larger lattice. The implementation exploits the different memory subsystems provided on the GPU to stream optimally sized data sets through the multiprocessors. We demonstrate for the full multilevel summation calculation speedups of up to 26 using a single GPU and 46 using multiple GPUs, enabling the computation of a high-resolution map of the electrostatic potential for a system of 1.5 million atoms in under 12 seconds. PMID:20161132
Performance and evaluation of real-time multicomputer control systems
NASA Technical Reports Server (NTRS)
Shin, K. G.
1985-01-01
Three experiments on fault tolerant multiprocessors (FTMP) were begun. They are: (1) measurement of fault latency in FTMP; (2) validation and analysis of FTMP synchronization protocols; and investigation of error propagation in FTMP.
Memory retrieval by activating engram cells in mouse models of early Alzheimer’s disease
Roy, Dheeraj S.; Arons, Autumn; Mitchell, Teryn I.; Pignatelli, Michele; Ryan, Tomás J.; Tonegawa, Susumu
2016-01-01
Summary Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by progressive memory decline and subsequent loss of broader cognitive functions1. Memory decline in early stages of Alzheimer’s is mostly limited to episodic memory, for which the hippocampus (HPC) plays a crucial role2. However, it has been uncertain whether the observed amnesia in early stages of Alzheimer’s is due to disrupted encoding and consolidation of episodic information, or an impairment in the retrieval of stored memory information. Here we show that in transgenic mouse models of early Alzheimer’s, direct optogenetic activation of hippocampal memory engram cells results in memory retrieval despite the fact that these mice are amnesic in long-term memory tests when natural recall cues are utilized, revealing a retrieval, rather than a storage impairment. Prior to amyloid plaque deposition, the amnesia in these mice is age-dependent3–5, which correlates with a progressive reduction of spine density of hippocampal dentate gyrus (DG) engram cells. We show that optogenetic induction of long-term potentiation (LTP) at perforant path (PP) synapses of DG engram cells restores both spine density and long-term memory. We also demonstrate that an ablation of DG engram cells containing restored spine density prevents the rescue of long-term memory. Thus, selective rescue of spine density in engram cells may lead to an effective strategy for treating memory loss in early stages of Alzheimer’s disease. PMID:26982728
Transient Relay Function of Midline Thalamic Nuclei during Long-Term Memory Consolidation in Humans
ERIC Educational Resources Information Center
Thielen, Jan-Willem; Takashima, Atsuko; Rutters, Femke; Tendolkar, Indira; Fernández, Guillén
2015-01-01
To test the hypothesis that thalamic midline nuclei play a transient role in memory consolidation, we reanalyzed a prospective functional MRI study, contrasting recent and progressively more remote memory retrieval. We revealed a transient thalamic connectivity increase with the hippocampus, the medial prefrontal cortex (mPFC), and a…
British memory research: a journey through the 20th century.
Parkin, A J; Hunkin, N M
2001-02-01
A century of research in memory has generated a wealth of knowledge encompassing theoretical developments within a number of distinct domains of memory. The aim of this article is to explore the progress made in memory research during the 20th century, to indicate critical influences on the direction of research, and to illustrate the important contribution made by British researchers. This article is confined to human memory research, and reviews research findings from the various psychological disciplines studied over the past 100 years.
Spin transport and spin torque in antiferromagnetic devices
Zelezny, J.; Wadley, P.; Olejnik, K.; ...
2018-03-02
Ferromagnets are key materials for sensing and memory applications. In contrast, antiferromagnets which represent the more common form of magnetically ordered materials, have found less practical application beyond their use for establishing reference magnetic orientations via exchange bias. This might change in the future due to the recent progress in materials research and discoveries of antiferromagnetic spintronic phenomena suitable for device applications. Experimental demonstration of the electrical switching and detection of the Néel order open a route towards memory devices based on antiferromagnets. Apart from the radiation and magnetic-field hardness, memory cells fabricated from antiferromagnets can be inherently multilevel, whichmore » could be used for neuromorphic computing. Switching speeds attainable in antiferromagnets far exceed those of ferromagnetic and semiconductor memory technologies. Here we review the recent progress in electronic spin-transport and spin-torque phenomena in antiferromagnets that are dominantly of the relativistic quantum mechanical origin. We discuss their utility in pure antiferromagnetic or hybrid ferromagnetic/antiferromagnetic memory devices.« less
Spin transport and spin torque in antiferromagnetic devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zelezny, J.; Wadley, P.; Olejnik, K.
Ferromagnets are key materials for sensing and memory applications. In contrast, antiferromagnets which represent the more common form of magnetically ordered materials, have found less practical application beyond their use for establishing reference magnetic orientations via exchange bias. This might change in the future due to the recent progress in materials research and discoveries of antiferromagnetic spintronic phenomena suitable for device applications. Experimental demonstration of the electrical switching and detection of the Néel order open a route towards memory devices based on antiferromagnets. Apart from the radiation and magnetic-field hardness, memory cells fabricated from antiferromagnets can be inherently multilevel, whichmore » could be used for neuromorphic computing. Switching speeds attainable in antiferromagnets far exceed those of ferromagnetic and semiconductor memory technologies. Here we review the recent progress in electronic spin-transport and spin-torque phenomena in antiferromagnets that are dominantly of the relativistic quantum mechanical origin. We discuss their utility in pure antiferromagnetic or hybrid ferromagnetic/antiferromagnetic memory devices.« less
Progress towards broadband Raman quantum memory in Bose-Einstein condensates
NASA Astrophysics Data System (ADS)
Saglamyurek, Erhan; Hrushevskyi, Taras; Smith, Benjamin; Leblanc, Lindsay
2017-04-01
Optical quantum memories are building blocks for quantum information technologies. Efficient and long-lived storage in combination with high-speed (broadband) operation are key features required for practical applications. While the realization has been a great challenge, Raman memory in Bose-Einstein condensates (BECs) is a promising approach, due to negligible decoherence from diffusion and collisions that leads to seconds-scale memory times, high efficiency due to large atomic density, the possibility for atom-chip integration with micro photonics, and the suitability of the far off-resonant Raman approach with storage of broadband photons (over GHz) [5]. Here we report our progress towards Raman memory in a BEC. We describe our apparatus recently built for producing BEC with 87Rb atoms, and present the observation of nearly pure BEC with 5x105 atoms at 40 nK. After showing our initial characterizations, we discuss the suitability of our system for Raman-based light storage in our BEC.
Spin transport and spin torque in antiferromagnetic devices
NASA Astrophysics Data System (ADS)
Železný, J.; Wadley, P.; Olejník, K.; Hoffmann, A.; Ohno, H.
2018-03-01
Ferromagnets are key materials for sensing and memory applications. In contrast, antiferromagnets, which represent the more common form of magnetically ordered materials, have found less practical application beyond their use for establishing reference magnetic orientations via exchange bias. This might change in the future due to the recent progress in materials research and discoveries of antiferromagnetic spintronic phenomena suitable for device applications. Experimental demonstration of the electrical switching and detection of the Néel order open a route towards memory devices based on antiferromagnets. Apart from the radiation and magnetic-field hardness, memory cells fabricated from antiferromagnets can be inherently multilevel, which could be used for neuromorphic computing. Switching speeds attainable in antiferromagnets far exceed those of ferromagnetic and semiconductor memory technologies. Here, we review the recent progress in electronic spin-transport and spin-torque phenomena in antiferromagnets that are dominantly of the relativistic quantum-mechanical origin. We discuss their utility in pure antiferromagnetic or hybrid ferromagnetic/antiferromagnetic memory devices.
Stochastic airspace simulation tool development
DOT National Transportation Integrated Search
2009-10-01
Modeling and simulation is often used to study : the physical world when observation may not be : practical. The overall goal of a recent and ongoing : simulation tool project has been to provide a : documented, lifecycle-managed, multi-processor : c...
ARTS III/Parallel Processor Design Study
DOT National Transportation Integrated Search
1975-04-01
It was the purpose of this design study to investigate the feasibility, suitability, and cost-effectiveness of augmenting the ARTS III failsafe/failsoft multiprocessor system with a form of parallel processor to accomodate a large growth in air traff...
Fault recovery characteristics of the fault tolerant multi-processor
NASA Technical Reports Server (NTRS)
Padilla, Peter A.
1990-01-01
The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1987-01-01
The results of ongoing research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a spatial distributed computer environment is presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.; Peters, Jeanne M.
1989-01-01
A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers.
System architecture for asynchronous multi-processor robotic control system
NASA Technical Reports Server (NTRS)
Steele, Robert D.; Long, Mark; Backes, Paul
1993-01-01
The architecture for the Modular Telerobot Task Execution System (MOTES) as implemented in the Supervisory Telerobotics (STELER) Laboratory is described. MOTES is the software component of the remote site of a local-remote telerobotic system which is being developed for NASA for space applications, in particular Space Station Freedom applications. The system is being developed to provide control and supervised autonomous control to support both space based operation and ground-remote control with time delay. The local-remote architecture places task planning responsibilities at the local site and task execution responsibilities at the remote site. This separation allows the remote site to be designed to optimize task execution capability within a limited computational environment such as is expected in flight systems. The local site task planning system could be placed on the ground where few computational limitations are expected. MOTES is written in the Ada programming language for a multiprocessor environment.
Content addressable memory project
NASA Technical Reports Server (NTRS)
Hall, Josh; Levy, Saul; Smith, D.; Wei, S.; Miyake, K.; Murdocca, M.
1991-01-01
The progress on the Rutgers CAM (Content Addressable Memory) Project is described. The overall design of the system is completed at the architectural level and described. The machine is composed of two kinds of cells: (1) the CAM cells which include both memory and processor, and support local processing within each cell; and (2) the tree cells, which have smaller instruction set, and provide global processing over the CAM cells. A parameterized design of the basic CAM cell is completed. Progress was made on the final specification of the CPS. The machine architecture was driven by the design of algorithms whose requirements are reflected in the resulted instruction set(s). A few of these algorithms are described.
Synthetic Analog and Digital Circuits for Cellular Computation and Memory
Purcell, Oliver; Lu, Timothy K.
2014-01-01
Biological computation is a major area of focus in synthetic biology because it has the potential to enable a wide range of applications. Synthetic biologists have applied engineering concepts to biological systems in order to construct progressively more complex gene circuits capable of processing information in living cells. Here, we review the current state of computational genetic circuits and describe artificial gene circuits that perform digital and analog computation. We then discuss recent progress in designing gene circuits that exhibit memory, and how memory and computation have been integrated to yield more complex systems that can both process and record information. Finally, we suggest new directions for engineering biological circuits capable of computation. PMID:24794536
ERIC Educational Resources Information Center
Dewar, Michaela; Pesallaccia, Martina; Cowan, Nelson; Provinciali, Leandro; Della Sala, Sergio
2012-01-01
Impairment on standard tests of delayed recall is often already maximal in the aMCI stage of Alzheimer's Disease. Neuropathological work shows that the neural substrates of memory function continue to deteriorate throughout the progression of the disease, hinting that further changes in memory performance could be tracked by a more sensitive test…
Wong-Goodrich, Sarah J.E.; Pfau, Madeline L.; Flores, Catherine T.; Fraser, Jennifer A.; Williams, Christina L.; Jones, Lee W.
2010-01-01
Whole-brain irradiation (WBI) therapy produces progressive learning and memory deficits in patients with primary or secondary brain tumors. Exercise enhances memory and adult hippocampal neurogenesis in the intact brain, so we hypothesized that exercise may be an effective treatment to alleviate consequences of WBI. Previous studies using animal models to address this issue have yielded mixed results and have not examined potential molecular mechanisms. We investigated the short- and long-term effects of WBI on spatial learning and memory retention, and determined whether voluntary running after WBI aids recovery of brain and cognitive function. Forty adult female C57Bl/6 mice given a single dose of 5 Gy or sham WBI were trained 2.5 weeks and up to four months after WBI in a Barnes maze. Half of the mice received daily voluntary wheel access starting one month after sham- or WBI. Daily running following WBI prevented the marked decline in spatial memory retention observed months after irradiation. Bromodeoxyuridine (BrdU) immunolabeling and ELISA indicated that this behavioral rescue was accompanied by a partial restoration of newborn BrdU+/NeuN+ neurons in the dentate gyrus and increased hippocampal expression of brain-derived vascular endothelial growth factor and insulin-like growth factor, and occurred despite irradiation-induced elevations in hippocampal pro-inflammatory cytokines. WBI in adult mice produced a progressive memory decline consistent with what has been reported in cancer patients receiving WBI therapy. Our findings show that running can abrogate this memory decline and aid recovery of adult hippocampal plasticity, thus highlighting exercise as a potential therapeutic intervention. PMID:20884629
Some neglected contributions of Wilhelm Wundt to the psychology of memory.
Carpenter, Shana K
2005-08-01
Wilhelm Wundt, whose name is rarely associated with the scientific study of memory, conducted a number of memory experiments that appear to have escaped the awareness of modern cognitive psychologists. Aspects of Wundt's system are reviewed, particularly with respect to his experimental work on memory. Wundt investigated phenomena that would fall under the modern headings of iconic memory, short-term memory, and the enactment and generation effects, but this research has been neglected. Revisiting the Wundtian perspective may provide insight into some of the reasons behind the historical course of memory research and in general into the progress of science in psychology.
Memory as the "whole brain work": a large-scale model based on "oscillations in super-synergy".
Başar, Erol
2005-01-01
According to recent trends, memory depends on several brain structures working in concert across many levels of neural organization; "memory is a constant work-in progress." The proposition of a brain theory based on super-synergy in neural populations is most pertinent for the understanding of this constant work in progress. This report introduces a new model on memory basing on the processes of EEG oscillations and Brain Dynamics. This model is shaped by the following conceptual and experimental steps: 1. The machineries of super-synergy in the whole brain are responsible for formation of sensory-cognitive percepts. 2. The expression "dynamic memory" is used for memory processes that evoke relevant changes in alpha, gamma, theta and delta activities. The concerted action of distributed multiple oscillatory processes provides a major key for understanding of distributed memory. It comprehends also the phyletic memory and reflexes. 3. The evolving memory, which incorporates reciprocal actions or reverberations in the APLR alliance and during working memory processes, is especially emphasized. 4. A new model related to "hierarchy of memories as a continuum" is introduced. 5. The notions of "longer activated memory" and "persistent memory" are proposed instead of long-term memory. 6. The new analysis to recognize faces emphasizes the importance of EEG oscillations in neurophysiology and Gestalt analysis. 7. The proposed basic framework called "Memory in the Whole Brain Work" emphasizes that memory and all brain functions are inseparable and are acting as a "whole" in the whole brain. 8. The role of genetic factors is fundamental in living system settings and oscillations and accordingly in memory, according to recent publications. 9. A link from the "whole brain" to "whole body," and incorporation of vegetative and neurological system, is proposed, EEG oscillations and ultraslow oscillations being a control parameter.
Characterizing parallel file-access patterns on a large-scale multiprocessor
NASA Technical Reports Server (NTRS)
Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael
1994-01-01
Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
The Automated Instrumentation and Monitoring System (AIMS) reference manual
NASA Technical Reports Server (NTRS)
Yan, Jerry; Hontalas, Philip; Listgarten, Sherry
1993-01-01
Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensate for data collection overhead. Besides being used as prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multi computers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, Xl IR5, Motif 1.1.3).
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)
1997-01-01
Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
Information processing efficiency in patients with multiple sclerosis.
Archibald, C J; Fisk, J D
2000-10-01
Reduced information processing efficiency, consequent to impaired neural transmission, has been proposed as underlying various cognitive problems in patients with Multiple Sclerosis (MS). This study employed two measures developed from experimental psychology that control for the potential confound of perceptual-motor abnormalities (Salthouse, Babcock, & Shaw, 1991; Sternberg, 1966, 1969) to assess the speed of information processing and working memory capacity in patients with mild to moderate MS. Although patients had significantly more cognitive complaints than neurologically intact matched controls, their performance on standard tests of immediate memory span did not differ from control participants and their word list learning was within normal limits. On the experimental measures, both relapsing-remitting and secondary-progressive patients exhibited significantly slowed information processing speed relative to controls. However, only the secondary-progressive patients had an additional decrement in working memory capacity. Depression, fatigue, or neurologic disability did not account for performance differences on these measures. While speed of information processing may be slowed early in the disease process, deficits in working memory capacity may appear only as there is progression of MS. It is these latter deficits, however, that may underlie the impairment of new learning that patients with MS demonstrate.
Hippocampal subfield surface deformity in non-semantic primary progressive aphasia.
Christensen, Adam; Alpert, Kathryn; Rogalski, Emily; Cobia, Derin; Rao, Julia; Beg, Mirza Faisal; Weintraub, Sandra; Mesulam, M-Marsel; Wang, Lei
2015-03-01
Alzheimer neuropathology (AD) is found in almost half of patients with non-semantic primary progressive aphasia (PPA). This study examined hippocampal abnormalities in PPA to determine similarities to those described in amnestic AD. In 37 PPA patients and 32 healthy controls, we generated hippocampal subfield surface maps from structural MRIs and administered a face memory test. We analyzed group and hemisphere differences for surface shape measures and their relationship with test scores and ApoE genotype. The hippocampus in PPA showed inward deformity (CA1 and subiculum subfields) and outward deformity (CA2-4+DG subfield) and smaller left than right volumes. Memory performance was related to hippocampal shape abnormalities in PPA patients, but not controls, even in the absence of memory impairments. Hippocampal deformity in PPA is related to memory test scores. This may reflect a combination of intrinsic degenerative phenomena with transsynaptic or Wallerian effects of neocortical neuronal loss.
Spruyt, Karen; Capdevila, Oscar Sans; Kheirandish-Gozal, Leila; Gozal, David
2010-01-01
Memory (M) impairments have been suggested in pediatric Obstructive Sleep Apnea along with attention and executive (AE), language (L) and visuospatial (V) dysfunctions. NEPSY assessment of children aged 5–9-years who were either healthy (n= 43), or who had OSA without L, V, AE (OSA−, n= 22) or with L (n=6), V (n=1), AE (n=3) (OSA+, n=10) dysfunctions revealed no gross memory problems in OSA; however, over the 3 learning trials of cross-modal association learning of name with face, the OSA− progressively improved performance, whereas the OSA+ failed to progress. No within-group differences between immediate and delayed memory tasks were apparent. The data suggest the presence of slower information processing, and/or secondary memory problems, in the absence of retrieval or recall impairments among a subset of children with OSA. We hypothesize that inefficient/insufficient encoding may account for the primary deficit. PMID:20183722
Age-Related Neurodegeneration and Memory Loss in Down Syndrome
Lockrow, Jason P.; Fortress, Ashley M.; Granholm, Ann-Charlotte E.
2012-01-01
Down syndrome (DS) is a condition where a complete or segmental chromosome 21 trisomy causes variable intellectual disability, and progressive memory loss and neurodegeneration with age. Many research groups have examined development of the brain in DS individuals, but studies on age-related changes should also be considered, with the increased lifespan observed in DS. DS leads to pathological hallmarks of Alzheimer's disease (AD) by 40 or 50 years of age. Progressive age-related memory deficits occurring in both AD and in DS have been connected to degeneration of several neuronal populations, but mechanisms are not fully elucidated. Inflammation and oxidative stress are early events in DS pathology, and focusing on these pathways may lead to development of successful intervention strategies for AD associated with DS. Here we discuss recent findings and potential treatment avenues regarding development of AD neuropathology and memory loss in DS. PMID:22545043
Synchronizing Data-Bus Messages
NASA Technical Reports Server (NTRS)
Harris, L. H.
1985-01-01
Adapter allows communications among as many as 30 data processors without central bus controller. Adapter improves reliability of multiprocessor system by eliminating point of failure that causes entire system to fail. Scheme prevents data collisions and eliminates nonessential polling, thereby reducing power consumption.
Highway Traffic Simulations on Multi-Processor Computers
DOT National Transportation Integrated Search
1997-01-01
A computer model has been developed to simulate highway traffic for various degrees of automation with a high degree of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway ...
The Association of Aging and Aerobic Fitness With Memory
Bullock, Alexis M.; Mizzi, Allison L.; Kovacevic, Ana; Heisz, Jennifer J.
2018-01-01
The present study examined the differential effects of aging and fitness on memory. Ninety-five young adults (YA) and 81 older adults (OA) performed the Mnemonic Similarity Task (MST) to assess high-interference memory and general recognition memory. Age-related differences in high-interference memory were observed across the lifespan, with performance progressively worsening from young to old. In contrast, age-related differences in general recognition memory were not observed until after 60 years of age. Furthermore, OA with higher aerobic fitness had better high-interference memory, suggesting that exercise may be an important lifestyle factor influencing this aspect of memory. Overall, these findings suggest different trajectories of decline for high-interference and general recognition memory, with a selective role for physical activity in promoting high-interference memory. PMID:29593524
Graziano, Martin; Sigman, Mariano
2008-05-23
When a stimulus is presented, its sensory trace decays rapidly, lasting for approximately 1000 ms. This brief and labile memory, referred as iconic memory, serves as a buffer before information is transferred to working memory and executive control. Here we explored the effect of different factors--geometric, spatial, and experience--with respect to the access and the maintenance of information in iconic memory and the progressive distortion of this memory. We studied performance in a partial report paradigm, a design wherein recall of only part of a stimulus array is required. Subjects had to report the identity of a letter in a location that was cued in a variable delay after the stimulus onset. Performance decayed exponentially with time, and we studied the different parameters (time constant, zero-delay value, and decay amplitude) as a function of the different factors. We observed that experience (determined by letter frequency) affected the access to iconic memory but not the temporal decay constant. On the contrary, spatial position affected the temporal course of delay. The entropy of the error distribution increased with time reflecting a progressive morphological distortion of the iconic buffer. We discuss our results on the context of a model of information access to executive control and how it is affected by learning and attention.
Rabin, Laura A.; Paré, Nadia; Saykin, Andrew J.; Brown, Michael J.; Wishart, Heather A.; Flashman, Laura A.; Santulli, Robert B.
2011-01-01
Episodic memory is the first and most severely affected cognitive domain in Alzheimer's disease (AD), and it is also the key early marker in prodromal stages including amnestic mild cognitive impairment (MCI). The relative ability of memory tests to discriminate between MCI and normal aging has not been well characterized. We compared the classification value of widely used verbal memory tests in distinguishing healthy older adults (n = 51) from those with MCI (n = 38). Univariate logistic regression indicated that the total learning score from the California Verbal Learning Test-II (CVLT-II) ranked highest in terms of distinguishing MCI from normal aging (sensitivity = 90.2; specificity = 84.2). Inclusion of the delayed recall condition of a story memory task (i.e., WMS-III Logical Memory, Story A) enhanced the overall accuracy of classification (sensitivity = 92.2; specificity = 94.7). Combining Logical Memory recognition and CVLT-II long delay best predicted progression from MCI to AD over a 4-year period (accurate classification = 87.5%). Learning across multiple trials may provide the most sensitive index for initial diagnosis of MCI, but inclusion of additional variables may enhance overall accuracy and may represent the optimal strategy for identifying individuals most likely to progress to dementia. PMID:19353345
Buckley, Rachel F; Ellis, Kathryn A; Ames, David; Rowe, Christopher C; Lautenschlager, Nicola T; Maruff, Paul; Villemagne, Victor L; Macaulay, S Lance; Szoeke, Cassandra; Martins, Ralph N; Masters, Colin L; Savage, Greg; Rainey-Smith, Stephanie R; Rembach, Alan; Saling, Michael M
2015-07-01
To explore the subjective experience of memory change in groups at risk of dementia (those with mild cognitive impairment MCI or high β-amyloid (Aβ+) burden) to determine the existence of potential phenomenological typologies. We recruited 123 healthy controls (HC) and individuals with MCI from the Australian Imaging, Biomarker and Lifestyle (AIBL) study. Sixty-7 (HC = 47,MCI = 20) had Aβ scans available for analysis. Semistructured interviews were administered, transcribed, and meaningful phrases extracted from transcripts. Twelve themes were defined and compared across diagnostic status and Aβ status. MCI endorsed more complaints of burdensome coping strategies, increasing frequency, sense of predomination, poor contextualization, progression, dependency, impact on affect, and dismissive attitudes. HCAβ+ acknowledged a progressive memory decline compared to HCAβ-, while MCIAβ+ expressed more burdensome coping strategies, dismissive attitudes, and dependency comparative to either healthy group. Depression was more likely to be related to complaint themes in HCs, while complaint themes were associated with poorer list-learning performance in individuals with MCI. Complaint themes in those with MCI align with the MCI symptom complex, particularly when accompanied with high Aβ load. Healthy Aβ+ individuals acknowledged progressive memory change, suggesting they are aware of memory changes not yet detectable via neuropsychological measures. Depressive symptomatology associated with HC complaints, suggesting certain themes are affect-driven, while complaints in MCI are associated with organically driven functional impairment. Qualitative analysis of SMCs can inform the earliest clinical manifestations of Alzheimer's disease. Our findings can inform diagnostic approaches to the clinical evaluation of memory complaints in the nondemented elderly. (c) 2015 APA, all rights reserved).
Checkpointing Shared Memory Programs at the Application-level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bronevetsky, G; Schulz, M; Szwed, P
2004-09-08
Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart(CPR)-the state of the computation is saved periodically on disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the responsibility of the programmer to instrument applications for CPR. Our group is investigating the use of compiler technology to instrument codes to make them self-checkpointing and self-restarting, thereby providing an automatic solution to the problem of making long-running scientific applications resilient to hardware faults. Our previous work focusedmore » on message-passing programs. In this paper, we describe such a system for shared-memory programs running on symmetric multiprocessors. The system has two components: (i)a pre-compiler for source-to-source modification of applications, and (ii) a runtime system that implements a protocol for coordinating CPR among the threads of the parallel application. For the sake of concreteness, we focus on a non-trivial subset of OpenMP that includes barriers and locks. One of the advantages of this approach is that the ability to tolerate faults becomes embedded within the application itself, so applications become self-checkpointing and self-restarting on any platform. We demonstrate this by showing that our transformed benchmarks can checkpoint and restart on three different platforms (Windows/x86, Linux/x86, and Tru64/Alpha). Our experiments show that the overhead introduced by this approach is usually quite small; they also suggest ways in which the current implementation can be tuned to reduced overheads further.« less
Semantic Memory in the Clinical Progression of Alzheimer Disease.
Tchakoute, Christophe T; Sainani, Kristin L; Henderson, Victor W
2017-09-01
Semantic memory measures may be useful in tracking and predicting progression of Alzheimer disease. We investigated relationships among semantic memory tasks and their 1-year predictive value in women with Alzheimer disease. We conducted secondary analyses of a randomized clinical trial of raloxifene in 42 women with late-onset mild-to-moderate Alzheimer disease. We assessed semantic memory with tests of oral confrontation naming, category fluency, semantic recognition and semantic naming, and semantic density in written narrative discourse. We measured global cognition (Alzheimer Disease Assessment Scale, cognitive subscale), dementia severity (Clinical Dementia Rating sum of boxes), and daily function (Activities of Daily Living Inventory) at baseline and 1 year. At baseline and 1 year, most semantic memory scores correlated highly or moderately with each other and with global cognition, dementia severity, and daily function. Semantic memory task performance at 1 year had worsened one-third to one-half standard deviation. Factor analysis of baseline test scores distinguished processes in semantic and lexical retrieval (semantic recognition, semantic naming, confrontation naming) from processes in lexical search (semantic density, category fluency). The semantic-lexical retrieval factor predicted global cognition at 1 year. Considered separately, baseline confrontation naming and category fluency predicted dementia severity, while semantic recognition and a composite of semantic recognition and semantic naming predicted global cognition. No individual semantic memory test predicted daily function. Semantic-lexical retrieval and lexical search may represent distinct aspects of semantic memory. Semantic memory processes are sensitive to cognitive decline and dementia severity in Alzheimer disease.
A program for undergraduate research into the mechanisms of sensory coding and memory decay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calin-Jageman, R J
This is the final technical report for this DOE project, entitltled "A program for undergraduate research into the mechanisms of sensory coding and memory decay". The report summarizes progress on the three research aims: 1) to identify phyisological and genetic correlates of long-term habituation, 2) to understand mechanisms of olfactory coding, and 3) to foster a world-class undergraduate neuroscience program. Progress on the first aim has enabled comparison of learning-regulated transcripts across closely related learning paradigms and species, and results suggest that only a small core of transcripts serve truly general roles in long-term memory. Progress on the second aimmore » has enabled testing of several mutant phenotypes for olfactory behaviors, and results show that responses are not fully consistent with the combinitoral coding hypothesis. Finally, 14 undergraduate students participated in this research, the neuroscience program attracted extramural funding, and we completed a successful summer program to enhance transitions for community-college students into 4-year colleges to persue STEM fields.« less
Synthetic analog and digital circuits for cellular computation and memory.
Purcell, Oliver; Lu, Timothy K
2014-10-01
Biological computation is a major area of focus in synthetic biology because it has the potential to enable a wide range of applications. Synthetic biologists have applied engineering concepts to biological systems in order to construct progressively more complex gene circuits capable of processing information in living cells. Here, we review the current state of computational genetic circuits and describe artificial gene circuits that perform digital and analog computation. We then discuss recent progress in designing gene networks that exhibit memory, and how memory and computation have been integrated to yield more complex systems that can both process and record information. Finally, we suggest new directions for engineering biological circuits capable of computation. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
[Pure progressive amnesia, isolated for 16 years with focal hippocampus atrophy].
Richard-Mornas, A; Foyatier-Michel, N; Thomas-Antérion, C
2011-01-01
Pure progressive amnesia is a rare and unusual syndrome involving long preservation of autonomy and absence of progression in other cognitive domains. We report a case which remained quiescent for 16 years characterized by severe isolated episodic amnesia and preservation of spatial, semantic and implicit memory and autonomy. MRI revealed bilateral focal atrophy of the hippocampus. This specific pattern of impairment differs from other types of amnesic syndromes. It is important to identify this kind of amnesia because of its specific course. Studying the topography of the brain lesions may contribute to a better understanding of the neural systems involved in declarative memory. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Role of adult neurogenesis in hippocampal-cortical memory consolidation
2014-01-01
Acquired memory is initially dependent on the hippocampus (HPC) for permanent memory formation. This hippocampal dependency of memory recall progressively decays with time, a process that is associated with a gradual increase in dependency upon cortical structures. This process is commonly referred to as systems consolidation theory. In this paper, we first review how memory becomes hippocampal dependent to cortical dependent with an emphasis on the interactions that occur between the HPC and cortex during systems consolidation. We also review the mechanisms underlying the gradual decay of HPC dependency during systems consolidation from the perspective of memory erasures by adult hippocampal neurogenesis. Finally, we discuss the relationship between systems consolidation and memory precision. PMID:24552281
Conway, Martin A
2009-09-01
An account of episodic memories is developed that focuses on the types of knowledge they represent, their properties, and the functions they might serve. It is proposed that episodic memories consist of episodic elements, summary records of experience often in the form of visual images, associated to a conceptual frame that provides a conceptual context. Episodic memories are embedded in a more complex conceptual system in which they can become the basis of autobiographical memories. However, the function of episodic memories is to keep a record of progress with short-term goals and access to most episodic memories is lost soon after their formation. Finally, it is suggested that developmentally episodic memories form the basis of the conceptual system and it is from sets of episodic memories that early non-verbal conceptual knowledge is abstracted.
Behavioural pharmacology: 40+ years of progress, with a focus on glutamate receptors and cognition
Robbins, Trevor W.; Murphy, Emily R.
2006-01-01
Behavioural pharmacology is an interdisciplinary field at the intersection of several research areas that ultimately lead to the development of drugs for clinical use and build understanding of how brain functions enable cognition and behaviour. In this article, the development of behavioural pharmacology in the UK is briefly surveyed, and the current status and success of the field is highlighted by the progress in our understanding of learning and memory that has resulted from discoveries in glutamate receptor pharmacology allied to theoretical and methodological advances in behavioural neuroscience. We describe the original breakthrough in terms of the role of NMDA receptors in hippocampal-mediated spatial learning and long-term potentiation, and review recent advances that demonstrate the involvement of glutamate receptor in working memory, recognition memory, stimulus–response learning and memory, and higher cognitive functions. We also discuss the unique functions of NMDA receptors and the fundamental role of AMPA receptors in processes that are common to some of these forms of memory, including encoding, consolidation and retrieval. PMID:16490260
NMF-mGPU: non-negative matrix factorization on multi-GPU systems.
Mejía-Roa, Edgardo; Tabas-Madrid, Daniel; Setoain, Javier; García, Carlos; Tirado, Francisco; Pascual-Montano, Alberto
2015-02-13
In the last few years, the Non-negative Matrix Factorization ( NMF ) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessors cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics-Processing Units ( GPUs ). Driven by the ever-growing demands from the video-games industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on GPU. NMF-mGPU is based on CUDA ( Compute Unified Device Architecture ), the NVIDIA's framework for GPU computing. On devices with low memory available, large input matrices are blockwise transferred from the system's main memory to the GPU's memory, and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI ( Message Passing Interface ). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup). Applications of GPUs in Bioinformatics are getting more and more attention due to their outstanding performance when compared to traditional processors. In addition, their relatively low price represents a highly cost-effective alternative to conventional clusters. In life sciences, this results in an excellent opportunity to facilitate the daily work of bioinformaticians that are trying to extract biological meaning out of hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming in a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu .
How does negative emotion cause false memories?
Brainerd, C J; Stein, L M; Silveira, R A; Rohenkohl, G; Reyna, V F
2008-09-01
Remembering negative events can stimulate high levels of false memory, relative to remembering neutral events. In experiments in which the emotional valence of encoded materials was manipulated with their arousal levels controlled, valence produced a continuum of memory falsification. Falsification was highest for negative materials, intermediate for neutral materials, and lowest for positive materials. Conjoint-recognition analysis produced a simple process-level explanation: As one progresses from positive to neutral to negative valence, false memory increases because (a) the perceived meaning resemblance between false and true items increases and (b) subjects are less able to use verbatim memories of true items to suppress errors.
Peng, Sheng; Sun, Haiyan; Zhang, Xiaoqing; Liu, Gongjian; Wang, Guanglei
2014-09-01
Phosphodiesterase-4 (PDE-4) regulates the intracellular level of cyclic adenosine monophosphate. Recent studies demonstrated that PDE-4 inhibitors can counteract deficits in long-term memory caused by aging or increased expression of mutant forms of human amyloid precursor proteins, and can influence the process of memory function and cognitive enhancement. Therapeutics, such as ketamine, a drug used in clinical anesthesia, can also cause memory deficits as adverse effects. Targeting PDE-4 with selective inhibitors may offer a novel therapeutic strategy to prevent, slow the progress, and, eventually, treat memory deficits.
AERIAL - MSC SITE - CONSTRUCTION PROGRESS - MSC
1963-12-24
S63-23656 (1963) --- Aerial view of construction progress at the Manned Spacecraft Center, Houston, Texas. NOTE: The Manned Spacecraft Center was named Lyndon B. Johnson Space Center in memory of the late President following his death.
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Jespersen, Dennis; Buning, Peter; Bailey, David (Technical Monitor)
1996-01-01
The Gorden Bell Prizes given out at Supercomputing every year includes at least two catergories: performance (highest GFLOP count) and price-performance (GFLOP/million $$) for real applications. In the past five years, the winners of the price-performance categories all came from networks of work-stations. This reflects three important facts: 1. supercomputers are still too expensive for the masses; 2. achieving high performance for real applications takes real work; and, most importantly; 3. it is possible to obtain acceptable performance for certain real applications on network of work stations. With the continued advance of network technology as well as increased performance of "desktop" workstation, the "Swarm of Ants vs. Herd of Elephants" debate, which began with vector multiprocessors (VPPs) against SIMD type multiprocessors (e.g. CM2), is now recast as VPPs against Symetric Multiprocessors (SMPs, e.g. SGI PowerChallenge). This paper reports on performance studies we performed solving a large scale (2-million grid pt.s) CFD problem involving a Boeing 747 based on a parallel version of OVERFLOW that utilizes message passing on PVM. A performance monitoring tool developed under NASA HPCC, called AIMS, was used to instrument and analyze the the performance data thus obtained. We plan to compare its performance data obtained across a wide spectrum of architectures including: the Cray C90, IBM/SP2, SGI/Power Challenge Cluster, to a group of workstations connected over a simple network. The metrics of comparison includes speed-up, price-performance, throughput, and turn-around time. We also plan to present a plan of attack for various issues that will make the execution of Grand Challenge Applications across the Global Information Infrastructure a reality.
Brain reserve and cognitive reserve protect against cognitive decline over 4.5 years in MS
Rocca, Maria A.; Leavitt, Victoria M.; Dackovic, Jelena; Mesaros, Sarlota; Drulovic, Jelena; DeLuca, John; Filippi, Massimo
2014-01-01
Objective: Based on the theories of brain reserve and cognitive reserve, we investigated whether larger maximal lifetime brain growth (MLBG) and/or greater lifetime intellectual enrichment protect against cognitive decline over time. Methods: Forty patients with multiple sclerosis (MS) underwent baseline and 4.5-year follow-up evaluations of cognitive efficiency (Symbol Digit Modalities Test, Paced Auditory Serial Addition Task) and memory (Selective Reminding Test, Spatial Recall Test). Baseline and follow-up MRIs quantified disease progression: percentage brain volume change (cerebral atrophy), percentage change in T2 lesion volume. MLBG (brain reserve) was estimated with intracranial volume; intellectual enrichment (cognitive reserve) was estimated with vocabulary. We performed repeated-measures analyses of covariance to investigate whether larger MLBG and/or greater intellectual enrichment moderate/attenuate cognitive decline over time, controlling for disease progression. Results: Patients with MS declined in cognitive efficiency and memory (p < 0.001). MLBG moderated decline in cognitive efficiency (p = 0.031, ηp2 = 0.122), with larger MLBG protecting against decline. MLBG did not moderate memory decline (p = 0.234, ηp2 = 0.039). Intellectual enrichment moderated decline in cognitive efficiency (p = 0.031, ηp2 = 0.126) and memory (p = 0.037, ηp2 = 0.115), with greater intellectual enrichment protecting against decline. MS disease progression was more negatively associated with change in cognitive efficiency and memory among patients with lower vs higher MLBG and intellectual enrichment. Conclusion: We provide longitudinal support for theories of brain reserve and cognitive reserve in MS. Larger MLBG protects against decline in cognitive efficiency, and greater intellectual enrichment protects against decline in cognitive efficiency and memory. Consideration of these protective factors should improve prediction of future cognitive decline in patients with MS. PMID:24748670
The neurobiological bases of memory formation: from physiological conditions to psychopathology.
Bisaz, Reto; Travaglia, Alessio; Alberini, Cristina M
2014-01-01
The formation of long-term memories is a function necessary for an adaptive survival. In the last two decades, great progress has been made in the understanding of the biological bases of memory formation. The identification of mechanisms necessary for memory consolidation and reconsolidation, the processes by which the posttraining and postretrieval fragile memory traces become stronger and insensitive to disruption, has indicated new approaches for investigating and treating psychopathologies. In this review, we will discuss some key biological mechanisms found to be critical for memory consolidation and strengthening, the role/s and mechanisms of memory reconsolidation, and how the interference with consolidation and/or reconsolidation can modulate the retention and/or storage of memories that are linked to psychopathologies. © 2014 S. Karger AG, Basel.
Individual differences in false memory from misinformation: cognitive factors.
Zhu, Bi; Chen, Chuansheng; Loftus, Elizabeth F; Lin, Chongde; He, Qinghua; Chen, Chunhui; Li, He; Xue, Gui; Lu, Zhonglin; Dong, Qi
2010-07-01
This research investigated the cognitive correlates of false memories that are induced by the misinformation paradigm. A large sample of Chinese college students (N=436) participated in a misinformation procedure and also took a battery of cognitive tests. Results revealed sizable and systematic individual differences in false memory arising from exposure to misinformation. False memories were significantly and negatively correlated with measures of intelligence (measured with Raven's Advanced Progressive Matrices and Wechsler Adult Intelligence Scale), perception (Motor-Free Visual Perception Test, Change Blindness, and Tone Discrimination), memory (Wechsler Memory Scales and 2-back Working Memory tasks), and face judgement (Face Recognition and Facial Expression Recognition). These findings suggest that people with relatively low intelligence and poor perceptual abilities might be more susceptible to the misinformation effect.
Aversive and non-aversive memory impairment in the mucopolysaccharidosis II mouse model.
Azambuja, Amanda Stapenhorst; Correa, Lilian; Gabiatti, Bernardo Pappi; Martins, Giselle Renata; de Oliveira Franco, Álvaro; Ribeiro, Maria Flávia Marques; Baldo, Guilherme
2018-02-01
Hunter syndrome (MPS II, OMIM 309900) is a lysosomal storage disorder due to deficient iduronate sulphatase activity. Patients present multiple cognitive alterations, and the aim of this work was to verify if MPS II mice also present some progressive cognitive alterations. For that, MPS II mice from 2 to 6 months of age were submitted to repeated open field and inhibitory avoidance tests to evaluate memory parameters. MPS II mice presented impaired memory at 6 months evaluated by open field test. They also performed poorly in the inhibitory avoidance test from 4 months. We conclude that MPS II mice develop cognitive alterations as the disease progresses. These tests can be used in the future to study the efficacy of therapeutic approaches in the central nervous system.
Performance and economy of a fault-tolerant multiprocessor
NASA Technical Reports Server (NTRS)
Lala, J. H.; Smith, C. J.
1979-01-01
The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.
NASA Astrophysics Data System (ADS)
Devaraj, Rajesh; Sarkar, Arnab; Biswas, Santosh
2015-11-01
In the article 'Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks', Park and Cho presented a systematic way of computing a largest fault-tolerant and schedulable language that provides information on whether the scheduler (i.e., supervisor) should accept or reject a newly arrived aperiodic task. The computation of such a language is mainly dependent on the task execution model presented in their paper. However, the task execution model is unable to capture the situation when the fault of a processor occurs even before the task has arrived. Consequently, a task execution model that does not capture this fact may possibly be assigned for execution on a faulty processor. This problem has been illustrated with an appropriate example. Then, the task execution model of Park and Cho has been modified to strengthen the requirement that none of the tasks are assigned for execution on a faulty processor.
Multitasking runtime systems for the Cedar Multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guzzi, M.D.
1986-07-01
The programming of a MIMD machine is more complex than for SISD and SIMD machines. The multiple computational resources of the machine must be made available to the programming language compiler and to the programmer so that multitasking programs may be written. This thesis will explore the additional complexity of programming a MIMD machine, the Cedar Multiprocessor specifically, and the multitasking runtime system necessary to provide multitasking resources to the user. First, the problem will be well defined: the Cedar machine, its operating system, the programming language, and multitasking concepts will be described. Second, a solution to the problem, calledmore » macrotasking, will be proposed. This solution provides multitasking facilities to the programmer at a very coarse level with many visible machine dependencies. Third, an alternate solution, called microtasking, will be proposed. This solution provides multitasking facilities of a much finer grain. This solution does not depend so rigidly on the specific architecture of the machine. Finally, the two solutions will be compared for effectiveness. 12 refs., 16 figs.« less
NASA Technical Reports Server (NTRS)
Jensen, E. Douglas
1988-01-01
Alpha is a new kind of operating system that is unique in two highly significant ways. First, it is decentralized transparently providing reliable resource management across physically dispersed nodes, so that distributed applications programming can be done largely as though it were centralized. And second, it provides comprehensive, high technology support for real-time system integration and operation, an application area which consists predominately of aperiodic activities having critical time constraints such as deadlines. Alpha is extremely adaptable so that it can be easily optimized for a wide range of problem-specific functionality, performance, and cost. Alpha is the first systems effort of the Archons Project, and the prototype was created at Carnegie-Mellon University directly on modified Sun multiprocessor workstation hardware. It has been demonstrated with a real-time C(sup 2) application. Continuing research is leading to a series of enhanced follow-ons to Alpha; these are portable but initially hosted on Concurrent's MASSCOMP line of multiprocessor products.
NASA Astrophysics Data System (ADS)
Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan
2010-07-01
This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.
Fault-free behavior of reliable multiprocessor systems: FTMP experiments in AIRLAB
NASA Technical Reports Server (NTRS)
Clune, E.; Segall, Z.; Siewiorek, D.
1985-01-01
This report describes a set of experiments which were implemented on the Fault tolerant Multi-Processor (FTMP) at NASA/Langley's AIRLAB facility. These experiments are part of an effort to formulate and evaluate validation methodologies for fault-tolerant computers. This report deals with the measurement of single parameters (baselines) of a fault free system. The initial set of baseline experiments lead to the following conclusions: (1) The system clock is constant and independent of workload in the tested cases; (2) the instruction execution times are constant; (3) the R4 frame size is 40mS with some variation; (4) the frame stretching mechanism has some flaws in its implementation that allow the possibility of an infinite stretching of frame duration. Future experiments are planned. Some will broaden the results of these initial experiments. Others will measure the system more dynamically. The implementation of a synthetic workload generation mechanism for FTMP is planned to enhance the experimental environment of the system.
Solving Partial Differential Equations in a data-driven multiprocessor environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.
1988-12-31
Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-how architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asychronous methods in a data-driven environment.more » New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.« less
Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors
NASA Astrophysics Data System (ADS)
Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.
1994-10-01
This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications, implemented on a multi-processor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz) runtime overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as good as any static scheduling method. It also reduces the total amount of dynamic pre-emptions compared with run time methods like deadline monotonic scheduling.
Is There a Relation between EEG-Slow Waves and Memory Dysfunction in Epilepsy? A Critical Appraisal
Höller, Yvonne; Trinka, Eugen
2015-01-01
Is there a relationship between peri-ictal slow waves, loss of consciousness, memory, and slow-wave sleep, in patients with different forms of epilepsy? We hypothesize that mechanisms, which result in peri-ictal slow-wave activity as detected by the electroencephalogram, could negatively affect memory processes. Slow waves (≤4 Hz) can be found in seizures with impairment of consciousness and also occur in focal seizures without impairment of consciousness but with inhibited access to memory functions. Peri-ictal slow waves are regarded as dysfunctional and are probably caused by mechanisms, which are essential to disturb the consolidation of memory entries in these patients. This is in strong contrast to physiological slow-wave activity during deep sleep, which is thought to group memory-consolidating fast oscillatory activity. In patients with epilepsy, slow waves may not only correlate with the peri-ictal clouding of consciousness, but could be the epiphenomenon of mechanisms, which interfere with normal brain function in a wider range. These mechanisms may have transient impacts on memory, such as temporary inhibition of memory systems, altered patterns of hippocampal–neocortical interactions during slow-wave sleep, or disturbed cross-frequency coupling of slow and fast oscillations. In addition, repeated tonic–clonic seizures over the years in uncontrolled chronic epilepsy may cause a progressive cognitive decline. This hypothesis can only be assessed in long-term prospective studies. These studies could disentangle the reversible short-term impacts of seizures, and the impacts of chronic uncontrolled seizures. Chronic uncontrolled seizures lead to irreversible memory impairment. By contrast, short-term impacts do not necessarily lead to a progressive cognitive decline but result in significantly impaired peri-ictal memory performance. PMID:26124717
Still searching for the engram.
Eichenbaum, Howard
2016-09-01
For nearly a century, neurobiologists have searched for the engram-the neural representation of a memory. Early studies showed that the engram is widely distributed both within and across brain areas and is supported by interactions among large networks of neurons. Subsequent research has identified engrams that support memory within dedicated functional systems for habit learning and emotional memory, but the engram for declarative memories has been elusive. Nevertheless, recent years have brought progress from molecular biological approaches that identify neurons and networks that are necessary and sufficient to support memory, and from recording approaches and population analyses that characterize the information coded by large neural networks. These new directions offer the promise of revealing the engrams for episodic and semantic memories.
[Progress on metaplasticity and its role in learning and memory].
Wang, Shao-Li; Lu, Wei
2016-08-25
Long-term potentiation (LTP) and long-term depression (LTD) are two major forms of synaptic plasticity that are widely considered as important cellular models of learning and memory. Metaplasticity is defined as the plasticity of synaptic plasticity and thus is an advanced form of plasticity. The history of synaptic activity can affect the subsequent synaptic plasticity induction. Therefore, it is important to study metaplasticity to explore new mechanisms underlying various brain functions including learning and memory. Since the concept of metaplasticity was proposed, it has aroused widespread concerns and attracted numerous researchers to dig more details on this topic. These new-found experimental phenomena and cellular mechanisms have established the basis of theoretical studies on metaplasticity. In recent years, researchers have found that metaplasticity can not only affect the synaptic plasticity, but also regulate the neural network to encode specific content and enhance the learning and memory. These findings have greatly enriched our knowledge on plasticity and opened a new route to study the mechanism of learning and memory. In this review, we discuss the recent progress on metaplasticity on following three aspects: (1) the molecular mechanisms of metaplasticity; (2) the role of metaplasticity in learning and memory; and (3) the outlook of future study on metaplasticity.
Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code
NASA Astrophysics Data System (ADS)
Longoni, Gianluca; Anderson, Stanwood L.
2009-08-01
The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.
A Family of ACO Routing Protocols for Mobile Ad Hoc Networks.
Rupérez Cañas, Delfín; Sandoval Orozco, Ana Lucila; García Villalba, Luis Javier; Kim, Tai-Hoon
2017-05-22
In this work, an ACO routing protocol for mobile ad hoc networks based on AntHocNet is specified. As its predecessor, this new protocol, called AntOR, is hybrid in the sense that it contains elements from both reactive and proactive routing. Specifically, it combines a reactive route setup process with a proactive route maintenance and improvement process. Key aspects of the AntOR protocol are the disjoint-link and disjoint-node routes, separation between the regular pheromone and the virtual pheromone in the diffusion process and the exploration of routes, taking into consideration the number of hops in the best routes. In this work, a family of ACO routing protocols based on AntOR is also specified. These protocols are based on protocol successive refinements. In this work, we also present a parallelized version of AntOR that we call PAntOR. Using programming multiprocessor architectures based on the shared memory protocol, PAntOR allows running tasks in parallel using threads. This parallelization is applicable in the route setup phase, route local repair process and link failure notification. In addition, a variant of PAntOR that consists of having more than one interface, which we call PAntOR-MI (PAntOR-Multiple Interface), is specified. This approach parallelizes the sending of broadcast messages by interface through threads.
Donovan, Nancy J; Amariglio, Rebecca E; Zoller, Amy S; Rudel, Rebecca K; Gomez-Isla, Teresa; Blacker, Deborah; Hyman, Bradley T; Locascio, Joseph J; Johnson, Keith A; Sperling, Reisa A; Marshall, Gad A; Rentz, Dorene M
2014-12-01
To examine neuropsychiatric and neuropsychological predictors of progression from normal to early clinical stages of Alzheimer disease (AD). From a total sample of 559 older adults from the Massachusetts Alzheimer's Disease Research Center longitudinal cohort, 454 were included in the primary analysis: 283 with clinically normal cognition (CN), 115 with mild cognitive impairment (MCI), and 56 with subjective cognitive concerns (SCC) but no objective impairment, a proposed transitional group between CN and MCI. Two latent cognitive factors (memory-semantic, attention-executive) and two neuropsychiatric factors (affective, psychotic) were derived from the Alzheimer's Disease Centers' Uniform Data Set neuropsychological battery and Neuropsychiatric Inventory brief questionnaire. Factors were analyzed as predictors of time to progression to a worse diagnosis using a Cox proportional hazards regression model with backward elimination. Covariates included baseline diagnosis, gender, age, education, prior depression, antidepressant medication, symptom duration, and interaction terms. Higher/better memory-semantic factor score predicted lower hazard of progression (hazard ratio [HR] = 0.4 for 1 standard deviation [SD] increase, p <0.0001), and higher/worse affective factor score predicted higher hazard (HR = 1.3 for one SD increase, p = 0.01). No other predictors were significant in adjusted analyses. Using diagnosis as a sole predictor of transition to MCI, the SCC diagnosis carried a fourfold risk of progression compared with CN (HR = 4.1, p <0.0001). These results identify affective and memory-semantic factors as significant predictors of more rapid progression from normal to early stages of cognitive decline and highlight the subgroup of cognitively normal elderly with SCC as those with elevated risk of progression to MCI. Copyright © 2014 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.
Entorhinal-Hippocampal Neuronal Circuits Bridge Temporally Discontiguous Events
ERIC Educational Resources Information Center
Kitamura, Takashi; Macdonald, Christopher J.; Tonegawa, Susumu
2015-01-01
The entorhinal cortex (EC)-hippocampal (HPC) network plays an essential role for episodic memory, which preserves spatial and temporal information about the occurrence of past events. Although there has been significant progress toward understanding the neural circuits underlying the spatial dimension of episodic memory, the relevant circuits…
Dynamic modulation of innate immunity programming and memory.
Yuan, Ruoxi; Li, Liwu
2016-01-01
Recent progress harkens back to the old theme of immune memory, except this time in the area of innate immunity, to which traditional paradigm only prescribes a rudimentary first-line defense function with no memory. However, both in vitro and in vivo studies reveal that innate leukocytes may adopt distinct activation states such as priming, tolerance, and exhaustion, depending upon the history of prior challenges. The dynamic programming and potential memory of innate leukocytes may have far-reaching consequences in health and disease. This review aims to provide some salient features of innate programing and memory, patho-physiological consequences, underlying mechanisms, and current pressing issues.
McKinnon, Margaret C; Black, Sandra E; Miller, Bruce; Moscovitch, Morris; Levine, Brian
2006-01-01
We examined autobiographical memory performance in two patients with semantic dementia using a novel measure, the Autobiographical Interview [Levine, Svoboda, Hay, Winocur, & Moscovitch (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17, 677-689], that is capable of dissociating episodic and personal semantic recall under varying levels of retrieval support. Earlier reports indicated that patients with semantic dementia demonstrate autobiographical episodic memory loss following a "reverse gradient" by which recent memories are preserved relative to remote memories. We found limited evidence for this pattern at conditions of low retrieval support. When structured probing was provided, patients' autobiographical memory performance was similar to that of controls. Retesting of one patient after 1 year indicated that retrieval support was insufficient to bolster performance following progressive prefrontal volume loss, as documented with quantified structural neuroimaging. These findings are discussed in relation to theories of limbic-neocortical interaction in autobiographical memory.
Tree, Jeremy; Kay, Janice
2015-09-01
In the field of dementia research, there are reports of neurodegenerative cases with a focal loss of language, termed primary progressive aphasia (PPA). Currently, this condition has been further sub-classified, with the most recent sub-type dubbed logopenic variant (PPA-LV). As yet, there remains somewhat limited evaluation of the characteristics of this condition, with no studies providing longitudinal assessment accompanied by post-mortem examination. Moreover, a key characteristic of the PPA-LV case is a deterioration of phonological short-term memory, but again little work has scrutinized the nature of this impairment over time. The current study seeks to redress these oversights and presents detailed longitudinal examination of language and memory function in a case of PPA-LV, with special focus on tests linked to components of phonological short-term memory function. Our findings are then considered with reference to a contemporary model of the neuropsychology of phonological short-term memory. Additionally, post-mortem examinations indicated Alzheimer's disease type pathology, providing further evidence that the PPA-LV presentation may reflect an atypical presentation of this condition. © 2014 The British Psychological Society.
Meilán García, Juan José; Iodice, Rosario; Carro, Juan; Sánchez, José Antonio; Palmero, Francisco; Mateos, Ana María
2012-06-01
Autobiographic memory undergoes progressive deterioration during the evolution of Alzheimer's disease (AD). The aim of this study was to analyze mechanisms which facilitate recovery of autobiographic memories. We used a repeatedly employed mechanism, music, with the addition of an emotional factor. Autobiographic memory provoked by a variety of sounds (music which was happy, sad, lacking emotion, ambient noise in a coffee bar and no sound) was analyzed in a sample of 25 patients with AD. Emotional music, especially sad music for remote memories, was found to be the most effective kind for recall of autobiographic experiences. The factor evoking the memory is not the music itself, but rather the emotion associated with it, and is useful for semantic rather than episodic memory.
Active Control of Flexible Space Structures Using the Nitinol Shape Memory Actuators
1987-10-01
number) FIELD !GROUP SUBGROUP I Active Control, Nitinol Actuators, Space Structures 9. ABSTRACT (Continue on reverse if necessary and identify by block...number) Summarizes research progress in the feasibility demonstration of active vibration control using Nitinol shape memory actuators. Tests on...FLEXIBLE SPACE STRUCTURES USING NITINOL SHAPE MEMORY ACTUATORS FINAL REPORT FOR PHASE I SDIO CONTRACT #F49620-87-C-0035 0 BY DR. AMR M. BAZ KARIM R
Recent progress in tungsten oxides based memristors and their neuromorphological applications
NASA Astrophysics Data System (ADS)
Qu, Bo; Younis, Adnan; Chu, Dewei
2016-09-01
The advance in conventional silicon based semiconductor industry is now becoming indeterminacy as it still along the road of Moore's Law and concomitant problems associated with it are the emergence of a number of practical issues such as short channel effect. In terms of memory applications, it is generally believed that transistors based memory devices will approach to their scaling limits up to 2018. Therefore, one of the most prominent challenges today in semiconductor industry is the need of a new memory technology which is able to combine the best characterises of current devices. The resistive switching memories which are regarded as "memristors" thus gain great attentions thanks to their specific nonlinear electrical properties. More importantly, their behaviour resembles with the transmission characteristic of synapse in biology. Therefore, the research of synapses biomimetic devices based on memristor will certainly bring a great research prospect in studying synapse emulation as well as building artificial neural networks. Tungsten oxides (WO x ) exhibits many essential characteristics as a great candidate for memristive devices including: accredited endurance (over 105 cycles), stoichiometric flexibility, complimentary metal-oxide-semiconductor (CMOS) process compatibility and configurable properties including non-volatile rectification, memorization and learning functions. Herein, recent progress on Tungsten oxide based materials and its associating memory devices had been reviewed. The possible implementation of this material as a bio-inspired artificial synapse is also highlighted. The penultimate section summaries the current research progress for tungsten oxide based biological synapses and end up with several proposals that have been suggested for possible future developments.
Okoye, Afam; Meier-Schellersheim, Martin; Brenchley, Jason M.; Hagen, Shoko I.; Walker, Joshua M.; Rohankhedkar, Mukta; Lum, Richard; Edgar, John B.; Planer, Shannon L.; Legasse, Alfred; Sylwester, Andrew W.; Piatak, Michael; Lifson, Jeffrey D.; Maino, Vernon C.; Sodora, Donald L.; Douek, Daniel C.; Axthelm, Michael K.; Grossman, Zvi; Picker, Louis J.
2007-01-01
Primary simian immunodeficiency virus (SIV) infections of rhesus macaques result in the dramatic depletion of CD4+ CCR5+ effector–memory T (TEM) cells from extra-lymphoid effector sites, but in most infections, an increased rate of CD4+ memory T cell proliferation appears to prevent collapse of effector site CD4+ TEM cell populations and acute-phase AIDS. Eventually, persistent SIV replication results in chronic-phase AIDS, but the responsible mechanisms remain controversial. Here, we demonstrate that in the chronic phase of progressive SIV infection, effector site CD4+ TEM cell populations manifest a slow, continuous decline, and that the degree of this depletion remains a highly significant correlate of late-onset AIDS. We further show that due to persistent immune activation, effector site CD4+ TEM cells are predominantly short-lived, and that their homeostasis is strikingly dependent on the production of new CD4+ TEM cells from central–memory T (TCM) cell precursors. The instability of effector site CD4+ TEM cell populations over time was not explained by increasing destruction of these cells, but rather was attributable to progressive reduction in their production, secondary to decreasing numbers of CCR5− CD4+ TCM cells. These data suggest that although CD4+ TEM cell depletion is a proximate mechanism of immunodeficiency, the tempo of this depletion and the timing of disease onset are largely determined by destruction, failing production, and gradual decline of CD4+ TCM cells. PMID:17724130
Okoye, Afam; Meier-Schellersheim, Martin; Brenchley, Jason M; Hagen, Shoko I; Walker, Joshua M; Rohankhedkar, Mukta; Lum, Richard; Edgar, John B; Planer, Shannon L; Legasse, Alfred; Sylwester, Andrew W; Piatak, Michael; Lifson, Jeffrey D; Maino, Vernon C; Sodora, Donald L; Douek, Daniel C; Axthelm, Michael K; Grossman, Zvi; Picker, Louis J
2007-09-03
Primary simian immunodeficiency virus (SIV) infections of rhesus macaques result in the dramatic depletion of CD4(+) CCR5(+) effector-memory T (T(EM)) cells from extra-lymphoid effector sites, but in most infections, an increased rate of CD4(+) memory T cell proliferation appears to prevent collapse of effector site CD4(+) T(EM) cell populations and acute-phase AIDS. Eventually, persistent SIV replication results in chronic-phase AIDS, but the responsible mechanisms remain controversial. Here, we demonstrate that in the chronic phase of progressive SIV infection, effector site CD4(+) T(EM) cell populations manifest a slow, continuous decline, and that the degree of this depletion remains a highly significant correlate of late-onset AIDS. We further show that due to persistent immune activation, effector site CD4(+) T(EM) cells are predominantly short-lived, and that their homeostasis is strikingly dependent on the production of new CD4(+) T(EM) cells from central-memory T (T(CM)) cell precursors. The instability of effector site CD4(+) T(EM) cell populations over time was not explained by increasing destruction of these cells, but rather was attributable to progressive reduction in their production, secondary to decreasing numbers of CCR5(-) CD4(+) T(CM) cells. These data suggest that although CD4(+) T(EM) cell depletion is a proximate mechanism of immunodeficiency, the tempo of this depletion and the timing of disease onset are largely determined by destruction, failing production, and gradual decline of CD4(+) T(CM) cells.
Proactive Interference and Item Similarity in Working Memory
ERIC Educational Resources Information Center
Bunting, Michael
2006-01-01
Proactive interference (PI) may influence the predictive utility of working memory span tasks. Participants in one experiment (N=70) completed Ravens Advanced Progressive Matrices (RAPM) and multiple versions of operation span and probed recall, modified for the type of memoranda (digits or words). Changing memoranda within- or across-trials…
Josselyn, Sheena A; Köhler, Stefan; Frankland, Paul W
2015-09-01
Many attempts have been made to localize the physical trace of a memory, or engram, in the brain. However, until recently, engrams have remained largely elusive. In this Review, we develop four defining criteria that enable us to critically assess the recent progress that has been made towards finding the engram. Recent 'capture' studies use novel approaches to tag populations of neurons that are active during memory encoding, thereby allowing these engram-associated neurons to be manipulated at later times. We propose that findings from these capture studies represent considerable progress in allowing us to observe, erase and express the engram.
Progressive Functional Impairments of Hippocampal Neurons in a Tauopathy Mouse Model
Ciupek, Sarah M.; Cheng, Jingheng; Ali, Yousuf O.; Lu, Hui-Chen
2015-01-01
The age-dependent progression of tau pathology is a major characteristic of tauopathies, including Alzheimer's disease (AD), and plays an important role in the behavioral phenotypes of AD, including memory deficits. Despite extensive molecular and cellular studies on tau pathology, it remains to be determined how it alters the neural circuit functions underlying learning and memory in vivo. In rTg4510 mice, a Tau-P301L tauopathy model, hippocampal place fields that support spatial memories are abnormal at old age (7–9 months) when tau tangles and neurodegeneration are extensive. However, it is unclear how the abnormality in the hippocampal circuit function arises and progresses with the age-dependent progression of tau pathology. Here we show that in young (2–4 months of age) rTg4510 mice, place fields of hippocampal CA1 cells are largely normal, with only subtle differences from those of age-matched wild-type control mice. Second, high-frequency ripple oscillations of local field potentials in the hippocampal CA1 area are significantly reduced in young rTg4510 mice, and even further deteriorated in old rTg4510 mice. The ripple reduction is associated with less bursty firing and altered synchrony of CA1 cells. Together, the data indicate that deficits in ripples and neuronal synchronization occur before overt deficits in place fields in these mice. The results reveal a tau-pathology-induced progression of hippocampal functional changes in vivo. PMID:26019329
Still searching for the engram
Eichenbaum, Howard
2016-01-01
For nearly a century neurobiologists have searched for the engram - the neural representation of a memory. Early studies showed that the engram is widely distributed both within and across brain areas and is supported by interactions among large networks of neurons. Subsequent research has identified engrams that support memory within dedicated functional systems for habit learning and emotional memory, but the engram for declarative memories has been elusive. Nevertheless, recent years have brought progress from molecular biological approaches that identify neurons and networks that are necessary and sufficient to support memory, and from recording approaches and population analyses that characterize the information coded by large neural networks. These new directions offer the promise of revealing the engrams for episodic and semantic memories. PMID:26944423
Senaha, Mirna Lie Hosogi; Caramelli, Paulo; Porto, Claudia Sellitto; Nitrini, Ricardo
2007-01-01
Selective disturbances of semantic memory have attracted the interest of many investigators and the question of the existence of single or multiple semantic systems remains a very controversial theme in the literature. Objectives To discuss the question of multiple semantic systems based on a longitudinal study of a patient who presented semantic dementia from fluent primary progressive aphasia. Methods A 66 year-old woman with selective impairment of semantic memory was examined on two occasions, undergoing neuropsychological and language evaluations, the results of which were compared to those of three paired control individuals. Results In the first evaluation, physical examination was normal and the score on the Mini-Mental State Examination was 26. Language evaluation revealed fluent speech, anomia, disturbance in word comprehension, preservation of the syntactic and phonological aspects of the language, besides surface dyslexia and dysgraphia. Autobiographical and episodic memories were relatively preserved. In semantic memory tests, the following dissociation was found: disturbance of verbal semantic memory with preservation of non-verbal semantic memory. Magnetic resonance of the brain revealed marked atrophy of the left anterior temporal lobe. After 14 months, the difficulties in verbal semantic memory had become more severe and the semantic disturbance, limited initially to the linguistic sphere, had worsened to involve non-verbal domains. Conclusions Given the dissociation found in the first examination, we believe there is sufficient clinical evidence to refute the existence of a unitary semantic system. PMID:29213389
The Experimental Mathematician: The Pleasure of Discovery and the Role of Proof
ERIC Educational Resources Information Center
Borwein, Jonathan M.
2005-01-01
The emergence of powerful mathematical computing environments, the growing availability of correspondingly powerful (multi-processor) computers and the pervasive presence of the Internet allow for mathematicians, students and teachers, to proceed heuristically and "quasi-inductively." We may increasingly use symbolic and numeric computation,…
Experience with a UNIX based batch computing facility for H1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhards, R.; Kruener-Marquis, U.; Szkutnik, Z.
1994-12-31
A UNIX based batch computing facility for the H1 experiment at DESY is described. The ultimate goal is to replace the DESY IBM mainframe by a multiprocessor SGI Challenge series computer, using the UNIX operating system, for most of the computing tasks in H1.
Model Checking, Abstraction, and Compositional Verification
1993-07-01
the ( alois connections used by Bensalrnu el al. [6], and also has some relation to Kurshan’s automata homonuor- phisms [62]. (Actually. we can impose a...multiprocessor simulation model. ACM Transactions on Computer Systems, 4(4):273-298, November 1986. [41 D. L. Beatty, R. E. Bryant, and C.-J. Seger
Developing Software to Use Parallel Processing Effectively
1988-10-01
Experience, Vol 15(6), June 1985, p53 Gajski85 Gajski , Daniel D. and Jih-Kwon Peir, "Essential Issues in Multiprocessor Systems", IEEE Computer, June...Treleaven (eds.), Springer-Verlag, pp. 213-225 (June 1987). Kuck83 David Kuck, Duncan Lawrie, Ron Cytron, Ahmed Sameh and Daniel Gajski , The Architecture and
Autobiographical memory and patterns of brain atrophy in frontotemporal lobar degeneration.
McKinnon, Margaret C; Nica, Elena I; Sengdy, Pheth; Kovacevic, Natasa; Moscovitch, Morris; Freedman, Morris; Miller, Bruce L; Black, Sandra E; Levine, Brian
2008-10-01
Autobiographical memory paradigms have been increasingly used to study the behavioral and neuroanatomical correlates of human remote memory. Although there are numerous functional neuroimaging studies on this topic, relatively few studies of patient samples exist, with heterogeneity of results owing to methodological variability. In this study, fronto-temporal lobar degeneration (FTLD), a form of dementia affecting regions crucial to autobiographical memory, was used as a model of autobiographical memory loss. We emphasized the separation of episodic (recollection of specific event, perceptual, and mental state information) from semantic (factual information unspecific in time and place) autobiographical memory, derived from a reliable method for scoring transcribed autobiographical protocols, the Autobiographical Interview [Levine, B., Svoboda, E., Hay, J., Winocur, G., & Moscovitch, M. Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17, 677-689, 2002]. Patients with the fronto-temporal dementia (FTD) and mixed fronto-temporal and semantic dementia (FTD/SD) variants of FTLD were impaired at reconstructing episodically rich autobiographical memories across the lifespan, with FTD/SD patients generating an excess of generic semantic autobiographical information. Patients with progressive nonfluent aphasia were mildly impaired for episodic autobiographical memory, but this impairment was eliminated with the provision of structured cueing, likely reflecting relatively intact medial-temporal lobe function, whereas the same cueing failed to bolster the FTD and FTD/SD patients' performance relative to that of matched comparison subjects. The pattern of episodic, but not semantic, autobiographical impairment was enhanced with disease progression on 1- to 2-year follow-up testing in a subset of patients, supplementing the cross-sectional evidence for specificity of episodic autobiographical impairment with longitudinal data. This behavioral pattern covaried with volume loss in a distributed left-lateralized posterior network centered on the temporal lobe, consistent with evidence from other patient and functional neuroimaging studies of autobiographical memory. Frontal lobe volumes, however, did not significantly contribute to this network, suggesting that frontal contributions to autobiographical episodic memory may be more complex than previously appreciated.
Kessels, Roy P C; Meulenbroek, Olga; Fernández, Guillén; Olde Rikkert, Marcel G M
2010-09-01
Mild Cognitive Impairment (MCI) is characterized by episodic memory deficits, while aspects of working memory may also be implicated, but studies into this latter domain are scarce and results are inconclusive. Using a computerized search paradigm, this study compares 25 young adults, 25 typically aging older adults and 15 amnestic MCI patients as to their working-memory capacities for object-location information and potential differential effects of memory load and additional context cues. An age-related deficit in visuospatial working-memory maintenance was found that became more pronounced with increasing task demands. The MCI group additionally showed reduced maintenance of bound information, i.e., object-location associations, again especially at elevated memory load. No effects of contextual cueing were found. The current findings indicate that working memory should be considered when screening patients for suspected MCI and monitoring its progression.
Memory loss in Alzheimer's disease: implications for development of therapeutics
Gold, Carl A; Budson, Andrew E
2009-01-01
Alzheimer's disease (AD) is a progressive neurodegenerative disease marked by a constellation of cognitive disturbances, the earliest and most prominent being impaired episodic memory. Episodic memory refers to the memory system that allows an individual to consciously retrieve a previously experienced item or episode of life. Many recent studies have focused on characterizing how AD pathology impacts particular aspects of episodic memory and underlying mental and neural processes. This review summarizes the findings of those studies and discusses the effects of current and promising treatments for AD on episodic memory. The goal of this review is to raise awareness of the strides that cognitive neuroscientists have made in understanding intact and dysfunctional memory. Knowledge of the specific memorial processes that are impaired in AD may be of great value to basic scientists developing novel therapies and to clinical researchers assessing the efficacy of those therapies. PMID:19086882
ERIC Educational Resources Information Center
Pick, Joseph E.; Malumbres, Marcos; Klann, Eric
2013-01-01
The anaphase promoting complex/cyclosome (APC/C) is an E3 ligase regulated by Cdh1. Beyond its role in controlling cell cycle progression, APC/C-Cdh1 has been detected in neurons and plays a role in long-lasting synaptic plasticity and long-term memory. Herein, we further examined the role of Cdh1 in synaptic plasticity and memory by generating…
Tohda, Chihiro; Nakada, Rie; Urano, Takuya; Okonogi, Akira; Kuboyama, Tomoharu
2011-12-01
Alzheimer's disease (AD) is a chronic progressive neurodegenerative disorder. Current agents for AD are employed for symptomatic therapy and insufficient to cure. We consider that this is quite necessary for AD treatment and have investigated axon/synapse formation-promoting activity. The aim of this study is to investigate the effects of Kamikihi-to [KKT; traditional Japanese (Kampo) medicine] on memory deficits in an AD model, 5XFAD. KKT (200 mg/kg, p.o.) was administered for 15 days to 5XFAD mice. Object recognition memory was tested in vehicle-treated wild-type and 5XFAD mice and KKT-treated 5XFAD mice. KKT-treated 5XFAD mice showed significant improvement of object recognition memory. KKT treatment significantly reduced the number of amyloid plaques in the frontal cortex and hippocampus. Only inside of amyloid plaques were abnormal structures such as bulb-like axons and swollen presynaptic boutons observed. These degenerated axons and presynaptic terminals were significantly reduced by KKT treatment in the frontal cortex. In primary cortical neurons, KKT treatment significantly increased axon length when applied after Aβ(25-35)-induced axonal atrophy had progressed. In conclusion, KKT improved object recognition memory deficit in an AD model 5XFAD mice. Restoration of degenerated axons and synapses may be associated with the memory recovery by KKT.
Deiber, Marie-Pierre; Ibañez, Vicente; Missonnier, Pascal; Herrmann, François; Fazio-Costa, Lara; Gold, Gabriel; Giannakopoulos, Panteleimon
2009-09-01
The electroencephalography (EEG) theta frequency band reacts to memory and selective attention paradigms. Global theta oscillatory activity includes a posterior phase-locked component related to stimulus processing and a frontal-induced component modulated by directed attention. To investigate the presence of early deficits in the directed attention-related network in elderly individuals with mild cognitive impairment (MCI), time-frequency analysis at baseline was used to assess global and induced theta oscillatory activity (4-6Hz) during n-back working memory tasks in 29 individuals with MCI and 24 elderly controls (EC). At 1-year follow-up, 13 MCI patients were still stable and 16 had progressed. Baseline task performance was similar in stable and progressive MCI cases. Induced theta activity at baseline was significantly reduced in progressive MCI as compared to EC and stable MCI in all n-back tasks, which were similar in terms of directed attention requirements. While performance is maintained, the decrease of induced theta activity suggests early deficits in the directed-attention network in progressive MCI, whereas this network is functionally preserved in stable MCI.
Short-term memory coding in children with intellectual disabilities.
Henry, Lucy
2008-05-01
To examine visual and verbal coding strategies, I asked children with intellectual disabilities and peers matched for MA and CA to perform picture memory span tasks with phonologically similar, visually similar, long, or nonsimilar named items. The CA group showed effects consistent with advanced verbal memory coding (phonological similarity and word length effects). Neither the intellectual disabilities nor MA groups showed evidence for memory coding strategies. However, children in these groups with MAs above 6 years showed significant visual similarity and word length effects, broadly consistent with an intermediate stage of dual visual and verbal coding. These results suggest that developmental progressions in memory coding strategies are independent of intellectual disabilities status and consistent with MA.
The specificity of memory enhancement during interaction with a virtual environment.
Brooks, B M; Attree, E A; Rose, F D; Clifford, B R; Leadbetter, A G
1999-01-01
Two experiments investigated differences between active and passive participation in a computer-generated virtual environment in terms of spatial memory, object memory, and object location memory. It was found that active participants, who controlled their movements in the virtual environment using a joystick, recalled the spatial layout of the virtual environment better than passive participants, who merely watched the active participants' progress. Conversely, there were no significant differences between the active and passive participants' recall or recognition of the virtual objects, nor in their recall of the correct locations of objects in the virtual environment. These findings are discussed in terms of subject-performed task research and the specificity of memory enhancement in virtual environments.
Advances in Early Memory Development Research: Insights about the Dark Side of the Moon
ERIC Educational Resources Information Center
Courage, Mary L.; Howe, Mark L.
2004-01-01
Over the past three decades impressive progress has been made in documenting the development of encoding, storage, and retrieval processes in preverbal infants and children. This literature includes an extensive and diverse database as well as theoretical conjecture about the underlying processes that drive early memory development. A selective…
Recognition Memory: A Review of the Critical Findings and an Integrated Theory for Relating Them
ERIC Educational Resources Information Center
Malmberg, Kenneth J.
2008-01-01
The development of formal models has aided theoretical progress in recognition memory research. Here, I review the findings that are critical for testing them, including behavioral and brain imaging results of single-item recognition, plurality discrimination, and associative recognition experiments under a variety of testing conditions. I also…
Molecular mechanisms of memory in imprinting.
Solomonia, Revaz O; McCabe, Brian J
2015-03-01
Converging evidence implicates the intermediate and medial mesopallium (IMM) of the domestic chick forebrain in memory for a visual imprinting stimulus. During and after imprinting training, neuronal responsiveness in the IMM to the familiar stimulus exhibits a distinct temporal profile, suggesting several memory phases. We discuss the temporal progression of learning-related biochemical changes in the IMM, relative to the start of this electrophysiological profile. c-fos gene expression increases <15 min after training onset, followed by a learning-related increase in Fos expression, in neurons immunopositive for GABA, taurine and parvalbumin (not calbindin). Approximately simultaneously or shortly after, there are increases in phosphorylation level of glutamate (AMPA) receptor subunits and in releasable neurotransmitter pools of GABA and taurine. Later, the mean area of spine synapse post-synaptic densities, N-methyl-D-aspartate receptor number and phosphorylation level of further synaptic proteins are elevated. After ∼ 15 h, learning-related changes in amounts of several synaptic proteins are observed. The results indicate progression from transient/labile to trophic synaptic modification, culminating in stable recognition memory. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
Molecular mechanisms of memory in imprinting
Solomonia, Revaz O.; McCabe, Brian J.
2015-01-01
Converging evidence implicates the intermediate and medial mesopallium (IMM) of the domestic chick forebrain in memory for a visual imprinting stimulus. During and after imprinting training, neuronal responsiveness in the IMM to the familiar stimulus exhibits a distinct temporal profile, suggesting several memory phases. We discuss the temporal progression of learning-related biochemical changes in the IMM, relative to the start of this electrophysiological profile. c-fos gene expression increases <15 min after training onset, followed by a learning-related increase in Fos expression, in neurons immunopositive for GABA, taurine and parvalbumin (not calbindin). Approximately simultaneously or shortly after, there are increases in phosphorylation level of glutamate (AMPA) receptor subunits and in releasable neurotransmitter pools of GABA and taurine. Later, the mean area of spine synapse post-synaptic densities, N-methyl-d-aspartate receptor number and phosphorylation level of further synaptic proteins are elevated. After ∼15 h, learning-related changes in amounts of several synaptic proteins are observed. The results indicate progression from transient/labile to trophic synaptic modification, culminating in stable recognition memory. PMID:25280906
Sala, Isabel; Illán-Gala, Ignacio; Alcolea, Daniel; Sánchez-Saudinós, Ma Belén; Salgado, Sergio Andrés; Morenas-Rodríguez, Estrella; Subirana, Andrea; Videla, Laura; Clarimón, Jordi; Carmona-Iragui, María; Ribosa-Nogué, Roser; Blesa, Rafael; Fortea, Juan; Lleó, Alberto
2017-01-01
Episodic memory impairment is the core feature of typical Alzheimer's disease. To evaluate the performance of two commonly used verbal memory tests to detect mild cognitive impairment due to Alzheimer's disease (MCI-AD) and to predict progression to Alzheimer's disease dementia (AD-d). Prospective study of MCI patients in a tertiary memory disorder unit. Patients underwent an extensive neuropsychological battery including two tests of declarative verbal memory: The Free and Cued Selective Reminding Test (FCSRT) and the word list learning task from the Consortium to Establish a Registry for Alzheimer's disease (CERAD-WL). Cerebrospinal fluid (CSF) was obtained from all patients and MCI-AD was defined by means of the t-Tau/Aβ1-42 ratio. Logistic regression analyses tested whether the combination of FCSRT and CERAD-WL measures significantly improved the prediction of MCI-AD. Progression to AD-d was analyzed in a Cox regression model. A total of 202 MCI patients with a mean follow-up of 34.2±24.2 months were included and 98 (48.5%) met the criteria for MCI-AD. The combination of FCSRT and CERAD-WL measures improved MCI-AD classification accuracy based on CSF biomarkers. Both tests yielded similar global predictive values (59.9-65.3% and 59.4-62.8% for FCSRT and CERAD-WL, respectively). MCI-AD patients with deficits in both FCSRT and CERAD-WL had a faster progression to AD-d than patients with deficits in only one test. The combination of FCSRT and CERAD-WL improves the classification of MCI-AD and defines different prognostic profiles. These findings have important implications for clinical practice and the design of clinical trials.
Kurtulus, Sema; Tripathi, Pulak; Hildeman, David A.
2013-01-01
Vaccines, arguably the single most important intervention in improving human health, have exploited the phenomenon of immunological memory. The elicitation of memory T cells is often an essential part of successful long-lived protective immunity. Our understanding of T cell memory has been greatly aided by the development of TCR Tg mice and MHC tetrameric staining reagents that have allowed the precise tracking of antigen-specific T cell responses. Indeed, following acute infection or immunization, naïve T cells undergo a massive expansion culminating in the generation of a robust effector T cell population. This peak effector response is relatively short-lived and, while most effector T cells die by apoptosis, some remain and develop into memory cells. Although the molecular mechanisms underlying this cell fate decision remain incompletely defined, substantial progress has been made, particularly with regards to CD8+ T cells. For example, the effector CD8+ T cells generated during a response are heterogeneous, consisting of cells with more or less potential to develop into full-fledged memory cells. Development of CD8+ T cell memory is regulated by the transcriptional programs that control the differentiation and survival of effector T cells. While the type of antigenic stimulation and level of inflammation control effector CD8+ T cell differentiation, availability of cytokines and their ability to control expression and function of Bcl-2 family members governs their survival. These distinct differentiation and survival programs may allow for finer therapeutic intervention to control both the quality and quantity of CD8+ T cell memory. Effector to memory transition of CD4+ T cells is less well characterized than CD8+ T cells, emerging details will be discussed. This review will focus on the recent progress made in our understanding of the mechanisms underlying the development of T cell memory with an emphasis on factors controlling survival of effector T cells. PMID:23346085
The demand to progress: critical nostalgia in LGBTQ cultural memory.
de Szegheo Lang, Tamara
2015-01-01
This article argues that, while representations of tragic lesbian, gay, bisexual, transgender, and queer (LGBTQ) histories are disseminated widely, positive aspects of the past must be largely pushed out of the cultural imaginary to support a vision of the present in which sexual rights and freedoms have been achieved. It proposes that this view relies on a linear progress narrative wherein the experiences of LGBTQ people are held as consistently improving over time. In considering the construction of cultural memory through popular media and art, it claims a nostalgic turn to the past as a useful political tool for dismantling the pacifying aspects of the present.
Selective alterations of neurons and circuits related to early memory loss in Alzheimer’s disease
Llorens-Martín, Maria; Blazquez-Llorca, Lidia; Benavides-Piccione, Ruth; Rabano, Alberto; Hernandez, Felix; Avila, Jesus; DeFelipe, Javier
2014-01-01
A progressive loss of episodic memory is a well-known clinical symptom that characterizes Alzheimer’s disease (AD). The beginning of this loss of memory has been associated with the very early, pathological accumulation of tau and neuronal degeneration observed in the entorhinal cortex (EC). Tau-related pathology is thought to then spread progressively to the hippocampal formation and other brain areas as the disease progresses. The major cortical afferent source of the hippocampus and dentate gyrus is the EC through the perforant pathway. At least two main circuits participate in the connection between EC and the hippocampus; one originating in layer II and the other in layer III of the EC giving rise to the classical trisynaptic (ECII → dentate gyrus → CA3 → CA1) and monosynaptic (ECIII → CA1) circuits. Thus, the study of the early pathological changes in these circuits is of great interest. In this review, we will discuss mainly the alterations of the granule cell neurons of the dentate gyrus and the atrophy of CA1 pyramidal neurons that occur in AD in relation to the possible differential alterations of these two main circuits. PMID:24904307
Selective alterations of neurons and circuits related to early memory loss in Alzheimer's disease.
Llorens-Martín, Maria; Blazquez-Llorca, Lidia; Benavides-Piccione, Ruth; Rabano, Alberto; Hernandez, Felix; Avila, Jesus; DeFelipe, Javier
2014-01-01
A progressive loss of episodic memory is a well-known clinical symptom that characterizes Alzheimer's disease (AD). The beginning of this loss of memory has been associated with the very early, pathological accumulation of tau and neuronal degeneration observed in the entorhinal cortex (EC). Tau-related pathology is thought to then spread progressively to the hippocampal formation and other brain areas as the disease progresses. The major cortical afferent source of the hippocampus and dentate gyrus is the EC through the perforant pathway. At least two main circuits participate in the connection between EC and the hippocampus; one originating in layer II and the other in layer III of the EC giving rise to the classical trisynaptic (ECII → dentate gyrus → CA3 → CA1) and monosynaptic (ECIII → CA1) circuits. Thus, the study of the early pathological changes in these circuits is of great interest. In this review, we will discuss mainly the alterations of the granule cell neurons of the dentate gyrus and the atrophy of CA1 pyramidal neurons that occur in AD in relation to the possible differential alterations of these two main circuits.
Dardalhon, V; Jaleco, S; Kinet, S; Herpers, B; Steinberg, M; Ferrand, C; Froger, D; Leveau, C; Tiberghien, P; Charneau, P; Noraz, N; Taylor, N
2001-07-31
Differences in the immunological reactivity of umbilical cord (UC) and adult peripheral blood (APB) T cells are poorly understood. Here, we show that IL-7, a cytokine involved in lymphoid homeostasis, has distinct regulatory effects on APB and UC lymphocytes. Neither naive nor memory APB CD4(+) cells proliferated in response to IL-7, whereas naive UC CD4(+) lymphocytes underwent multiple divisions. Nevertheless, both naive and memory IL-7-treated APB T cells progressed into the G(1b) phase of the cell cycle, albeit at higher levels in the latter subset. The IL-7-treated memory CD4(+) lymphocyte population was significantly more susceptible to infection with an HIV-1-derived vector than dividing CD4(+) UC lymphocytes. However, activation through the T cell receptor rendered UC lymphocytes fully susceptible to HIV-1-based vector infection. These data unveil differences between UC and APB CD4(+) T cells with regard to IL-7-mediated cell cycle progression and HIV-1-based vector infectivity. This evidence indicates that IL-7 differentially regulates lymphoid homeostasis in adults and neonates.
Dardalhon, Valérie; Jaleco, Sara; Kinet, Sandrina; Herpers, Bjorn; Steinberg, Marcos; Ferrand, Christophe; Froger, Delphine; Leveau, Christelle; Tiberghien, Pierre; Charneau, Pierre; Noraz, Nelly; Taylor, Naomi
2001-01-01
Differences in the immunological reactivity of umbilical cord (UC) and adult peripheral blood (APB) T cells are poorly understood. Here, we show that IL-7, a cytokine involved in lymphoid homeostasis, has distinct regulatory effects on APB and UC lymphocytes. Neither naive nor memory APB CD4+ cells proliferated in response to IL-7, whereas naive UC CD4+ lymphocytes underwent multiple divisions. Nevertheless, both naive and memory IL-7-treated APB T cells progressed into the G1b phase of the cell cycle, albeit at higher levels in the latter subset. The IL-7-treated memory CD4+ lymphocyte population was significantly more susceptible to infection with an HIV-1-derived vector than dividing CD4+ UC lymphocytes. However, activation through the T cell receptor rendered UC lymphocytes fully susceptible to HIV-1-based vector infection. These data unveil differences between UC and APB CD4+ T cells with regard to IL-7-mediated cell cycle progression and HIV-1-based vector infectivity. This evidence indicates that IL-7 differentially regulates lymphoid homeostasis in adults and neonates. PMID:11470908
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.
ERIC Educational Resources Information Center
Stewart, Mark; Willett, Peter
1987-01-01
Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Expert Systems on Multiprocessor Architectures. Volume 4. Technical Reports
1991-06-01
Floated-Current-Time0 -> The time that this function is called in user time uflts, expressed as a floating point number. Halt- Poligono Arrests the...default a statistics file will be printed out, if it can be. To prevent this make No-Statistics true. Unhalt- Poligono Unarrests the process in which the
Clocking and Synchronization Circuits in Multiprocessor Systems
1989-04-01
18 3.4 Inter -chip Clocking Strategies...may occur when two or more of the switches make transitions at different times during the inter - val during which those inputs are being processed...increased without any fruitful computation. The sources of the inter -chip clock skew are the electromagnetic propagation delay, the buffer delay within
Cache coherency without line exclusivity in MP systems having store-in caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pomerene, J.H.; Puzak, T.R.; Rechtschaffen, R.N.
1983-11-01
By modifying the function of the storage control unit, a multiprocessor (MP) system having store-in caches is enabled to operate with the same versatility as an MP system having store-through caches, thereby eliminating the requirement for line exclusivity and greatly reducing the occurrence of cross-interrogates.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hanham, R.; Vogt, W.G.; Mickle, M.H.
1986-01-01
This book presents the papers given at a conference on computerized simulation. Topics considered at the conference included expert systems, modeling in electric power systems, power systems operating strategies, energy analysis, a linear programming approach to optimum load shedding in transmission systems, econometrics, simulation in natural gas engineering, solar energy studies, artificial intelligence, vision systems, hydrology, multiprocessors, and flow models.
Data Traffic Reduction Schemes for Cholesky Factorization on Asynchronous Multiprocessor Systems
1989-06-01
Engineering NASA Langley Research Center Hampton, Virginia 23665-5225 Operated by the Universities Space Research Association DTIC ELECTE NASA jAUG 23...Hampton, VA 23665. ti- 1. Introduction Consider the problem of solving a system of linear equations Ax=b where .4 is an n x n symmetric, positive
3D Navier-Stokes Flow Analysis for a Large-Array Multiprocessor
1989-04-17
computer, Alliant’s FX /8, Intel’s Hypercube, and Encore’s Multimax. Unfortunately, the current algorithms have been developed pri- marily for SISD machines...Reversing and Thrust-Vectoring Nozzle Flows," Ph.D. Dissertation in the Dept. of Aero. and Astro ., Univ. of Wash., Washington, 1986. [11] Anderson
MULTIPROCESSOR AND DISTRIBUTED PROCESSING BIBLIOGRAPHIC DATA BASE SOFTWARE SYSTEM
NASA Technical Reports Server (NTRS)
Miya, E. N.
1994-01-01
Multiprocessors and distributed processing are undergoing increased scientific scrutiny for many reasons. It is more and more difficult to keep track of the existing research in these fields. This package consists of a large machine-readable bibliographic data base which, in addition to the usual keyword searches, can be used for producing citations, indexes, and cross-references. The data base is compiled from smaller existing multiprocessing bibliographies, and tables of contents from journals and significant conferences. There are approximately 4,000 entries covering topics such as parallel and vector processing, networks, supercomputers, fault-tolerant computers, and cellular automata. Each entry is represented by 21 fields including keywords, author, referencing book or journal title, volume and page number, and date and city of publication. The data base contains UNIX 'refer' formatted ASCII data and can be implemented on any computer running under the UNIX operating system. The data base requires approximately one megabyte of secondary storage. The documentation for this program is included with the distribution tape, although it can be purchased for the price below. This bibliography was compiled in 1985 and updated in 1988.
Design and control of a macro-micro robot for precise force applications
NASA Technical Reports Server (NTRS)
Wang, Yulun; Mangaser, Amante; Laby, Keith; Jordan, Steve; Wilson, Jeff
1993-01-01
Creating a robot which can delicately interact with its environment has been the goal of much research. Primarily two difficulties have made this goal hard to attain. The execution of control strategies which enable precise force manipulations are difficult to implement in real time because such algorithms have been too computationally complex for available controllers. Also, a robot mechanism which can quickly and precisely execute a force command is difficult to design. Actuation joints must be sufficiently stiff, frictionless, and lightweight so that desired torques can be accurately applied. This paper describes a robotic system which is capable of delicate manipulations. A modular high-performance multiprocessor control system was designed to provide sufficient compute power for executing advanced control methods. An 8 degree of freedom macro-micro mechanism was constructed to enable accurate tip forces. Control algorithms based on the impedance control method were derived, coded, and load balanced for maximum execution speed on the multiprocessor system. Delicate force tasks such as polishing, finishing, cleaning, and deburring, are the target applications of the robot.
Distributed Virtual System (DIVIRS) Project
NASA Technical Reports Server (NTRS)
Schorr, Herbert; Neuman, B. Clifford
1993-01-01
As outlined in our continuation proposal 92-ISI-50R (revised) on contract NCC 2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to program parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the virtual system model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
NASA Technical Reports Server (NTRS)
Sargent, Jeff Scott
1988-01-01
A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel new approaches to controlling error in parallel cell-placement algorithms; Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance is summarized of an implementation on the Intel iPSC/2 Hypercube. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
DIstributed VIRtual System (DIVIRS) project
NASA Technical Reports Server (NTRS)
Schorr, Herbert; Neuman, B. Clifford
1994-01-01
As outlined in our continuation proposal 92-ISI-. OR (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
DIstributed VIRtual System (DIVIRS) project
NASA Technical Reports Server (NTRS)
Schorr, Herbert; Neuman, Clifford B.
1995-01-01
As outlined in our continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Distributed Virtual System (DIVIRS) project
NASA Technical Reports Server (NTRS)
Schorr, Herbert; Neuman, B. Clifford
1993-01-01
As outlined in the continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC 2-539, the investigators are developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; developing communications routines that support the abstractions implemented; continuing the development of file and information systems based on the Virtual System Model; and incorporating appropriate security measures to allow the mechanisms developed to be used on an open network. The goal throughout the work is to provide a uniform model that can be applied to both parallel and distributed systems. The authors believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. The work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Prefetching in file systems for MIMD multiprocessors
NASA Technical Reports Server (NTRS)
Kotz, David F.; Ellis, Carla Schlatter
1990-01-01
The question of whether prefetching blocks on the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.
Pedraza, Lizeth K; Sierra, Rodrigo O; Boos, Flávia Z; Haubrich, Josué; Quillfeldt, Jorge A; Alvares, Lucas de Oliveira
2016-03-01
Memory fades over time, becoming more schematic or abstract. The loss of contextual detail in memory may reflect a time-dependent change in the brain structures supporting memory. It has been well established that contextual fear memory relies on the hippocampus for expression shortly after learning, but it becomes hippocampus-independent at a later time point, a process called systems consolidation. This time-dependent process correlates with the loss of memory precision. Here, we investigated whether training intensity predicts the gradual decay of hippocampal dependency to retrieve memory, and the quality of the contextual memory representation over time. We have found that training intensity modulates the progressive decay of hippocampal dependency and memory precision. Strong training intensity accelerates systems consolidation and memory generalization in a remarkable timeframe match. The mechanisms underpinning such process are triggered by glucocorticoid and noradrenaline released during training. These results suggest that the stress levels during emotional learning act as a switch, determining the fate of memory quality. Moderate stress will create a detailed memory, whereas a highly stressful training will develop a generic gist-like memory. © 2015 Wiley Periodicals, Inc.
Working Memory Development in Children with Mild to Borderline Intellectual Disabilities
ERIC Educational Resources Information Center
Van der Molen, M. J.; Henry, L. A.; Van Luit, J. E. H.
2014-01-01
Background: The purpose of the current cross-sectional study was to examine the developmental progression in working memory (WM) between the ages of 9 and 16 years in a large sample of children with mild to borderline intellectual disabilities (MBID). Baddeley's influential WM model was used as a theoretical framework. Furthermore, the…
ERIC Educational Resources Information Center
Swanson, H. Lee
2011-01-01
This study examined whether children's growth on measures of fluid (Raven Colored Progressive Matrices) and crystallized (reading and math achievement) intelligence was attributable to domain-specific or domain-general functions of working memory (WM). A sample of 290 elementary school children was tested on measures of intelligence across three…
Proactive interference and item similarity in working memory.
Bunting, Michael
2006-03-01
Proactive interference (PI) may influence the predictive utility of working memory span tasks. Participants in one experiment (N=70) completed Ravens Advanced Progressive Matrices (RAPM) and multiple versions of operation span and probed recall, modified for the type of memoranda (digits or words). Changing memoranda within- or across-trials released PI, but not doing so permitted PI buildup. Scores from PI-build trials, but not PI-release trials, correlated with RAPM and accounted for as much variance in RAPM as unmodified tasks. These results are consistent with controlled attention and inhibition accounts of working memory, and they elucidate a fundamental component of working memory span tasks.
Recent Progress on Modeling Slip Deformation in Shape Memory Alloys
NASA Astrophysics Data System (ADS)
Sehitoglu, H.; Alkan, S.
2018-03-01
This paper presents an overview of slip deformation in shape memory alloys. The performance of shape memory alloys depends on their slip resistance often quantified through the Critical Resolved Shear Stress (CRSS) or the flow stress. We highlight previous studies that identify the active slip systems and then proceed to show how non- Schmid effects can be dominant in shape memory slip behavior. The work is mostly derived from our recent studies while we highlight key earlier works on slip deformation. We finally discuss the implications of understanding the role of slip on curtailing the transformation strains and also the temperature range over which superelasticity prevails.
Ramanan, Siddharth; Flanagan, Emma; Leyton, Cristian E; Villemagne, Victor L; Rowe, Christopher C; Hodges, John R; Hornberger, Michael
2016-01-01
Diagnostic distinction of primary progressive aphasias (PPA) remains challenging, in particular for the logopenic (lvPPA) and nonfluent/agrammatic (naPPA) variants. Recent findings highlight that episodic memory deficits appear to discriminate these PPA variants from each other, as only lvPPA perform poorly on these tasks while having underlying amyloid pathology similar to that seen in amnestic dementias like Alzheimer's disease (AD). Most memory tests are, however, language based and thus potentially confounded by the prevalent language deficits in PPA. The current study investigated this issue across PPA variants by contrasting verbal and non-verbal episodic memory measures while controlling for their performance on a language subtest of a general cognitive screen. A total of 203 participants were included (25 lvPPA; 29 naPPA; 59 AD; 90 controls) and underwent extensive verbal and non-verbal episodic memory testing, with a subset of patients (n = 45) with confirmed amyloid profiles as assessed by Pittsburgh Compound B and PET. The most powerful discriminator between naPPA and lvPPA patients was a non-verbal recall measure (Rey Complex Figure delayed recall), with 81% of PPA patients classified correctly at presentation. Importantly, AD and lvPPA patients performed comparably on this measure, further highlighting the importance of underlying amyloid pathology in episodic memory profiles. The findings demonstrate that non-verbal recall emerges as the best discriminator of lvPPA and naPPA when controlling for language deficits in high load amyloid PPA cases.
A transcriptome-based model of central memory CD4 T cell death in HIV infection.
Olvera-García, Gustavo; Aguilar-García, Tania; Gutiérrez-Jasso, Fany; Imaz-Rosshandler, Iván; Rangel-Escareño, Claudia; Orozco, Lorena; Aguilar-Delfín, Irma; Vázquez-Pérez, Joel A; Zúñiga, Joaquín; Pérez-Patrigeon, Santiago; Espinosa, Enrique
2016-11-22
Human central memory CD4 T cells are characterized by their capacity of proliferation and differentiation into effector memory CD4 T cells. Homeostasis of central memory CD4 T cells is considered a key factor sustaining the asymptomatic stage of Human Immunodeficiency Virus type 1 (HIV-1) infection, while progression to acquired immunodeficiency syndrome is imputed to central memory CD4 T cells homeostatic failure. We investigated if central memory CD4 T cells from patients with HIV-1 infection have a gene expression profile impeding proliferation and survival, despite their activated state. Using gene expression microarrays, we analyzed mRNA expression patterns in naive, central memory, and effector memory CD4 T cells from healthy controls, and naive and central memory CD4 T cells from patients with HIV-1 infection. Differentially expressed genes, defined by Log 2 Fold Change (FC) ≥ |0.5| and Log (odds) > 0, were used in pathway enrichment analyses. Central memory CD4 T cells from patients and controls showed comparable expression of differentiation-related genes, ruling out an effector-like differentiation of central memory CD4 T cells in HIV infection. However, 210 genes were differentially expressed in central memory CD4 T cells from patients compared with those from controls. Expression of 75 of these genes was validated by semi quantitative RT-PCR, and independently reproduced enrichment results from this gene expression signature. The results of functional enrichment analysis indicated movement to cell cycle phases G1 and S (increased CCNE1, MKI67, IL12RB2, ADAM9, decreased FGF9, etc.), but also arrest in G2/M (increased CHK1, RBBP8, KIF11, etc.). Unexpectedly, the results also suggested decreased apoptosis (increased CSTA, NFKBIA, decreased RNASEL, etc.). Results also suggested increased IL-1β, IFN-γ, TNF, and RANTES (CCR5) activity upstream of the central memory CD4 T cells signature, consistent with the demonstrated milieu in HIV infection. Our findings support a model where progressive loss of central memory CD4 T cells in chronic HIV-1 infection is driven by increased cell cycle entry followed by mitotic arrest, leading to a non-apoptotic death pathway without actual proliferation, possibly contributing to increased turnover.
The neurobiology of consolidations, or, how stable is the engram?
Dudai, Yadin
2004-01-01
Consolidation is the progressive postacquisition stabilization of long-term memory. The term is commonly used to refer to two types of processes: synaptic consolidation, which is accomplished within the first minutes to hours after learning and occurs in all memory systems studied so far; and system consolidation, which takes much longer, and in which memories that are initially dependent upon the hippocampus undergo reorganization and may become hippocampal-independent. The textbook account of consolidation is that for any item in memory, consolidation starts and ends just once. Recently, a heated debate has been revitalized on whether this is indeed the case, or, alternatively, whether memories become labile and must undergo some form of renewed consolidation every time they are activated. This debate focuses attention on fundamental issues concerning the nature of the memory trace, its maturation, persistence, retrievability, and modifiability.
Memory reactivation and consolidation during sleep
Paller, Ken A.; Voss, Joel L.
2004-01-01
Do our memories remain static during sleep, or do they change? We argue here that memory change is not only a natural result of sleep cognition, but further, that such change constitutes a fundamental characteristic of declarative memories. In general, declarative memories change due to retrieval events at various times after initial learning and due to the formation and elaboration of associations with other memories, including memories formed after the initial learning episode. We propose that declarative memories change both during waking and during sleep, and that such change contributes to enhancing binding of the distinct representational components of some memories, and thus to a gradual process of cross-cortical consolidation. As a result of this special form of consolidation, declarative memories can become more cohesive and also more thoroughly integrated with other stored information. Further benefits of this memory reprocessing can include developing complex networks of interrelated memories, aligning memories with long-term strategies and goals, and generating insights based on novel combinations of memory fragments. A variety of research findings are consistent with the hypothesis that cross-cortical consolidation can progress during sleep, although further support is needed, and we suggest some potentially fruitful research directions. Determining how processing during sleep can facilitate memory storage will be an exciting focus of research in the coming years. PMID:15576883