NASA Technical Reports Server (NTRS)
Hall, William A. (Inventor)
1993-01-01
A bus programmable slave module card for use in a computer control system is disclosed which comprises a master computer and one or more slave computer modules interfacing by means of a bus. Each slave module includes its own microprocessor, memory, and control program for acting as a single loop controller. The slave card includes a plurality of memory means (S1, S2...) corresponding to a like plurality of memory devices (C1, C2...) in the master computer, for each slave memory means its own communication lines connectable through the bus with memory communication lines of an associated memory device in the master computer, and a one-way electronic door which is switchable to either a closed condition or a one-way open condition. With the door closed, communication lines between master computer memory (C1, C2...) and slave memory (S1, S2...) are blocked. In the one-way open condition, the memory communication lines of each slave memory means (S1, S2...) connect with the memory communication lines of its associated memory device (C1, C2...) in the master computer, and the memory devices (C1, C2...) of the master computer and slave card are electrically parallel such that information seen by the master's memory is also seen by the slave's memory. The slave card is also connectable to a switch for electronically removing the slave microprocessor from the system. With the master computer and the slave card in programming mode relationship, and the slave microprocessor electronically removed from the system, loading a program in the memory devices (C1, C2...) of the master accomplishes a parallel loading into the memory devices (S1, S2...) of the slave.
Reducing power consumption during execution of an application on a plurality of compute nodes
Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.
2013-09-10
Methods, apparatus, and products are disclosed for reducing power consumption during execution of an application on a plurality of compute nodes that include: powering up, during compute node initialization, only a portion of computer memory of the compute node, including configuring an operating system for the compute node in the powered up portion of computer memory; receiving, by the operating system, an instruction to load an application for execution; allocating, by the operating system, additional portions of computer memory to the application for use during execution; powering up the additional portions of computer memory allocated for use by the application during execution; and loading, by the operating system, the application into the powered up additional portions of computer memory.
Optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1979-01-01
High capacity optical memories with relatively-high data-transfer rate and multiport simultaneous access capability may serve as basis for new computer architectures. Several computer structures that might profitably use memories are: a) simultaneous record-access system, b) simultaneously-shared memory computer system, and c) parallel digital processing structure.
Method and apparatus for managing access to a memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeBenedictis, Erik
A method and apparatus for managing access to a memory of a computing system. A controller transforms a plurality of operations that represent a computing job into an operational memory layout that reduces a size of a selected portion of the memory that needs to be accessed to perform the computing job. The controller stores the operational memory layout in a plurality of memory cells within the selected portion of the memory. The controller controls a sequence by which a processor in the computing system accesses the memory to perform the computing job using the operational memory layout. The operational memory layout reduces an amount of energy consumed by the processor to perform the computing job.
Paging memory from random access memory to backing storage in a parallel computer
Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E
2013-05-21
Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
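The swap path described in this abstract can be illustrated with a small in-process model. This is only a sketch under stated assumptions: the class names `BackingNode` and `ComputeNode`, the least-recently-used eviction policy, and the zero-filled fresh pages are all hypothetical choices made for illustration, and neither the virtual machine layer nor the network between nodes is modeled.

```python
from collections import OrderedDict


class BackingNode:
    """Hypothetical second compute node that stores swapped-out pages."""

    def __init__(self):
        self.store = {}  # page number -> page contents

    def put(self, page_no, data):
        self.store[page_no] = data

    def get(self, page_no):
        # Remove the page from backing storage when it is swapped back in.
        return self.store.pop(page_no)


class ComputeNode:
    """Hypothetical first compute node with a fixed number of RAM pages."""

    def __init__(self, ram_pages, backing):
        self.ram_pages = ram_pages
        self.backing = backing
        self.ram = OrderedDict()  # page number -> contents, oldest first

    def access(self, page_no):
        if page_no in self.ram:
            self.ram.move_to_end(page_no)  # mark as most recently used
            return self.ram[page_no]
        # Page fault: fetch from the remote node, or start with a zeroed page.
        if page_no in self.backing.store:
            data = self.backing.get(page_no)
        else:
            data = bytes(4)
        if len(self.ram) >= self.ram_pages:
            # Assumption: evict the least recently used page to the other node.
            victim, victim_data = self.ram.popitem(last=False)
            self.backing.put(victim, victim_data)
        self.ram[page_no] = data
        return data

    def write(self, page_no, data):
        self.access(page_no)  # make sure the page is resident
        self.ram[page_no] = data


backing = BackingNode()
node = ComputeNode(ram_pages=2, backing=backing)
node.write(0, b"a")
node.write(1, b"b")
node.write(2, b"c")           # RAM is full, so page 0 moves to the backing node
restored = node.access(0)     # page 0 comes back, page 1 is swapped out
```

In this model the first node never touches local disk: every eviction goes to the second node's dictionary, which stands in for the remote backing storage the abstract describes.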
Code of Federal Regulations, 2012 CFR
2012-07-01
... of computer codes. The emission control diagnostic system shall record and store in computer memory..., shall be stored in computer memory to identify correctly functioning emission control systems and those... in computer memory. Should a subsequent fuel system or misfire malfunction occur, any previously...
Code of Federal Regulations, 2013 CFR
2013-07-01
... of computer codes. The emission control diagnostic system shall record and store in computer memory..., shall be stored in computer memory to identify correctly functioning emission control systems and those... in computer memory. Should a subsequent fuel system or misfire malfunction occur, any previously...
The potential of multi-port optical memories in digital computing
NASA Technical Reports Server (NTRS)
Alford, C. O.; Gaylord, T. K.
1975-01-01
A high-capacity memory with a relatively high data transfer rate and multi-port simultaneous access capability may serve as the basis for new computer architectures. The implementation of a multi-port optical memory is discussed. Several computer structures are presented that might profitably use such a memory. These structures include (1) a simultaneous record access system, (2) a simultaneously shared memory computer system, and (3) a parallel digital processing structure.
Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Betin, A Yu; Bobrinev, V I; Verenikina, N M
A multiplex method of recording computer-synthesised one-dimensional Fourier holograms intended for holographic memory devices is proposed. The method potentially allows increasing the recording density in the previously proposed holographic memory system based on the computer synthesis and projection recording of data page holograms. (holographic memory)
NASA Astrophysics Data System (ADS)
Nebashi, Ryusuke; Sakimura, Noboru; Sugibayashi, Tadahiko
2017-08-01
We evaluated the soft-error tolerance and energy consumption of an embedded computer with magnetic random access memory (MRAM) using two computer simulators. One is a central processing unit (CPU) simulator of a typical embedded computer system. We simulated the radiation-induced single-event-upset (SEU) probability in a spin-transfer-torque MRAM cell and also the failure rate of a typical embedded computer due to its main memory SEU error. The other is a delay tolerant network (DTN) system simulator. It simulates the power dissipation of wireless sensor network nodes of the system using a revised CPU simulator and a network simulator. We demonstrated that the SEU effect on the embedded computer with 1 Gbit MRAM-based working memory is less than 1 failure in time (FIT). We also demonstrated that the energy consumption of the DTN sensor node with MRAM-based working memory can be reduced to 1/11. These results indicate that MRAM-based working memory enhances the disaster tolerance of embedded computers.
Rapid solution of large-scale systems of equations
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computers for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly of element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
Space-Bounded Church-Turing Thesis and Computational Tractability of Closed Systems.
Braverman, Mark; Schneider, Jonathan; Rojas, Cristóbal
2015-08-28
We report a new limitation on the ability of physical systems to perform computation, one that is based on generalizing the notion of memory, or storage space, available to the system to perform the computation. Roughly, we define memory as the maximal amount of information that the evolving system can carry from one instant to the next. We show that memory is a limiting factor in computation even in lieu of any time limitations on the evolving system, such as when considering its equilibrium regime. We call this limitation the space-bounded Church-Turing thesis (SBCT). The SBCT is supported by a simulation assertion (SA), which states that predicting the long-term behavior of bounded-memory systems is computationally tractable. In particular, one corollary of SA is an explicit bound on the computational hardness of the long-term behavior of a discrete-time finite-dimensional dynamical system that is affected by noise. We prove such a bound explicitly.
Non-volatile memory for checkpoint storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A.; Chen, Dong; Cipolla, Thomas M.
A system, method and computer program product for supporting system initiated checkpoints in high performance parallel computing systems and storing of checkpoint data to a non-volatile memory storage device. The system and method generates selective control signals to perform checkpointing of system related data in presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers that include ongoing user messaging activity. In one embodiment, the non-volatile memory is a pluggable flash memory card.
Conditional load and store in a shared memory
Blumrich, Matthias A; Ohmacht, Martin
2015-02-03
A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
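The reservation-register scheme in this abstract can be sketched as a small single-threaded model. The names `SharedCache`, `load_reserve`, `store_conditional`, and `atomic_increment` are hypothetical, and one detail is an assumption taken from conventional load-reserve/store-conditional semantics rather than from the abstract: a successful store clears every reservation held on the same address.

```python
class SharedCache:
    """Toy shared memory cache with one reservation register per processor."""

    def __init__(self, num_processors):
        self.data = {}  # address -> value
        # reservation[p] is the address reserved for processor p, or None.
        self.reservation = [None] * num_processors

    def load_reserve(self, proc, addr):
        # Record the reservation, then return the current value (0 if unset).
        self.reservation[proc] = addr
        return self.data.get(addr, 0)

    def store_conditional(self, proc, addr, value):
        # Check the reservation register before storing.
        if self.reservation[proc] != addr:
            return False
        self.data[addr] = value
        # Assumption: a successful store invalidates all reservations on addr.
        for p, reserved in enumerate(self.reservation):
            if reserved == addr:
                self.reservation[p] = None
        return True


def atomic_increment(cache, proc, addr):
    """Retry loop that the two operations are typically used to build."""
    while True:
        old = cache.load_reserve(proc, addr)
        if cache.store_conditional(proc, addr, old + 1):
            return old + 1


cache = SharedCache(num_processors=2)
cache.load_reserve(0, 0x10)
cache.load_reserve(1, 0x10)
first = cache.store_conditional(0, 0x10, 5)   # succeeds, clears both reservations
second = cache.store_conditional(1, 0x10, 7)  # fails, reservation was lost
```

Keeping the reservations in the cache rather than in each processor means the check and the store happen in one place, which appears to be the point of the embodiment described above.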
The Science of Computing: Virtual Memory
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1986-01-01
In the March-April issue, I described how a computer's storage system is organized as a hierarchy consisting of cache, main memory, and secondary memory (e.g., disk). The cache and main memory form a subsystem that functions like main memory but attains speeds approaching cache. What happens if a program and its data are too large for the main memory? This is not a frivolous question. Every generation of computer users has been frustrated by insufficient memory. A new line of computers may have sufficient storage for the computations of its predecessor, but new programs will soon exhaust its capacity. In 1960, a longrange planning committee at MIT dared to dream of a computer with 1 million words of main memory. In 1985, the Cray-2 was delivered with 256 million words. Computational physicists dream of computers with 1 billion words. Computer architects have done an outstanding job of enlarging main memories yet they have never kept up with demand. Only the shortsighted believe they can.
Opportunities for nonvolatile memory systems in extreme-scale high-performance computing
Vetter, Jeffrey S.; Mittal, Sparsh
2015-01-12
For extreme-scale high-performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where DRAM main memory systems account for about 30 to 50 percent of a node's overall power consumption. As the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory capacities balanced with increasing computational rates offered by next-generation processors. However, several emerging memory technologies related to nonvolatile memory (NVM) devices are being investigated as an alternative for DRAM. Moving forward, NVM devices could offer solutions for HPC architectures. Researchers are investigating how to integrate these emerging technologies into future extreme-scale HPC systems and how to expose these capabilities in the software stack and applications. In addition, current results show several of these strategies could offer high-bandwidth I/O, larger main memory capacities, persistent data structures, and new approaches for application resilience and output postprocessing, such as transaction-based incremental checkpointing and in situ visualization, respectively.
ERIC Educational Resources Information Center
Paesler, M. A.
2009-01-01
Digital computers use different kinds of memory, each of which is either volatile or nonvolatile. On most computers only the hard drive memory is nonvolatile, i.e., it retains all information stored on it when the power is off. When a computer is turned on, an operating system stored on the hard drive is loaded into the computer's memory cache and…
Development of 3-Year Roadmap to Transform the Discipline of Systems Engineering
2010-03-31
quickly humans could physically construct them. Indeed, magnetic core memory was entirely constructed by human hands until it was superseded by...For their mainframe computers, IBM develops the applications, operating system, computer hardware and microprocessors (off the shelf standard memory ...processor developers work on potential computational and memory pipelines to support the required performance capabilities and use the available transistors
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
2010-09-07
A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
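One way to picture the two-stage selection in this abstract is the sketch below. It is an illustrative reading, not the patented circuit: the class `BankSelector`, the function `bank_index`, and the choice to encode the select signals as the bits of a bank number are all assumptions made here for the example.

```python
class BankSelector:
    """Hypothetical first-stage logic: asserts a select signal when chosen
    address bits hold pre-programmed values."""

    def __init__(self, bit_positions, expected_bits):
        self.bit_positions = tuple(bit_positions)
        self.expected_bits = tuple(expected_bits)

    def select(self, address):
        # True only if every selected address bit matches its expected value.
        return all(
            ((address >> pos) & 1) == bit
            for pos, bit in zip(self.bit_positions, self.expected_bits)
        )


def bank_index(selectors, address):
    """Hypothetical second-stage logic: combine the select signals into the
    number of the memory bank to access (signal i becomes bit i)."""
    index = 0
    for i, selector in enumerate(selectors):
        if selector.select(address):
            index |= 1 << i
    return index


# Program the selectors so that address bits 6 and 7 choose among four banks.
selectors = [BankSelector([6], [1]), BankSelector([7], [1])]
banks = [bank_index(selectors, a) for a in (0x00, 0x40, 0x80, 0xC0)]

# Reprogramming to bits 12 and 13 changes the interleaving granularity.
coarse = [BankSelector([12], [1]), BankSelector([13], [1])]
coarse_bank = bank_index(coarse, 0x1040)
```

Because the bit positions are parameters rather than fixed wiring, the same hardware model can spread consecutive 64-byte blocks or consecutive 4 KB pages across the banks, which is the kind of flexibility the abstract attributes to programmable selection.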
Integrating Commercial Off-The-Shelf (COTS) graphics and extended memory packages with CLIPS
NASA Technical Reports Server (NTRS)
Callegari, Andres C.
1990-01-01
This paper addresses the question of how to mix CLIPS with graphics and how to overcome PC's memory limitations by using the extended memory available in the computer. By adding graphics and extended memory capabilities, CLIPS can be converted into a complete and powerful system development tool, on the other most economical and popular computer platform. New models of PCs have amazing processing capabilities and graphic resolutions that cannot be ignored and should be used to the fullest of their resources. CLIPS is a powerful expert system development tool, but it cannot be complete without the support of a graphics package needed to create user interfaces and general purpose graphics, or without enough memory to handle large knowledge bases. Now, a well known limitation on the PC's is the usage of real memory which limits CLIPS to use only 640 Kb of real memory, but now that problem can be solved by developing a version of CLIPS that uses extended memory. The user has access of up to 16 MB of memory on 80286 based computers and, practically, all the available memory (4 GB) on computers that use the 80386 processor. So if we give CLIPS a self-configuring graphics package that will automatically detect the graphics hardware and pointing device present in the computer, and we add the availability of the extended memory that exists in the computer (with no special hardware needed), the user will be able to create more powerful systems at a fraction of the cost and on the most popular, portable, and economic platform available such as the PC platform.
Space-Bounded Church-Turing Thesis and Computational Tractability of Closed Systems
NASA Astrophysics Data System (ADS)
Braverman, Mark; Schneider, Jonathan; Rojas, Cristóbal
2015-08-01
We report a new limitation on the ability of physical systems to perform computation—one that is based on generalizing the notion of memory, or storage space, available to the system to perform the computation. Roughly, we define memory as the maximal amount of information that the evolving system can carry from one instant to the next. We show that memory is a limiting factor in computation even in lieu of any time limitations on the evolving system—such as when considering its equilibrium regime. We call this limitation the space-bounded Church-Turing thesis (SBCT). The SBCT is supported by a simulation assertion (SA), which states that predicting the long-term behavior of bounded-memory systems is computationally tractable. In particular, one corollary of SA is an explicit bound on the computational hardness of the long-term behavior of a discrete-time finite-dimensional dynamical system that is affected by noise. We prove such a bound explicitly.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Venkata, Manjunath Gorentla; Aderholdt, William F
The pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend of system architecture in extreme-scale systems is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, the system typically has a high-performing network and a compute accelerator. This system architecture is not only effective for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of the Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or the convergence. A programming abstraction to address this problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architecture. Using distributed data-structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas
2008-01-01
A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
Computer memory power control for the Galileo spacecraft
NASA Technical Reports Server (NTRS)
Detwiler, R. C.
1983-01-01
The developmental history, major design drivers, and final topology of the computer memory power system on the Galileo spacecraft are described. A unique method of generating memory backup power directly from the fault current drawn during a spacecraft power overload or fault condition allows this system to provide continuous memory power. This concept provides a unique solution to the problem of volatile memory loss without the use of a battery or other large energy storage elements usually associated with uninterrupted power supply designs.
Curtis, Evan T; Jamieson, Randall K
2018-04-01
Current theory has divided memory into multiple systems, resulting in a fractionated account of human behaviour. By an alternative perspective, memory is a single system. However, debate over the details of different single-system theories has overshadowed the converging agreement among them, slowing the reunification of memory. Evidence in favour of dividing memory often takes the form of dissociations observed in amnesia, where amnesic patients are impaired on some memory tasks but not others. The dissociations are taken as evidence for separate explicit and implicit memory systems. We argue against this perspective. We simulate two key dissociations between classification and recognition in a computational model of memory, A Theory of Nonanalytic Association. We assume that amnesia reflects a quantitative difference in the quality of encoding. We also present empirical evidence that replicates the dissociations in healthy participants, simulating amnesic behaviour by reducing study time. In both analyses, we successfully reproduce the dissociations. We integrate our computational and empirical successes with the success of alternative models and manipulations and argue that our demonstrations, taken in concert with similar demonstrations with similar models, provide converging evidence for a more general set of single-system analyses that support the conclusion that a wide variety of memory phenomena can be explained by a unified and coherent set of principles.
Multiple-User, Multitasking, Virtual-Memory Computer System
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1993-01-01
Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.
Computers, the Human Mind, and My In-Laws' House.
ERIC Educational Resources Information Center
Esque, Timm J.
1996-01-01
Discussion of human memory, computer memory, and the storage of information focuses on a metaphor that can account for memory without storage and can set the stage for systemic research around a more comprehensive, understandable theory. (Author/LRW)
Persistent Memory in Single Node Delay-Coupled Reservoir Computing.
Kovac, André David; Koall, Maximilian; Pipa, Gordon; Toutounji, Hazem
2016-01-01
Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/prey population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality.
Persistent Memory in Single Node Delay-Coupled Reservoir Computing
Pipa, Gordon; Toutounji, Hazem
2016-01-01
Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/prey population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality. PMID:27783690
40 CFR 86.1806-01 - On-board diagnostics.
Code of Federal Regulations, 2013 CFR
2013-07-01
.... The emission control diagnostic system shall record and store in computer memory diagnostic trouble... or system, “freeze frame” engine conditions present at the time shall be stored in computer memory... equipped with an onboard diagnostic (OBD) system capable of monitoring, for each vehicle's useful life, all...
40 CFR 86.1806-01 - On-board diagnostics.
Code of Federal Regulations, 2011 CFR
2011-07-01
.... The emission control diagnostic system shall record and store in computer memory diagnostic trouble... or system, “freeze frame” engine conditions present at the time shall be stored in computer memory... equipped with an onboard diagnostic (OBD) system capable of monitoring, for each vehicle's useful life, all...
40 CFR 86.1806-01 - On-board diagnostics.
Code of Federal Regulations, 2012 CFR
2012-07-01
.... The emission control diagnostic system shall record and store in computer memory diagnostic trouble... or system, “freeze frame” engine conditions present at the time shall be stored in computer memory... equipped with an onboard diagnostic (OBD) system capable of monitoring, for each vehicle's useful life, all...
Synthetic Analog and Digital Circuits for Cellular Computation and Memory
Purcell, Oliver; Lu, Timothy K.
2014-01-01
Biological computation is a major area of focus in synthetic biology because it has the potential to enable a wide range of applications. Synthetic biologists have applied engineering concepts to biological systems in order to construct progressively more complex gene circuits capable of processing information in living cells. Here, we review the current state of computational genetic circuits and describe artificial gene circuits that perform digital and analog computation. We then discuss recent progress in designing gene circuits that exhibit memory, and how memory and computation have been integrated to yield more complex systems that can both process and record information. Finally, we suggest new directions for engineering biological circuits capable of computation. PMID:24794536
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
NASA Astrophysics Data System (ADS)
Ando, K.; Fujita, S.; Ito, J.; Yuasa, S.; Suzuki, Y.; Nakatani, Y.; Miyazaki, T.; Yoda, H.
2014-05-01
Most parts of present computer systems are made of volatile devices, and the power to supply them to avoid information loss causes huge energy losses. We can eliminate this meaningless energy loss by utilizing the non-volatile function of advanced spin-transfer torque magnetoresistive random-access memory (STT-MRAM) technology and create a new type of computer, i.e., normally off computers. Critical tasks to achieve normally off computers are implementations of STT-MRAM technologies in the main memory and low-level cache memories. STT-MRAM technology for applications to the main memory has been successfully developed by using perpendicular STT-MRAMs, and faster STT-MRAM technologies for applications to the cache memory are now being developed. The present status of STT-MRAMs and challenges that remain for normally off computers are discussed.
Extended write combining using a write continuation hint flag
Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin; Vranas, Pavlos
2013-06-04
A computing apparatus for reducing the amount of processing in a network computing system which includes a network system device of a receiving node for receiving electronic messages comprising data. The electronic messages are transmitted from a sending node. The network system device determines when more data of a specific electronic message is being transmitted. A memory device stores the electronic message data and communicates with the network system device. A memory subsystem communicates with the memory device. The memory subsystem stores a portion of the electronic message when more data of the specific message will be received, and the buffer combines the portion with later received data and moves the data to the memory device for accessible storage.
Computer memory management system
Kirk, III, Whitson John
2002-01-01
A computer memory management system utilizing a memory structure system of "intelligent" pointers in which information related to the use status of the memory structure is designed into the pointer. Through this pointer system, the present invention provides essentially automatic memory management (often referred to as garbage collection) by allowing relationships between objects to have definite memory management behavior by use of coding protocol which describes when relationships should be maintained and when the relationships should be broken. In one aspect, the present invention system allows automatic breaking of strong links to facilitate object garbage collection, coupled with relationship adjectives which define deletion of associated objects. In another aspect, the present invention includes simple-to-use infinite undo/redo functionality in that it has the capability, through a simple function call, to undo all of the changes made to a data model since the previous `valid state` was noted.
High efficiency coherent optical memory with warm rubidium vapour
Hosseini, M.; Sparkes, B.M.; Campbell, G.; Lam, P.K.; Buchler, B.C.
2011-01-01
By harnessing aspects of quantum mechanics, communication and information processing could be radically transformed. Promising forms of quantum information technology include optical quantum cryptographic systems and computing using photons for quantum logic operations. As with current information processing systems, some form of memory will be required. Quantum repeaters, which are required for long distance quantum key distribution, require quantum optical memory as do deterministic logic gates for optical quantum computing. Here, we present results from a coherent optical memory based on warm rubidium vapour and show 87% efficient recall of light pulses, the highest efficiency measured to date for any coherent optical memory suitable for quantum information applications. We also show storage and recall of up to 20 pulses from our system. These results show that simple warm atomic vapour systems have clear potential as a platform for quantum memory. PMID:21285952
High efficiency coherent optical memory with warm rubidium vapour.
Hosseini, M; Sparkes, B M; Campbell, G; Lam, P K; Buchler, B C
2011-02-01
By harnessing aspects of quantum mechanics, communication and information processing could be radically transformed. Promising forms of quantum information technology include optical quantum cryptographic systems and computing using photons for quantum logic operations. As with current information processing systems, some form of memory will be required. Quantum repeaters, which are required for long distance quantum key distribution, require quantum optical memory as do deterministic logic gates for optical quantum computing. Here, we present results from a coherent optical memory based on warm rubidium vapour and show 87% efficient recall of light pulses, the highest efficiency measured to date for any coherent optical memory suitable for quantum information applications. We also show storage and recall of up to 20 pulses from our system. These results show that simple warm atomic vapour systems have clear potential as a platform for quantum memory.
Synthetic analog and digital circuits for cellular computation and memory.
Purcell, Oliver; Lu, Timothy K
2014-10-01
Biological computation is a major area of focus in synthetic biology because it has the potential to enable a wide range of applications. Synthetic biologists have applied engineering concepts to biological systems in order to construct progressively more complex gene circuits capable of processing information in living cells. Here, we review the current state of computational genetic circuits and describe artificial gene circuits that perform digital and analog computation. We then discuss recent progress in designing gene networks that exhibit memory, and how memory and computation have been integrated to yield more complex systems that can both process and record information. Finally, we suggest new directions for engineering biological circuits capable of computation. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Microprogramming Handbook. Second Edition.
ERIC Educational Resources Information Center
Microdata Corp., Santa Ana, CA.
Instead of instructions residing in the main memory as in a fixed instruction computer, a microprogrammable computer has a separate read-only memory which is alterable so that the system can be efficiently adapted to the application at hand. Microprogrammable computers are faster than fixed instruction computers for several reasons: instruction…
Fault tolerant computing: A preamble for assuring viability of large computer systems
NASA Technical Reports Server (NTRS)
Lim, R. S.
1977-01-01
The need for fault-tolerant computing is addressed from the viewpoints of (1) why it is needed, (2) how to apply it in the current state of technology, and (3) what it means in the context of the Phoenix computer system and other related systems. To this end, the value of concurrent error detection and correction is described. User protection, program retry, and repair are among the factors considered. The technology of algebraic codes to protect memory systems and arithmetic codes to protect arithmetic operations is discussed.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The difference of the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Memory management and compiler support for rapid recovery from failures in computer systems
NASA Technical Reports Server (NTRS)
Fuchs, W. K.
1991-01-01
This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
DMA shared byte counters in a parallel computer
Chen, Dong; Gara, Alan G.; Heidelberger, Philip; Vranas, Pavlos
2010-04-06
A parallel computer system is constructed as a network of interconnected compute nodes. Each of the compute nodes includes at least one processor, a memory and a DMA engine. The DMA engine includes a processor interface for interfacing with the at least one processor, DMA logic, a memory interface for interfacing with the memory, a DMA network interface for interfacing with the network, injection and reception byte counters, injection and reception FIFO metadata, and status registers and control registers. The injection FIFOs maintain memory locations of the injection FIFO metadata memory locations including its current head and tail, and the reception FIFOs maintain the reception FIFO metadata memory locations including its current head and tail. The injection byte counters and reception byte counters may be shared between messages.
OS friendly microprocessor architecture: Hardware level computer security
NASA Astrophysics Data System (ADS)
Jungwirth, Patrick; La Fratta, Patrick
2016-05-01
We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline provides for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permissions bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
Data Movement Dominates: Advanced Memory Technology to Address the Real Exascale Power Problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergman, Keren
Energy is the fundamental barrier to Exascale supercomputing and is dominated by the cost of moving data from one point to another, not computation. Similarly, performance is dominated by data movement, not computation. The solution to this problem requires three critical technologies: 3D integration, optical chip-to-chip communication, and a new communication model. The central goal of the Sandia led "Data Movement Dominates" project aimed to develop memory systems and new architectures based on these technologies that have the potential to lower the cost of local memory accesses by orders of magnitude and provide substantially more bandwidth. Only through these transformational advances can future systems reach the goals of Exascale computing with a manageable power budget. The Sandia led team included co-PIs from Columbia University, Lawrence Berkeley Lab, and the University of Maryland. The Columbia effort of Data Movement Dominates focused on developing a physically accurate simulation environment and experimental verification for optically-connected memory (OCM) systems that can enable continued performance scaling through high-bandwidth capacity, energy-efficient bit-rate transparency, and time-of-flight latency. With OCM, memory device parallelism and total capacity can scale to match future high-performance computing requirements without sacrificing data-movement efficiency. When we consider systems with integrated photonics, links to memory can be seamlessly integrated with the interconnection network; in a sense, memory becomes a primary aspect of the interconnection network. At the core of the Columbia effort, toward expanding our understanding of OCM enabled computing we have created an integrated modeling and simulation environment that uniquely integrates the physical behavior of the optical layer. The PhoenixSim suite of design and software tools developed under this effort has enabled the co-design and performance evaluation of photonics-enabled OCM architectures on Exascale computing systems.
Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo
Krogel, Jaron T.; Reboredo, Fernando A.
2018-01-25
Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this paper, we explore alternatives to reduce the memory usage of splined orbitals without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% are possible for select transition metal oxide systems. Finally, for production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.
Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krogel, Jaron T.; Reboredo, Fernando A.
Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this paper, we explore alternatives to reduce the memory usage of splined orbitals without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% are possible for select transition metal oxide systems. Finally, for production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.
Kinetic energy classification and smoothing for compact B-spline basis sets in quantum Monte Carlo
NASA Astrophysics Data System (ADS)
Krogel, Jaron T.; Reboredo, Fernando A.
2018-01-01
Quantum Monte Carlo calculations of defect properties of transition metal oxides have become feasible in recent years due to increases in computing power. As the system size has grown, availability of on-node memory has become a limiting factor. Saving memory while minimizing computational cost is now a priority. The main growth in memory demand stems from the B-spline representation of the single particle orbitals, especially for heavier elements such as transition metals where semi-core states are present. Despite the associated memory costs, splines are computationally efficient. In this work, we explore alternatives to reduce the memory usage of splined orbitals without significantly affecting numerical fidelity or computational efficiency. We make use of the kinetic energy operator to both classify and smooth the occupied set of orbitals prior to splining. By using a partitioning scheme based on the per-orbital kinetic energy distributions, we show that memory savings of about 50% are possible for select transition metal oxide systems. For production supercells of practical interest, our scheme incurs a performance penalty of less than 5%.
NASA Technical Reports Server (NTRS)
LaBel, Kenneth A.; Ladbury, Ray; Oldhamm, Timothy
2010-01-01
As NASA has evolved its usage of spaceflight computing, memory applications have followed as well. In this slide presentation, the history of NASA's memories from magnetic core and tape recorders to current semiconductor approaches is discussed. There is a brief description of current functional memory usage in NASA space systems followed by a description of potential radiation-induced failure modes along with considerations for reliable system design.
ERIC Educational Resources Information Center
Kumaran, Dharshan; McClelland, James L.
2012-01-01
In this article, we present a perspective on the role of the hippocampal system in generalization, instantiated in a computational model called REMERGE (recurrency and episodic memory results in generalization). We expose a fundamental, but neglected, tension between prevailing computational theories that emphasize the function of the hippocampus…
A method to compute SEU fault probabilities in memory arrays with error correction
NASA Technical Reports Server (NTRS)
Gercek, Gokhan
1994-01-01
With the increasing packing densities in VLSI technology, Single Event Upsets (SEU) due to cosmic radiations are becoming more of a critical issue in the design of space avionics systems. In this paper, a method is introduced to compute the fault (mishap) probability for a computer memory of size M words. It is assumed that a Hamming code is used for each word to provide single error correction. It is also assumed that every time a memory location is read, single errors are corrected. Memory is read randomly whose distribution is assumed to be known. In such a scenario, a mishap is defined as two SEU's corrupting the same memory location prior to a read. The paper introduces a method to compute the overall mishap probability for the entire memory for a mission duration of T hours.
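The mishap definition in the abstract above (two SEUs hitting the same word before a read corrects the first) can be illustrated with a minimal sketch. This is not the paper's method: it assumes, for simplicity, that every word is read at a fixed interval rather than randomly, that SEUs arrive in each word as an independent Poisson process, and that any second hit in the same word counts as a mishap. The names `lam`, `read_interval`, `mission_hours`, and `words` are hypothetical.

```python
import math

def log_word_interval_ok(lam, read_interval):
    """Log-probability that one word takes at most one SEU between two reads.

    Assumes Poisson arrivals with rate `lam` (upsets per word-hour):
    P(0 or 1 hits) = exp(-x) * (1 + x), with x = lam * read_interval.
    """
    x = lam * read_interval
    return -x + math.log1p(x)

def mishap_probability(lam, read_interval, mission_hours, words):
    """Probability of at least one mishap anywhere in memory over the mission.

    Simplifying assumption (not from the abstract): all words are read
    periodically, so the mission splits into independent read intervals.
    """
    intervals = mission_hours / read_interval
    log_all_ok = words * intervals * log_word_interval_ok(lam, read_interval)
    return -math.expm1(log_all_ok)  # 1 - exp(log_all_ok), stable for tiny values

if __name__ == "__main__":
    # Hypothetical numbers: 1e6 words, 1e-6 upsets per word-hour, 1000 h mission.
    for interval in (0.1, 1.0, 10.0):
        print(interval, mishap_probability(1e-6, interval, 1000.0, 1_000_000))
```

Under these assumptions, reading (and thereby correcting) each word more often lowers the mishap probability roughly in proportion to the read interval.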
PCI-based WILDFIRE reconfigurable computing engines
NASA Astrophysics Data System (ADS)
Fross, Bradley K.; Donaldson, Robert L.; Palmer, Douglas J.
1996-10-01
WILDFORCE is the first PCI-based custom reconfigurable computer that is based on the Splash 2 technology transferred from the National Security Agency and the Institute for Defense Analyses, Supercomputing Research Center (SRC). The WILDFORCE architecture has many of the features of the WILDFIRE computer, such as field-programmable gate array (FPGA) based processing elements, linear array and crossbar interconnection, and high-performance memory and I/O subsystems. New features introduced in the PCI-based WILDFIRE systems include memory/processor options that can be added to any processing element. These options include static and dynamic memory, digital signal processors (DSPs), FPGAs, and microprocessors. In addition to memory/processor options, many different application specific connectors can be used to extend the I/O capabilities of the system, including systolic I/O, camera input and video display output. This paper also discusses how this new PCI-based reconfigurable computing engine is used for rapid-prototyping, real-time video processing and other DSP applications.
FPGA-Based, Self-Checking, Fault-Tolerant Computers
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2004-01-01
A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors.
It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
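The single-error-correcting, double-error-detecting (SEC-DED) memory function mentioned above can be illustrated in software. The following is a minimal sketch using a textbook extended Hamming (8,4) code on a 4-bit value; the abstract does not say which code or word width the proposed hardware would use, so the code choice and the names `encode` and `decode` are assumptions made for illustration only.

```python
def encode(nibble):
    """Encode a 4-bit value into 8 bits: index 0 is overall parity,
    indices 1..7 are Hamming(7,4) positions (parity at 1, 2, 4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code

def decode(code):
    """Return (value, status); value is None when a double error is detected."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                  # XOR of positions holding a 1
    parity_odd = sum(code) % 2 == 1
    word = list(code)
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        word[syndrome] ^= 1                  # single error; 0 means the parity bit
        status = "corrected"
    else:
        return None, "double-error"          # even parity but nonzero syndrome
    bits = [word[3], word[5], word[6], word[7]]
    return sum(b << i for i, b in enumerate(bits)), status

if __name__ == "__main__":
    stored = encode(0b1011)
    stored[6] ^= 1                           # simulate one upset bit
    print(decode(stored))
```

A single flipped bit is located by the syndrome and repaired on read; two flipped bits leave the overall parity even with a nonzero syndrome, which is reported rather than silently miscorrected.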
Computational modelling of memory retention from synapse to behaviour
NASA Astrophysics Data System (ADS)
van Rossum, Mark C. W.; Shippi, Maria
2013-03-01
One of our most intriguing mental abilities is the capacity to store information and recall it from memory. Computational neuroscience has been influential in developing models and concepts of learning and memory. In this tutorial review we focus on the interplay between learning and forgetting. We discuss recent advances in the computational description of the learning and forgetting processes on synaptic, neuronal, and systems levels, as well as recent data that open up new challenges for statistical physicists.
ERIC Educational Resources Information Center
Cavalier, Al; And Others
A federally sponsored project was designed to incorporate a memory-assessment task and a memory strategy into a computer-based instructional system for assessing and assisting in remediating basic memory-processing and metacognitive deficiencies. The project resulted in an instructional system for school-aged children and youth with mild to…
Kramer, Tobias; Noack, Matthias; Reinefeld, Alexander; Rodríguez, Mirta; Zelinskyy, Yaroslav
2018-06-11
Time- and frequency-resolved optical signals provide insights into the properties of light-harvesting molecular complexes, including excitation energies, dipole strengths and orientations, as well as in the exciton energy flow through the complex. The hierarchical equations of motion (HEOM) provide a unifying theory, which allows one to study the combined effects of system-environment dissipation and non-Markovian memory without making restrictive assumptions about weak or strong couplings or separability of vibrational and electronic degrees of freedom. With increasing system size the exact solution of the open quantum system dynamics requires memory and compute resources beyond a single compute node. To overcome this barrier, we developed a scalable variant of HEOM. Our distributed memory HEOM, DM-HEOM, is a universal tool for open quantum system dynamics. It is used to accurately compute all experimentally accessible time- and frequency-resolved processes in light-harvesting molecular complexes with arbitrary system-environment couplings for a wide range of temperatures and complex sizes. © 2018 Wiley Periodicals, Inc.
Progress In Optical Memory Technology
NASA Astrophysics Data System (ADS)
Tsunoda, Yoshito
1987-01-01
More than 20 years have passed since the concept of optical memory was first proposed in 1966. Since then considerable progress has been made in this area, together with the creation of completely new markets for optical memory in consumer and computer application areas. The first generation of optical memory was mainly developed with holographic recording technology in the late 1960s and early 1970s. A considerable number of developments were made in both analog and digital memory applications. Unfortunately, these technologies did not get the chance to become commercial products. The second generation of optical memory started at the beginning of the 1970s with bit-by-bit recording technology. Read-only optical memories such as video disks and compact audio disks have been extensively investigated. Since laser diodes were first applied to optical video disk readout in 1976, there have been extensive developments of laser diode pick-ups for optical disk memory systems. The third generation of optical memory started in 1978 with bit-by-bit read/write technology using laser diodes. Development of recording materials, both write-once and erasable, has been actively pursued at several research institutes. These technologies are mainly focused on optical memory systems for computer applications. Such practical applications of optical memory technology have resulted in the creation of new products such as compact audio disks and computer file memories.
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.
Method of up-front load balancing for local memory parallel processors
NASA Technical Reports Server (NTRS)
Baffes, Paul Thomas (Inventor)
1990-01-01
In a parallel processing computer system with multiple processing units and shared memory, a method is disclosed for uniformly balancing the aggregate computational load in, and utilizing minimal memory by, a network having identical computations to be executed at each connection therein. Read-only and read-write memory are subdivided into a plurality of process sets, which function like artificial processing units. Said plurality of process sets is iteratively merged and reduced to the number of processing units without exceeding the balance load. Said merger is based upon the value of a partition threshold, which is a measure of the memory utilization. The turnaround time and memory savings of the instant method are functions of the number of processing units available and the number of partitions into which the memory is subdivided. Typical results of the preferred embodiment yielded memory savings of sixty to seventy-five percent.
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
NASA's 3D Flight Computer for Space Applications
NASA Technical Reports Server (NTRS)
Alkalai, Leon
2000-01-01
The New Millennium Program (NMP) Integrated Product Development Team (IPDT) for Microelectronics Systems was planning to validate a newly developed 3D Flight Computer system on its first deep-space flight, DS1, launched in October 1998. This computer, developed in the 1995-97 time frame, contains many new computer technologies previously never used in deep-space systems. They include: advanced 3D packaging architecture for future low-mass and low-volume avionics systems; high-density 3D packaged chip-stacks for both volatile and non-volatile mass memory: 400 Mbytes of local DRAM memory, and 128 Mbytes of Flash memory; high-bandwidth Peripheral Component Interface (PCI) local-bus with a bridge to VME; high-bandwidth (20 Mbps) fiber-optic serial bus; and other attributes, such as standard support for Design for Testability (DFT). Even though this computer system was not completed in time for delivery to the DS1 project, it was an important development along a technology roadmap towards highly integrated and highly miniaturized avionics systems for deep-space applications. This continued technology development is now being performed by NASA's Deep Space System Development Program (also known as X2000) and within JPL's Center for Integrated Space Microsystems (CISM).
Memory Overview - Technologies and Needs
NASA Technical Reports Server (NTRS)
LaBel, Kenneth A.
2010-01-01
As NASA has evolved its usage of spaceflight computing, memory applications have followed as well. In this talk, we will discuss the history of NASA's memories from magnetic core and tape recorders to current semiconductor approaches. We will briefly describe current functional memory usage in NASA space systems followed by a description of potential radiation-induced failure modes along with considerations for reliable system design.
FFTs in external or hierarchical memory
NASA Technical Reports Server (NTRS)
Bailey, David H.
1989-01-01
A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
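The abstract does not spell out its algorithms, so the following is only a minimal sketch of one well-known decomposition in this family, usually called the "four-step" FFT, and is not necessarily the exact method of the report. A length-N transform with N = n1 * n2 is split into short transforms over columns, a twiddle-factor multiplication, short transforms over rows, and a transposed write-out, so each pass touches the data in regular blocks. The function names `dft` and `four_step_fft` are illustrative assumptions, and a naive DFT stands in for the short FFTs to keep the sketch self-contained.

```python
import cmath

def dft(x):
    """Naive O(n^2) DFT, used here as the short transform and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """DFT of x (length n1*n2) via the four-step decomposition."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent length-n1 transforms over the strided columns
    # x[b], x[n2 + b], x[2*n2 + b], ...
    cols = [dft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: multiply entry (b, k1) by the twiddle factor exp(-2*pi*i*b*k1/n).
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Step 3: n1 independent length-n2 transforms across the columns.
    # Step 4: write results in transposed order, output index k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[b][k1] for b in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out
```

In an out-of-core setting, steps 1 and 3 would each correspond to one pass over the external data set, which is consistent with the "as few as two passes" property stated in the abstract.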
The Research on Linux Memory Forensics
NASA Astrophysics Data System (ADS)
Zhang, Jun; Che, ShengBing
2018-03-01
Memory forensics is a branch of computer forensics. It does not depend on the operating system API, and analyzes operating system information from binary memory data. Based on the 64-bit Linux operating system, this work analyzes system process and thread information from physical memory data. Using ELF file debugging information, we propose a method for locating kernel structure member variables that can be applied to different versions of the Linux operating system. The experimental results show that the method can successfully obtain the system process information from physical memory data, and is compatible with multiple versions of the Linux kernel.
A review of emerging non-volatile memory (NVM) technologies and applications
NASA Astrophysics Data System (ADS)
Chen, An
2016-11-01
This paper will review emerging non-volatile memory (NVM) technologies, with the focus on phase change memory (PCM), spin-transfer-torque random-access-memory (STTRAM), resistive random-access-memory (RRAM), and ferroelectric field-effect-transistor (FeFET) memory. These promising NVM devices are evaluated in terms of their advantages, challenges, and applications. Their performance is compared based on reported parameters of major industrial test chips. Memory selector devices and cell structures are discussed. Changing market trends toward low power (e.g., mobile, IoT) and data-centric applications create opportunities for emerging NVMs. High-performance and low-cost emerging NVMs may simplify memory hierarchy, introduce non-volatility in logic gates and circuits, reduce system power, and enable novel architectures. Storage-class memory (SCM) based on high-density NVMs could fill the performance and density gap between memory and storage. Some unique characteristics of emerging NVMs can be utilized for novel applications beyond the memory space, e.g., neuromorphic computing, hardware security, etc. In the beyond-CMOS era, emerging NVMs have the potential to fulfill more important functions and enable more efficient, intelligent, and secure computing systems.
Effect of virtual memory on efficient solution of two model problems
NASA Technical Reports Server (NTRS)
Lambiotte, J. J., Jr.
1977-01-01
Computers with virtual memory architecture allow programs to be written as if they were small enough to be contained in memory. Two types of problems are investigated to show that this luxury can lead to quite inefficient performance if the programmer does not interact strongly with the characteristics of the operating system when developing the program. The two problems considered are the simultaneous solution of a large linear system of equations by Gaussian elimination and a model three-dimensional finite-difference problem. Runs on the Control Data STAR-100 computer are made to demonstrate the inefficiencies of programming the problems in the manner one would naturally do if the problems were, indeed, small enough to be contained in memory. Program redesigns are presented which achieve large improvements in performance through changes in the computational procedure and the data base arrangement.
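The effect described above can be illustrated with a toy paging model. This is a sketch under stated assumptions, not a model of the STAR-100 or its operating system: memory is assumed to be paged with least-recently-used (LRU) replacement, the matrix is assumed to be stored row by row, and the sizes (`n`, `page_size`, `num_frames`) are arbitrary illustrative values.

```python
from collections import OrderedDict

def count_page_faults(addresses, page_size, num_frames):
    """Count page faults for an address trace under LRU replacement."""
    frames = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for addr in addresses:
        page = addr // page_size
        if page in frames:
            frames.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = None
    return faults

def row_order(n):
    """Row-major storage visited row by row: consecutive addresses."""
    return (i * n + j for i in range(n) for j in range(n))

def column_order(n):
    """The same storage visited column by column: stride of n elements."""
    return (i * n + j for j in range(n) for i in range(n))

# Hypothetical sizes: a 64 x 64 matrix, 64 elements per page, 8 resident pages.
n, page_size, num_frames = 64, 64, 8
row_faults = count_page_faults(row_order(n), page_size, num_frames)
col_faults = count_page_faults(column_order(n), page_size, num_frames)
print(row_faults, col_faults)
```

With these sizes each row occupies exactly one page, so the row-order sweep faults once per row, while the column-order sweep cycles through more pages than fit in memory and faults on every access. Reordering the computation to match the storage layout is the kind of redesign the abstract refers to.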
Experimentally modeling stochastic processes with less memory by the use of a quantum processor
Palsson, Matthew S.; Gu, Mile; Ho, Joseph; Wiseman, Howard M.; Pryde, Geoff J.
2017-01-01
Computer simulation of observable phenomena is an indispensable tool for engineering new technology, understanding the natural world, and studying human society. However, the most interesting systems are often so complex that simulating their future behavior demands storing immense amounts of information regarding how they have behaved in the past. For increasingly complex systems, simulation becomes increasingly difficult and is ultimately constrained by resources such as computer memory. Recent theoretical work shows that quantum theory can reduce this memory requirement beyond ultimate classical limits, as measured by a process’ statistical complexity, C. We experimentally demonstrate this quantum advantage in simulating stochastic processes. Our quantum implementation observes a memory requirement of Cq = 0.05 ± 0.01, far below the ultimate classical limit of C = 1. Scaling up this technique would substantially reduce the memory required in simulations of more complex systems. PMID:28168218
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption.
The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Cost aware cache replacement policy in shared last-level cache for hybrid memory based fog computing
NASA Astrophysics Data System (ADS)
Jia, Gangyong; Han, Guangjie; Wang, Hao; Wang, Feng
2018-04-01
Fog computing requires a large main memory capacity to decrease latency and increase the Quality of Service (QoS). However, dynamic random access memory (DRAM), the commonly used random access memory, cannot be included in a fog computing system due to its high consumption of power. In recent years, non-volatile memories (NVM) such as Phase-Change Memory (PCM) and Spin-transfer torque RAM (STT-RAM) with their low power consumption have emerged to replace DRAM. Moreover, the currently proposed hybrid main memory, consisting of both DRAM and NVM, has shown promising advantages in terms of scalability and power consumption. However, the drawbacks of NVM, such as long read/write latency, give rise to potential problems leading to asymmetric cache misses in the hybrid main memory. Current last level cache (LLC) policies are based on the unified miss cost, and result in poor performance in LLC and add to the cost of using NVM. In order to minimize the cache miss cost in the hybrid main memory, we propose a cost aware cache replacement policy (CACRP) that reduces the number of cache misses from NVM and improves the cache performance for a hybrid memory system. Experimental results show that our CACRP behaves better in LLC performance, improving performance up to 43.6% (15.5% on average) compared to LRU.
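The abstract does not give the details of CACRP, so the sketch below is not the authors' policy. It is a toy illustration of the general idea of cost-aware replacement: when a victim must be chosen, a line whose refill would come from the cheaper memory (DRAM) is preferred over one whose refill would come from the slower memory (NVM). The class name `CostAwareCache`, the `window` parameter, and the rule of inspecting only the few least recently used lines are all assumptions made for illustration.

```python
from collections import OrderedDict

class CostAwareCache:
    """Toy fully associative cache with a cost-aware twist on LRU."""

    def __init__(self, capacity, window=2):
        self.capacity = capacity
        self.window = window          # how many LRU candidates to inspect
        self.lines = OrderedDict()    # key -> "DRAM" or "NVM", LRU first

    def access(self, key, backing):
        """Return True on a hit, False on a miss (the line is then inserted)."""
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[key] = backing
        return False

    def _evict(self):
        # Inspect the `window` least recently used lines and evict the oldest
        # DRAM-backed one, since re-fetching it later is assumed to be cheaper.
        for key, backing in list(self.lines.items())[: self.window]:
            if backing == "DRAM":
                del self.lines[key]
                return
        # No cheap victim among the candidates: fall back to plain LRU.
        self.lines.popitem(last=False)

cache = CostAwareCache(capacity=2)
cache.access("a", "NVM")
cache.access("b", "DRAM")
cache.access("c", "DRAM")   # evicts "b" rather than the older NVM-backed "a"
print(list(cache.lines))
```

Plain LRU would have evicted "a" in this trace, turning the next access to it into an expensive NVM miss. A real policy would also need to bound how long costly lines may be protected, which this sketch does not attempt.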
Memory interface simulator: A computer design aid
NASA Technical Reports Server (NTRS)
Taylor, D. S.; Williams, T.; Weatherbee, J. E.
1972-01-01
Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.
Generic, Type-Safe and Object Oriented Computer Algebra Software
NASA Astrophysics Data System (ADS)
Kredel, Heinz; Jolly, Raphael
Advances in computer science, in particular object oriented programming, and software engineering have had little practical impact on computer algebra systems in the last 30 years. The software design of existing systems is still dominated by ad-hoc memory management, weakly typed algorithm libraries and proprietary domain specific interactive expression interpreters. We discuss a modular approach to computer algebra software: usage of state-of-the-art memory management and run-time systems (e.g. JVM), usage of strongly typed, generic, object oriented programming languages (e.g. Java), and usage of general purpose, dynamic interactive expression interpreters (e.g. Python). To illustrate the workability of this approach, we have implemented and studied computer algebra systems in Java and Scala. In this paper we report on the current state of this work by presenting new examples.
Aberg, Kristoffer C; Müller, Julia; Schwartz, Sophie
2017-01-01
Anticipation and delivery of rewards improves memory formation, but little effort has been made to disentangle their respective contributions to memory enhancement. Moreover, it has been suggested that the effects of reward on memory are mediated by dopaminergic influences on hippocampal plasticity. Yet, evidence linking memory improvements to actual reward computations reflected in the activity of the dopaminergic system, i.e., prediction errors and expected values, is scarce and inconclusive. For example, different previous studies reported that the magnitude of prediction errors during a reinforcement learning task was a positive, negative, or non-significant predictor of successfully encoding simultaneously presented images. Individual sensitivities to reward and punishment have been found to influence the activation of the dopaminergic reward system and could therefore help explain these seemingly discrepant results. Here, we used a novel associative memory task combined with computational modeling and showed independent effects of reward-delivery and reward-anticipation on memory. Strikingly, the computational approach revealed positive influences from both reward delivery, as mediated by prediction error magnitude, and reward anticipation, as mediated by magnitude of expected value, even in the absence of behavioral effects when analyzed using standard methods, i.e., by collapsing memory performance across trials within conditions. We additionally measured trait estimates of reward and punishment sensitivity and found that individuals with increased reward (vs. punishment) sensitivity had better memory for associations encoded during positive (vs. negative) prediction errors when tested after 20 min, but a negative trend when tested after 24 h. 
In conclusion, modeling trial-by-trial fluctuations in the magnitude of reward, as we did here for prediction errors and expected value computations, provides a comprehensive and biologically plausible description of the dynamic interplay between reward, dopamine, and associative memory formation. Our results also underline the importance of considering individual traits when assessing reward-related influences on memory.
Including Memory Friction in Single- and Two-State Quantum Dynamics Simulations.
Brown, Paul A; Messina, Michael
2016-03-03
We present a simple computational algorithm that allows for the inclusion of memory friction in a quantum dynamics simulation of a small, quantum, primary system coupled to many atoms in the surroundings. We show how including a memory friction operator, F̂, in the primary quantum system's Hamiltonian operator builds memory friction into the dynamics of the primary quantum system. We show that, in the harmonic, semi-classical limit, this friction operator causes the classical phase-space centers of a wavepacket to evolve exactly as if it were a classical particle experiencing memory friction. We also show that this friction operator can be used to include memory friction in the quantum dynamics of an anharmonic primary system. We then generalize the algorithm so that it can be used to treat a primary quantum system that is evolving non-adiabatically on two coupled potential energy surfaces, i.e., a model that can be used to describe H atom transfer, for example. We demonstrate this approach's computational ease and flexibility by showing numerical results for both harmonic and anharmonic primary quantum systems in the single surface case. Finally, we present numerical results for a model of non-adiabatic H atom transfer between a reactant and product state that includes memory friction on one or both of the non-adiabatic potential energy surfaces and uncover some interesting dynamical effects of non-memory friction on the H atom transfer process.
Extended memory management under RTOS
NASA Technical Reports Server (NTRS)
Plummer, M.
1981-01-01
A technique for extended memory management in ROLM 1666 computers using FORTRAN is presented. A general software system is described for which the technique can be ideally applied. The memory manager interface with the system is described. The protocols by which the manager is invoked are presented, as well as the methods used by the manager.
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
Information Processing Capacity of Dynamical Systems
NASA Astrophysics Data System (ADS)
Dambre, Joni; Verstraeten, David; Schrauwen, Benjamin; Massar, Serge
2012-07-01
Many dynamical systems, both natural and artificial, are stimulated by time dependent external signals, somehow processing the information contained therein. We demonstrate how to quantify the different modes in which information can be processed by such systems and combine them to define the computational capacity of a dynamical system. This is bounded by the number of linearly independent state variables of the dynamical system, equaling it if the system obeys the fading memory condition. It can be interpreted as the total number of linearly independent functions of its stimuli the system can compute. Our theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis. We illustrate our theory by numerical simulations for the logistic map, a recurrent neural network, and a two-dimensional reaction diffusion system, uncovering universal trade-offs between the non-linearity of the computation and the system's short-term memory.
Information Processing Capacity of Dynamical Systems
Dambre, Joni; Verstraeten, David; Schrauwen, Benjamin; Massar, Serge
2012-01-01
Many dynamical systems, both natural and artificial, are stimulated by time dependent external signals, somehow processing the information contained therein. We demonstrate how to quantify the different modes in which information can be processed by such systems and combine them to define the computational capacity of a dynamical system. This is bounded by the number of linearly independent state variables of the dynamical system, equaling it if the system obeys the fading memory condition. It can be interpreted as the total number of linearly independent functions of its stimuli the system can compute. Our theory combines concepts from machine learning (reservoir computing), system modeling, stochastic processes, and functional analysis. We illustrate our theory by numerical simulations for the logistic map, a recurrent neural network, and a two-dimensional reaction diffusion system, uncovering universal trade-offs between the non-linearity of the computation and the system's short-term memory. PMID:22816038
A multiarchitecture parallel-processing development environment
NASA Technical Reports Server (NTRS)
Townsend, Scott; Blech, Richard; Cole, Gary
1993-01-01
A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allow experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Virtual memory support for distributed computing environments using a shared data object model
NASA Astrophysics Data System (ADS)
Huang, F.; Bacon, J.; Mapp, G.
1995-12-01
Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
NASA Technical Reports Server (NTRS)
Bradley, D. B.; Irwin, J. D.
1974-01-01
A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant) and SIMPLEX (non-redundant) jobs.
Computational principles of working memory in sentence comprehension.
Lewis, Richard L; Vasishth, Shravan; Van Dyke, Julie A
2006-10-01
Understanding a sentence requires a working memory of the partial products of comprehension, so that linguistic relations between temporally distal parts of the sentence can be rapidly computed. We describe an emerging theoretical framework for this working memory system that incorporates several independently motivated principles of memory: a sharply limited attentional focus, rapid retrieval of item (but not order) information subject to interference from similar items, and activation decay (forgetting over time). A computational model embodying these principles provides an explanation of the functional capacities and severe limitations of human processing, as well as accounts of reading times. The broad implication is that the detailed nature of cross-linguistic sentence processing emerges from the interaction of general principles of human memory with the specialized task of language comprehension.
Multi-processor including data flow accelerator module
Davidson, George S.; Pierce, Paul E.
1990-01-01
An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
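The memory cell described above (operand slots, tag bits, a primitive index, and linking slots) can be sketched in software as follows. This is a minimal illustration and not the patented circuitry: the names `Cell`, `PRIMITIVES`, and `run`, and the use of a simple FIFO ready queue, are assumptions made for the example.

```python
from collections import deque
import operator

# Hypothetical primitive lookup table, indexed by each cell's primitive number.
PRIMITIVES = [operator.add, operator.mul]

class Cell:
    def __init__(self, primitive, num_operands, targets):
        self.primitive = primitive               # index into PRIMITIVES
        self.slots = [None] * num_operands       # operand values
        self.present = [False] * num_operands    # tag bits: operand arrived?
        self.targets = targets                   # linking slots: (cell_id, slot)

def run(cells, initial_tokens):
    """Fire each cell once all its operands are present; return results by cell."""
    ready = deque()
    results = {}

    def deliver(cell_id, slot, value):
        cell = cells[cell_id]
        cell.slots[slot] = value
        cell.present[slot] = True
        if all(cell.present):        # tag check: every operand is available
            ready.append(cell_id)    # schedule the primitive for execution

    for cell_id, slot, value in initial_tokens:
        deliver(cell_id, slot, value)

    while ready:
        cell_id = ready.popleft()
        cell = cells[cell_id]
        value = PRIMITIVES[cell.primitive](*cell.slots)
        results[cell_id] = value
        cell.present = [False] * len(cell.slots)  # clear tags after firing
        for target_id, slot in cell.targets:      # distribute the result
            deliver(target_id, slot, value)
    return results

# Data dependency graph for (2 + 3) * 4: cell 0 adds, cell 1 multiplies.
cells = {0: Cell(0, 2, [(1, 0)]), 1: Cell(1, 2, [])}
results = run(cells, [(0, 0, 2), (0, 1, 3), (1, 1, 4)])
print(results)
```

Cell 1 receives its second operand before its first, yet it fires only after cell 0 has produced the sum, which is the data-driven scheduling behavior the tag bits are meant to provide.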
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Levy, Scott; Ferreira, Kurt B.; Bridges, Patrick G.; ...
2014-12-09
Building the next generation of extreme-scale distributed systems will require overcoming several challenges related to system resilience. As the number of processors in these systems grows, the failure rate increases proportionally. One of the most common sources of failure in large-scale systems is memory. In this paper, we propose a novel runtime for transparently exploiting memory content similarity to improve system resilience by reducing the rate at which memory errors lead to node failure. We evaluate the viability of this approach by examining memory snapshots collected from eight high-performance computing (HPC) applications and two important HPC operating systems. Based on the characteristics of the similarity uncovered, we conclude that our proposed approach shows promise for addressing system resilience in large-scale systems.
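One simple way to examine a memory snapshot for content similarity is to count pages that have an identical copy elsewhere in the snapshot. The sketch below does only that and is not the runtime proposed in the paper, which may also exploit pages that are similar without being identical. The function name `duplicate_page_fraction` and the 4096-byte page size are assumptions.

```python
import hashlib
from collections import Counter

def duplicate_page_fraction(snapshot: bytes, page_size: int = 4096) -> float:
    """Fraction of pages whose exact content appears more than once."""
    pages = [snapshot[i:i + page_size] for i in range(0, len(snapshot), page_size)]
    if not pages:
        return 0.0
    # Hash each page so pages with identical content fall into the same bucket.
    counts = Counter(hashlib.sha256(page).digest() for page in pages)
    duplicated = sum(c for c in counts.values() if c > 1)
    return duplicated / len(pages)

# Synthetic snapshot: three zero-filled pages plus one page of distinct content.
snapshot = b"\x00" * (3 * 4096) + bytes(range(256)) * 16
fraction = duplicate_page_fraction(snapshot)
print(fraction)
```

A page with a known identical copy could, in principle, be repaired from that copy after an uncorrectable error instead of causing the node to fail, which is the resilience benefit the abstract describes.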
El-Zawawy, Mohamed A.
2014-01-01
This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gathered so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The hierarchical style of concurrent parallel computers is similar to the memory model used in this paper. In our approach, each analysis result is assigned a type derivation (which serves as a correctness proof). PMID:24892098
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency compared with the state-of-the-art approaches.
NASA Technical Reports Server (NTRS)
Byrne, F.
1981-01-01
Time-shared interface speeds data processing in distributed computer network. Two-level high-speed scanning approach routes information to buffer, portion of which is reserved for series of "first-in, first-out" memory stacks. Buffer address structure and memory are protected from noise or failed components by error correcting code. System is applicable to any computer or processing language.
Support for Debugging Automatically Parallelized Programs
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele; Biegel, Bryan (Technical Monitor)
2001-01-01
This viewgraph presentation provides information on the technical aspects of debugging computer code that has been automatically converted for use in a parallel computing system. Shared memory parallelization and distributed memory parallelization entail separate and distinct challenges for a debugging program. A prototype system has been developed which integrates various tools for the debugging of automatically parallelized programs including the CAPTools Database which provides variable definition information across subroutines as well as array distribution information.
Solitonic Josephson-based meminductive systems
NASA Astrophysics Data System (ADS)
Guarcello, Claudio; Solinas, Paolo; di Ventra, Massimiliano; Giazotto, Francesco
2017-04-01
Memristors, memcapacitors, and meminductors represent an innovative generation of circuit elements whose properties depend on the state and history of the system. The hysteretic behavior of one of their constituent variables is their distinctive fingerprint. This feature endows them with the ability to store and process information on the same physical location, a property that is expected to benefit many applications ranging from unconventional computing to adaptive electronics to robotics. Therefore, it is important to find appropriate memory elements that combine a wide range of memory states, long memory retention times, and protection against unavoidable noise. Although several physical systems belong to the general class of memelements, few of them combine these important physical features in a single component. Here, we demonstrate theoretically a superconducting memory based on solitonic long Josephson junctions. Moreover, since solitons are at the core of its operation, this system provides an intrinsic topological protection against external perturbations. We show that the Josephson critical current behaves hysteretically as an external magnetic field is properly swept. Accordingly, long Josephson junctions can be used as multi-state memories, with a controllable number of available states, and in other emerging areas such as memcomputing, i.e., computing directly in/by the memory.
Integrated semiconductor-magnetic random access memory system
NASA Technical Reports Server (NTRS)
Katti, Romney R. (Inventor); Blaes, Brent R. (Inventor)
2001-01-01
The present disclosure describes a non-volatile magnetic random access memory (RAM) system having a semiconductor control circuit and a magnetic array element. The integrated magnetic RAM system uses CMOS control circuit to read and write data magnetoresistively. The system provides a fast access, non-volatile, radiation hard, high density RAM for high speed computing.
Building a Terabyte Memory Bandwidth Compute Node with Four Consumer Electronics GPUs
NASA Astrophysics Data System (ADS)
Omlin, Samuel; Räss, Ludovic; Podladchikov, Yuri
2014-05-01
GPUs released for consumer electronics are generally built with the same chip architectures as the GPUs released for professional usage. With regard to scientific computing, there are no obvious important differences in functionality or performance between the two types of releases, yet the price can differ by up to one order of magnitude. For example, the consumer electronics release of the most recent NVIDIA Kepler architecture (GK110), named GeForce GTX TITAN, performed equally well in conducted memory bandwidth tests as the professional release, named Tesla K20; the consumer electronics release costs about one third of the professional release. We explain how to design and assemble a well-adjusted computer with four high-end consumer electronics GPUs (GeForce GTX TITAN) combining more than 1 terabyte/s memory bandwidth. We compare the system's performance and precision with those of hardware released for professional usage. The system can be used as a powerful workstation for scientific computing or as a compute node in a home-built GPU cluster.
NASA Technical Reports Server (NTRS)
Janetzke, David C.; Murthy, Durbha V.
1991-01-01
Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effects of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.
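To put the efficiency figure in context, parallel efficiency is conventionally defined as speedup divided by processor count, so the reported numbers imply a speedup of roughly 27 on 32 transputers. The helper below is a worked example of that arithmetic only; the function name is hypothetical and the abstract itself does not quote a speedup value.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup implied by the conventional definition efficiency = speedup / processors."""
    return efficiency * processors


# 85 percent efficiency on 32 processors corresponds to a speedup of about 27.2.
implied_speedup = speedup_from_efficiency(0.85, 32)
```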
Data systems and computer science space data systems: Onboard memory and storage
NASA Technical Reports Server (NTRS)
Shull, Tom
1991-01-01
The topics are presented in viewgraph form and include the following: technical objectives; technology challenges; state-of-the-art assessment; mass storage comparison; SODR drive and system concepts; program description; vertical Bloch line (VBL) device concept; relationship to external programs; and backup charts for memory and storage.
Contrasting single and multi-component working-memory systems in dual tasking.
Nijboer, Menno; Borst, Jelmer; van Rijn, Hedderik; Taatgen, Niels
2016-05-01
Working memory can be a major source of interference in dual tasking. However, there is no consensus on whether this interference is the result of a single working memory bottleneck, or of interactions between different working memory components that together form a complete working-memory system. We report a behavioral and an fMRI dataset in which working memory requirements are manipulated during multitasking. We show that a computational cognitive model that assumes a distributed version of working memory accounts for both behavioral and neuroimaging data better than a model that takes a more centralized approach. The model's working memory consists of an attentional focus, declarative memory, and a subvocalized rehearsal mechanism. Thus, the data and model favor an account where working memory interference in dual tasking is the result of interactions between different resources that together form a working-memory system. Copyright © 2016 Elsevier Inc. All rights reserved.
Leahy, P.P.
1982-01-01
The Trescott computer program for modeling groundwater flow in three dimensions has been modified to (1) treat aquifer and confining bed pinchouts more realistically and (2) reduce the computer memory requirements needed for the input data. Using the original program, simulation of aquifer systems with nonrectangular external boundaries may result in a large number of nodes that are not involved in the numerical solution of the problem, but require computer storage. (USGS)
Interfacing laboratory instruments to multiuser, virtual memory computers
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Stang, David B.; Roth, Don J.
1989-01-01
Incentives, problems and solutions associated with interfacing laboratory equipment with multiuser, virtual memory computers are presented. The major difficulty concerns how to utilize these computers effectively in a medium sized research group. This entails optimization of hardware interconnections and software to facilitate multiple instrument control, data acquisition and processing. The architecture of the system that was devised, and associated programming and subroutines are described. An example program involving computer controlled hardware for ultrasonic scan imaging is provided to illustrate the operational features.
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation-intensive tasks, and research and implementation trends are considered. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Noise reduction in optically controlled quantum memory
NASA Astrophysics Data System (ADS)
Ma, Lijun; Slattery, Oliver; Tang, Xiao
2018-05-01
Quantum memory is an essential tool for quantum communications systems and quantum computers. An important category of quantum memory, called optically controlled quantum memory, uses a strong classical beam to control the storage and re-emission of a single-photon signal through an atomic ensemble. In this type of memory, the residual light from the strong classical control beam can cause severe noise and degrade the system performance significantly. Efficiently suppressing this noise is a requirement for the successful implementation of optically controlled quantum memories. In this paper, we briefly introduce the latest and most common approaches to quantum memory and review the various noise-reduction techniques used in implementing them.
Automated quantitative muscle biopsy analysis system
NASA Technical Reports Server (NTRS)
Castleman, Kenneth R. (Inventor)
1980-01-01
An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P.sub.n each receiving data from a shared memory M.sub.n-1 and outputting processed data to a separate shared memory M.sub.n+1 under control of a program stored in dedicated memory M.sub.n.
SUMC fault tolerant computer system
NASA Technical Reports Server (NTRS)
1980-01-01
The results of the trade studies are presented. These trades cover: establishing the basic configuration, establishing the CPU/memory configuration, establishing an approach to crosstrapping interfaces, defining the requirements of the redundancy management unit (RMU), establishing a spare plane switching strategy for the fault-tolerant memory (FTM), and identifying the most cost effective way of extending the memory addressing capability beyond the 64 K-bytes (K=1024) of SUMC-II B. The results of the design are compiled in Contract End Item (CEI) Specification for the NASA Standard Spacecraft Computer II (NSSC-II), IBM 7934507. The implementation of the FTM and memory address expansion.
NASA Astrophysics Data System (ADS)
MacDonald, Christopher L.; Bhattacharya, Nirupama; Sprouse, Brian P.; Silva, Gabriel A.
2015-09-01
Computing numerical solutions to fractional differential equations can be computationally intensive due to the effect of non-local derivatives in which all previous time points contribute to the current iteration. In general, numerical approaches that depend on truncating part of the system history, while efficient, can suffer from high degrees of error and inaccuracy. Here we present an adaptive time step memory method for smooth functions applied to the Grünwald-Letnikov fractional diffusion derivative. This method is computationally efficient and results in smaller errors during numerical simulations. Sampled points along the system's history at progressively longer intervals are assumed to reflect the values of neighboring time points. By including progressively fewer points backward in time, a temporally 'weighted' history is computed that includes contributions from the entire past of the system, maintaining accuracy, but with fewer points actually calculated, greatly improving computational efficiency.
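The cost and truncation error mentioned in this abstract can be seen in a plain Grünwald-Letnikov sum. The sketch below implements the standard full-history formula and a naive fixed-length truncation of it; it does not reproduce the paper's adaptive sampling scheme, and the function names and the `memory` parameter are assumptions made for illustration.

```python
import math


def gl_weights(alpha: float, n: int) -> list:
    """Weights w_k = (-1)^k * binom(alpha, k) for k = 0..n, via the usual recurrence."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w


def gl_derivative(samples, alpha, h, memory=None):
    """Grünwald-Letnikov estimate of the order-alpha derivative at the last sample.

    samples: f(t_0), ..., f(t_n) on a uniform grid with step h.
    memory:  if given, use only the most recent `memory` history points
             (plain truncation, the approach the abstract warns about).
    """
    n = len(samples) - 1
    terms = n if memory is None else min(n, memory)
    w = gl_weights(alpha, terms)
    total = sum(w[k] * samples[n - k] for k in range(terms + 1))
    return total / h ** alpha


# Compare against the known half-derivative of f(t) = t at t = 1, which is 2 / sqrt(pi).
h = 0.001
grid = [i * h for i in range(1001)]
exact = 2.0 / math.sqrt(math.pi)
full_history = gl_derivative(grid, 0.5, h)
truncated = gl_derivative(grid, 0.5, h, memory=10)
```

Every evaluation of the full sum touches all previous samples, so the cost per step grows with the length of the history; the truncated version is cheap but, in this example, far less accurate.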
DOE Office of Scientific and Technical Information (OSTI.GOV)
Almasi, Gheorghe; Blumrich, Matthias Augustin; Chen, Dong
Methods and apparatus perform fault isolation in multiple node computing systems using commutative error detection values (for example, checksums) to identify and to isolate faulty nodes. When information associated with a reproducible portion of a computer program is injected into a network by a node, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves commutative error detection values associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in values indicate a possible faulty node.
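A commutative error detection value is one that does not depend on the order in which packets are injected, so two runs of the same reproducible section can be compared even if message ordering differs. The sketch below uses a sum of per-packet CRC32 values modulo 2^32 as one such value; that particular choice, and the function and variable names, are assumptions for illustration rather than the method claimed in the patent.

```python
import zlib

MOD = 2 ** 32


def commutative_checksum(packets) -> int:
    """Sum of per-packet CRC32 values modulo 2**32; independent of packet order."""
    return sum(zlib.crc32(p) for p in packets) % MOD


def suspect_nodes(run_a: dict, run_b: dict) -> list:
    """Nodes whose checksum for the same reproducible section differs between runs."""
    return sorted(node for node in run_a if run_a[node] != run_b.get(node))


# Hypothetical per-node traffic from two runs of the same program section.
first_run = {
    "node0": commutative_checksum([b"alpha", b"beta"]),
    "node1": commutative_checksum([b"gamma", b"delta"]),
}
second_run = {
    "node0": commutative_checksum([b"beta", b"alpha"]),    # same packets, reordered
    "node1": commutative_checksum([b"gamma", b"delta!"]),  # corrupted payload
}
flagged = suspect_nodes(first_run, second_run)
```

Here `node0` is not flagged despite the reordering, while `node1` is flagged as a possible faulty node.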
Reder, Lynne M.; Park, Heekyeong; Kieffaber, Paul D.
2009-01-01
There is a popular hypothesis that performance on implicit and explicit memory tasks reflects 2 distinct memory systems. Explicit memory is said to store those experiences that can be consciously recollected, and implicit memory is said to store experiences and affect subsequent behavior but to be unavailable to conscious awareness. Although this division based on awareness is a useful taxonomy for memory tasks, the authors review the evidence that the unconscious character of implicit memory does not necessitate that it be treated as a separate system of human memory. They also argue that some implicit and explicit memory tasks share the same memory representations and that the important distinction is whether the task (implicit or explicit) requires the formation of a new association. The authors review and critique dissociations from the behavioral, amnesia, and neuroimaging literatures that have been advanced in support of separate explicit and implicit memory systems by highlighting contradictory evidence and by illustrating how the data can be accounted for using a simple computational memory model that assumes the same memory representation for those disparate tasks. PMID:19210052
NASA Astrophysics Data System (ADS)
Hunter, Geoffrey
2004-01-01
A computational process is classified according to the theoretical model that is capable of executing it; computational processes that require a non-predeterminable amount of intermediate storage for their execution are Turing-machine (TM) processes, while those whose storage is predeterminable are Finite Automaton (FA) processes. Simple processes (such as a traffic light controller) are executable by a Finite Automaton, whereas the most general kind of computation requires a Turing Machine for its execution. This implies that a TM process must have a non-predeterminable amount of memory allocated to it at intermediate instants of its execution; i.e. dynamic memory allocation. Many processes encountered in practice are TM processes. The implication for computational practice is that the hardware (CPU) architecture and its operating system must facilitate dynamic memory allocation, and that the programming language used to specify TM processes must have statements with the semantic attribute of dynamic memory allocation, for in Alan Turing's thesis on computation (1936) the "standard description" of a process is invariant over the most general data that the process is designed to process; i.e. the program describing the process should never have to be modified to allow for differences in the data that is to be processed in different instantiations; i.e. data-invariant programming. Any non-trivial program is partitioned into sub-programs (procedures, subroutines, functions, modules, etc). Examination of the calls/returns between the subprograms reveals that they are nodes in a tree-structure; this tree-structure is independent of the programming language used to encode (define) the process.
Each sub-program typically needs some memory for its own use (to store values intermediate between its received data and its computed results); this locally required memory is not needed before the subprogram commences execution, and it is not needed after its execution terminates; it may be allocated as its execution commences, and deallocated as its execution terminates, and if the amount of this local memory is not known until just before execution commencement, then it is essential that it be allocated dynamically as the first action of its execution. This dynamically allocated/deallocated storage of each subprogram's intermediate values conforms with the stack discipline; i.e. last allocated = first to be deallocated, an incidental benefit of which is automatic overlaying of variables. This stack-based dynamic memory allocation was a semantic implication of the nested block structure that originated in the ALGOL-60 programming language. ALGOL-60 was a TM language, because the amount of memory allocated on subprogram (block/procedure) entry (for arrays, etc) was computable at execution time. A more general requirement of a Turing machine process is for code generation at run-time; this mandates access to the source language processor (compiler/interpreter) during execution of the process. This fundamental aspect of computer science is important to the future of system design, because it has been overlooked throughout the 55 years since modern computing began in 1948. The popular computer systems of this first half-century of computing were constrained by compile-time (or even operating system boot-time) memory allocation, and were thus limited to executing FA processes.
The practical effect was that the distinction between the data-invariant program and its variable data was blurred; programmers had to make trial and error executions, modifying the program's compile-time constants (array dimensions) to iterate towards the values required at run-time by the data being processed. This era of trial and error computing still persists; it pervades the culture of current (2003) computing practice.
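The stack discipline described in this abstract (local storage sized at run time, allocated on subprogram entry and released on exit, last allocated being first deallocated) can be modelled in a few lines. This is an illustrative sketch only; the `StackMemory` class and the `mean_of` subprogram are hypothetical names, not taken from the paper.

```python
class StackMemory:
    """A fixed pool of cells handed out in last-in, first-out order."""

    def __init__(self, capacity: int):
        self.cells = [0] * capacity
        self.top = 0                      # index of the first free cell

    def enter(self, size: int) -> int:
        """Allocate `size` cells on subprogram entry; return the frame's base index."""
        if self.top + size > len(self.cells):
            raise MemoryError("stack exhausted")
        base = self.top
        self.top += size
        return base

    def leave(self, base: int) -> None:
        """Deallocate on subprogram exit: everything at or above `base` is released."""
        self.top = base


def mean_of(data, memory: StackMemory) -> float:
    """A subprogram whose local array size depends on the data, not on a constant."""
    n = len(data)                         # known only at execution time
    base = memory.enter(n)                # first action: allocate local storage
    try:
        for i, x in enumerate(data):
            memory.cells[base + i] = x
        return sum(memory.cells[base:base + n]) / n
    finally:
        memory.leave(base)                # storage is reusable by the next call
```

Because `mean_of` sizes its frame from its argument, the same program text handles inputs of any length that fits the pool, which is the data-invariant property the abstract argues for.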
Optical read/write memory system components
NASA Technical Reports Server (NTRS)
Kozma, A.
1972-01-01
The optical components of a breadboard holographic read/write memory system have been fabricated and the parameters of the major system components specified: (1) a laser system; (2) an x-y beam deflector; (3) a block data composer; (4) the read/write memory material; (5) an output detector array; and (6) the electronics to drive, synchronize, and control all system components. The objectives of the investigation were divided into three concurrent phases: (1) to supply and fabricate the major components according to the previously established specifications; (2) to prepare computer programs to simulate the entire holographic memory system so that a designer can balance the requirements on the various components; and (3) to conduct a development program to optimize the combined recording and reconstruction process of the high density holographic memory system.
Providing the Public with Online Access to Large Bibliographic Data Bases.
ERIC Educational Resources Information Center
Firschein, Oscar; Summit, Roger K.
DIALOG, an interactive, computer-based information retrieval language, consists of a series of computer programs designed to make use of direct access memory devices in order to provide the user with a rapid means of identifying records within a specific memory bank. Using the system, a library user can be provided access to sixteen distinct and…
Studies of Human Memory and Language Processing.
ERIC Educational Resources Information Center
Collins, Allan M.
The purposes of this study were to determine the nature of human semantic memory and to obtain knowledge usable in the future development of computer systems that can converse with people. The work was based on a computer model which is designed to comprehend English text, relating the text to information stored in a semantic data base that is…
Mass Memory Storage Devices for AN/SLQ-32(V).
1985-06-01
tactical programs and libraries into the AN/UYK-19 computer, the RP-16 microprocessor, and other peripheral processors (e.g., ADLS and Band 1) will be...software must be loaded into computer memory from the 4-track magnetic tape cartridges (MTCs) on which the programs are stored. Program load begins...software. Future computer programs, which will reside in peripheral processors, include the Automated Decoy Launching System (ADLS) and Band 1. As
A Survey of Techniques for Modeling and Improving Reliability of Computing Systems
Mittal, Sparsh; Vetter, Jeffrey S.
2015-04-24
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences and impact of faults in computing systems. This has made 'reliability' a first-order design constraint. To address the challenges of reliability, several techniques have been proposed. In this study, we provide a survey of architectural techniques for improving resilience of computing systems. We especially focus on techniques proposed for microarchitectural components, such as processor registers, functional units, cache and main memory etc. In addition, we discuss techniques proposed for non-volatile memory, GPUs and 3D-stacked processors. To underscore the similarities and differences of the techniques, we classify them based on their key characteristics. We also review the metrics proposed to quantify vulnerability of processor structures. Finally, we believe that this survey will help researchers, system-architects and processor designers in gaining insights into the techniques for improving reliability of computing systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, T.; Atac, R.; Cook, A.
1989-03-06
The ACPMAPS multiprocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL Chip set. The system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.
Computational efficiency improvements for image colorization
NASA Astrophysics Data System (ADS)
Yu, Chao; Sharma, Gaurav; Aly, Hussein
2013-03-01
We propose an efficient algorithm for colorization of greyscale images. As in prior work, colorization is posed as an optimization problem: a user specifies the color for a few scribbles drawn on the greyscale image and the color image is obtained by propagating color information from the scribbles to surrounding regions, while maximizing the local smoothness of colors. In this formulation, colorization is obtained by solving a large sparse linear system, which normally requires substantial computation and memory resources. Our algorithm improves the computational performance through three innovations over prior colorization implementations. First, the linear system is solved iteratively without explicitly constructing the sparse matrix, which significantly reduces the required memory. Second, we formulate each iteration in terms of integral images obtained by dynamic programming, reducing repetitive computation. Third, we use a coarse-to-fine framework, where a lower resolution subsampled image is first colorized and this low resolution color image is upsampled to initialize the colorization process for the fine level. The improvements we develop provide significant speedup and memory savings compared to the conventional approach of solving the linear system directly using off-the-shelf sparse solvers, and allow us to colorize images with typical sizes encountered in realistic applications on typical commodity computing platforms.
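The second innovation relies on integral images, which let the sum over any rectangular window be read off in constant time after one dynamic-programming pass. The sketch below shows that general technique on a plain list-of-lists image; how the paper embeds it in each solver iteration is not described in the abstract, and the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with a zero border: S[r][c] is the sum of img[0:r][0:c]."""
    rows, cols = len(img), len(img[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Each entry reuses three previously computed neighbours.
            S[r + 1][c + 1] = img[r][c] + S[r][c + 1] + S[r + 1][c] - S[r][c]
    return S


def box_sum(S, r0, c0, r1, c1):
    """Sum of img over rows r0..r1-1 and columns c0..c1-1, using four table lookups."""
    return S[r1][c1] - S[r0][c1] - S[r1][c0] + S[r0][c0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
lower_right = box_sum(table, 1, 1, 3, 3)   # 5 + 6 + 8 + 9
```

Local averages over a neighbourhood, of the kind a smoothness term needs, then cost the same regardless of window size.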
NASA Astrophysics Data System (ADS)
Bhanota, Gyan; Chen, Dong; Gara, Alan; Vranas, Pavlos
2003-05-01
The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 such nodes are connected into a 3-d torus with a geometry of 32×32×64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.
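The system totals quoted in this abstract follow from the per-node figures. The short calculation below is a worked check, assuming binary prefixes for memory (1 TB = 1024 × 1024 MB) and counting both floating point units on every node; the raw product for peak performance comes out near 367 Teraflops, which the abstract quotes as 360.

```python
# Per-node figures taken from the abstract.
torus_dims = (32, 32, 64)
memory_per_node_mb = 256
fpus_per_node = 2
gflops_per_fpu = 2.8

nodes = torus_dims[0] * torus_dims[1] * torus_dims[2]
total_memory_tb = nodes * memory_per_node_mb / 1024 / 1024   # MB -> GB -> TB
peak_tflops = nodes * fpus_per_node * gflops_per_fpu / 1000  # Gflops -> Tflops
```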
NASA Astrophysics Data System (ADS)
Strzałka, Dominik; Dymora, Paweł; Mazurek, Mirosław
2018-02-01
In this paper we present some preliminary results in the field of computer systems management with relation to Tsallis thermostatistics and the ubiquitous problem of hardware limited resources. In the case of systems with non-deterministic behaviour, management of their resources is a key point that guarantees their acceptable performance and proper working. This is a very broad problem that poses many challenges in areas such as finance, transport, water and food, and health. We focus on computer systems with attention paid to cache memory and propose to use an analytical model that is able to connect non-extensive entropy formalism, long-range dependencies, management of system resources and queuing theory. The analytical results obtained are related to a practical experiment, which shows interesting and valuable results.
Integrating Software Modules For Robot Control
NASA Technical Reports Server (NTRS)
Volpe, Richard A.; Khosla, Pradeep; Stewart, David B.
1993-01-01
Reconfigurable, sensor-based control system uses state variables in systematic integration of reusable control modules. Designed for open-architecture hardware including many general-purpose microprocessors, each having own local memory plus access to global shared memory. Implemented in software as extension of Chimera II real-time operating system. Provides transparent computing mechanism for intertask communication between control modules and generic process-module architecture for multiprocessor realtime computation. Used to control robot arm. Proves useful in variety of other control and robotic applications.
Age effects on explicit and implicit memory
Ward, Emma V.; Berry, Christopher J.; Shanks, David R.
2013-01-01
It is well-documented that explicit memory (e.g., recognition) declines with age. In contrast, many argue that implicit memory (e.g., priming) is preserved in healthy aging. For example, priming on tasks such as perceptual identification is often not statistically different in groups of young and older adults. Such observations are commonly taken as evidence for distinct explicit and implicit learning/memory systems. In this article we discuss several lines of evidence that challenge this view. We describe how patterns of differential age-related decline may arise from differences in the ways in which the two forms of memory are commonly measured, and review recent research suggesting that under improved measurement methods, implicit memory is not age-invariant. Formal computational models are of considerable utility in revealing the nature of underlying systems. We report the results of applying single and multiple-systems models to data on age effects in implicit and explicit memory. Model comparison clearly favors the single-system view. Implications for the memory systems debate are discussed. PMID:24065942
AFTOMS Technology Issues and Alternatives Report
1989-12-01
color , resolu- power requirements, physi- tion; memory , processor speed; cal and weather rugged- IAN interfaces, etc,) f,: these ness. display...Telephone and Telegraph 3 CD-I Compact Disk - Interactive CD-ROM Compact Disk-Read Only Memory CGM Computer Graphics Metafile CNWDI Critical Nuclear...Database Management System RFP Request For Proposal 3 RFS Remote File System ROM Read Only Memory 3 S SA-ALC San Antonio Air Logistics Center 3 SAC
Final Project Report: Data Locality Enhancement of Dynamic Simulations for Exascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shen, Xipeng
The goal of this project is to develop a set of techniques and software tools to enhance the matching between memory accesses in dynamic simulations and the prominent features of modern and future manycore systems, alleviating the memory performance issues for exascale computing. In the first three years, the PI and his group have achieved significant progress towards the goal, producing a set of novel techniques for improving the memory performance and data locality in manycore systems, yielding 18 conference and workshop papers and 4 journal papers and graduating 6 Ph.Ds. This report summarizes the research results of this project through that period.
Optical computing, optical memory, and SBIRs at Foster-Miller
NASA Astrophysics Data System (ADS)
Domash, Lawrence H.
1994-03-01
A desktop design and manufacturing system for binary diffractive elements, MacBEEP, was developed with the optical researcher in mind. Optical processing systems for specialized tasks such as cellular automation computation and fractal measurement were constructed. A new family of switchable holograms has enabled several applications for control of laser beams in optical memories. New spatial light modulators and optical logic elements have been demonstrated based on a more manufacturable semiconductor technology. Novel synthetic and polymeric nonlinear materials for optical storage are under development in an integrated memory architecture. SBIR programs enable creative contributions from smaller companies, both product oriented and technology oriented, and support advances that might not otherwise be developed.
A High Performance VLSI Computer Architecture For Computer Graphics
NASA Astrophysics Data System (ADS)
Chin, Chi-Yuan; Lin, Wen-Tai
1988-10-01
A VLSI computer architecture consisting of multiple processors is presented in this paper to satisfy the demands of modern computer graphics, e.g., high resolution, realistic animation, and real-time display. All processors share a global memory, which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e., object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (HDI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Generalization Through the Recurrent Interaction of Episodic Memories
Kumaran, Dharshan; McClelland, James L.
2012-01-01
In this article, we present a perspective on the role of the hippocampal system in generalization, instantiated in a computational model called REMERGE (recurrency and episodic memory results in generalization). We expose a fundamental, but neglected, tension between prevailing computational theories that emphasize the function of the hippocampus in pattern separation (Marr, 1971; McClelland, McNaughton, & O'Reilly, 1995), and empirical support for its role in generalization and flexible relational memory (Cohen & Eichenbaum, 1993; Eichenbaum, 1999). Our account provides a means by which to resolve this conflict, by demonstrating that the basic representational scheme envisioned by complementary learning systems theory (McClelland et al., 1995), which relies upon orthogonalized codes in the hippocampus, is compatible with efficient generalization—as long as there is recurrence rather than unidirectional flow within the hippocampal circuit or, more widely, between the hippocampus and neocortex. We propose that recurrent similarity computation, a process that facilitates the discovery of higher-order relationships between a set of related experiences, expands the scope of classical exemplar-based models of memory (e.g., Nosofsky, 1984) and allows the hippocampus to support generalization through interactions that unfold within a dynamically created memory space. PMID:22775499
Collective input/output under memory constraints
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Yin; Chen, Yong; Zhuang, Yu
2014-12-18
Compared with current high-performance computing (HPC) systems, exascale systems are expected to have much less memory per node, which can significantly reduce necessary collective input/output (I/O) performance. In this study, we introduce a memory-conscious collective I/O strategy that takes into account memory capacity and bandwidth constraints. The new strategy restricts aggregation data traffic within disjointed subgroups, coordinates I/O accesses in intranode and internode layers, and determines I/O aggregators at run time considering memory consumption among processes. We have prototyped the design and evaluated it with commonly used benchmarks to verify its potential. The evaluation results demonstrate that this strategy holds promise in mitigating the memory pressure, alleviating the contention for memory bandwidth, and improving the I/O performance for projected extreme-scale systems. Given the importance of supporting increasingly data-intensive workloads and projected memory constraints on increasingly larger scale HPC systems, this new memory-conscious collective I/O can have a significant positive impact on scientific discovery productivity.
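The run-time aggregator choice described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that processes are split into fixed-size disjoint subgroups and that the process with the most free memory in each subgroup becomes the aggregator. The function name `choose_aggregators` and the selection rule are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def choose_aggregators(free_memory: Sequence[int], group_size: int) -> List[int]:
    """Pick one I/O aggregator rank per disjoint subgroup (hypothetical rule).

    free_memory[r] is the free memory reported by rank r. Ranks are grouped
    into consecutive blocks of `group_size`; in each block the rank with the
    most free memory is chosen, ties going to the lowest rank.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    aggregators = []
    for start in range(0, len(free_memory), group_size):
        group = range(start, min(start + group_size, len(free_memory)))
        # max() returns the first maximal element, so ties favor the lowest rank.
        aggregators.append(max(group, key=lambda rank: free_memory[rank]))
    return aggregators


if __name__ == "__main__":
    # Six ranks in two subgroups of three; values are free memory in GB.
    print(choose_aggregators([4, 9, 2, 7, 7, 1], group_size=3))
```

Restricting the choice to a subgroup keeps aggregation traffic local, which is the property the abstract emphasizes; a real implementation would also weigh bandwidth and node placement.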
Multiprocessor architectural study
NASA Technical Reports Server (NTRS)
Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.
1972-01-01
An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics' previous experience and accumulated knowledge of the multiprocessor field are used to generate a baseline philosophy for the design of a future SUMC* multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance are discussed extensively, with particular attention to the design approach which utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented either through segmentation or paging. Addressing is discussed in terms of the various register designs adopted by current computers and those of advanced design.
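The hit-ratio analysis mentioned above can be made concrete with the textbook single-level cache model. This is a generic sketch, not the model used in the study; the timing values and function names are illustrative assumptions.

```python
def effective_access_time(hit_ratio: float, cache_ns: float, memory_ns: float) -> float:
    """Average access time for one cache level in front of main memory.

    Assumes a hit costs one cache access and a miss costs a cache probe
    followed by a main-memory access (a common textbook simplification).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be in [0, 1]")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * (cache_ns + memory_ns)


def fraction_of_peak(hit_ratio: float, cache_ns: float, memory_ns: float) -> float:
    """Ratio of ideal (all-hit) access time to the effective access time."""
    return cache_ns / effective_access_time(hit_ratio, cache_ns, memory_ns)


if __name__ == "__main__":
    for h in (0.80, 0.90, 0.99):
        print(h, round(fraction_of_peak(h, cache_ns=10.0, memory_ns=100.0), 3))
```

Under these assumed timings, even a 90% hit ratio leaves the processor at roughly half of its all-hit memory speed, which is why the hit ratio dominates this kind of analysis.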
Store-operate-coherence-on-value
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Dong; Heidelberger, Philip; Kumar, Sameer
A system, method and computer program product for performing various store-operate instructions in a parallel computing environment that includes a plurality of processors and at least one cache memory device. A queue in the system receives, from a processor, a store-operate instruction that specifies under which condition a cache coherence operation is to be invoked. A hardware unit in the system runs the received store-operate instruction. The hardware unit evaluates whether a result of running the received store-operate instruction satisfies the condition. The hardware unit invokes a cache coherence operation on a cache memory address associated with the received store-operate instruction if the result satisfies the condition. Otherwise, the hardware unit does not invoke the cache coherence operation on the cache memory device.
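The flow described above (queue, run the instruction, test the condition, conditionally invoke coherence) can be modeled in software as a rough sketch. All class and field names here are hypothetical, and appending to `coherence_log` merely stands in for whatever coherence operation the hardware would perform.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class StoreOperate:
    """A store-operate instruction carrying its own coherence condition."""
    address: int
    operand: int
    operate: Callable[[int, int], int]   # combines old value and operand
    condition: Callable[[int], bool]     # decides whether coherence is invoked


@dataclass
class StoreOperateUnit:
    """Toy software model of the queue plus hardware unit (illustrative only)."""
    memory: Dict[int, int] = field(default_factory=dict)
    queue: Deque[StoreOperate] = field(default_factory=deque)
    coherence_log: List[int] = field(default_factory=list)

    def submit(self, instruction: StoreOperate) -> None:
        self.queue.append(instruction)

    def run_next(self) -> bool:
        """Run the oldest queued instruction; return True if coherence fired."""
        instruction = self.queue.popleft()
        old = self.memory.get(instruction.address, 0)
        result = instruction.operate(old, instruction.operand)
        self.memory[instruction.address] = result
        if instruction.condition(result):
            # Placeholder for the cache coherence operation on this address.
            self.coherence_log.append(instruction.address)
            return True
        return False


if __name__ == "__main__":
    unit = StoreOperateUnit(memory={0x10: 2})
    # Store-add of -1 that triggers coherence only when the counter reaches zero.
    decrement = StoreOperate(0x10, -1, lambda old, x: old + x, lambda r: r == 0)
    unit.submit(decrement)
    unit.submit(decrement)
    print(unit.run_next(), unit.run_next(), unit.coherence_log)
```

The example condition (counter reaches zero) is one plausible use, such as a barrier counter, where other processors only need to observe the final update.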
Optical mass memory system (AMM-13). AMM/DBMS interface control document
NASA Technical Reports Server (NTRS)
Bailey, G. A.
1980-01-01
The baseline for external interfaces of a 10 to the 13th power bit, optical archival mass memory system (AMM-13) is established. The types of interfaces addressed include data transfer; AMM-13, Data Base Management System, NASA End-to-End Data System computer interconnect; data/control input and output interfaces; test input data source; file management; and facilities interface.
Distributed simulation using a real-time shared memory network
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.
1993-01-01
The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation was measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop for communication between the processors and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
C-MOS array design techniques: SUMC multiprocessor system study
NASA Technical Reports Server (NTRS)
Clapp, W. A.; Helbig, W. A.; Merriam, A. S.
1972-01-01
The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four I/O processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through I/O processors, and multiport access to multiple and shared memory units.
The Library and Human Memory Simulation Studies. Reports on File Organization Studies.
ERIC Educational Resources Information Center
Reilly, Kevin D.
This report describes digital computer simulation efforts in a study of memory systems for two important cases: that of the individual, the brain; and that of society, the library. A neural system model is presented in which a complex system is produced by connecting simple hypothetical neurons whose states change under application of a…
High speed television camera system processes photographic film data for digital computer analysis
NASA Technical Reports Server (NTRS)
Habbal, N. A.
1970-01-01
Data acquisition system translates and processes graphical information recorded on high speed photographic film. It automatically scans the film and stores the information with a minimal use of the computer memory.
Widrow, Bernard; Aragon, Juan Carlos
2013-05-01
Regarding the workings of the human mind, memory and pattern recognition seem to be intertwined. You generally do not have one without the other. Taking inspiration from life experience, a new form of computer memory has been devised. Certain conjectures about human memory are keys to the central idea. The design of a practical and useful "cognitive" memory system is contemplated, a memory system that may also serve as a model for many aspects of human memory. The new memory does not function like a computer memory where specific data is stored in specific numbered registers and retrieval is done by reading the contents of the specified memory register, or done by matching key words as with a document search. Incoming sensory data would be stored at the next available empty memory location, and indeed could be stored redundantly at several empty locations. The stored sensory data would neither have key words nor would it be located in known or specified memory locations. Sensory inputs concerning a single object or subject are stored together as patterns in a single "file folder" or "memory folder". When the contents of the folder are retrieved, sights, sounds, tactile feel, smell, etc., are obtained all at the same time. Retrieval would be initiated by a query or a prompt signal from a current set of sensory inputs or patterns. A search through the memory would be made to locate stored data that correlates with or relates to the prompt input. The search would be done by a retrieval system whose first stage makes use of autoassociative artificial neural networks and whose second stage relies on exhaustive search. Applications of cognitive memory systems have been made to visual aircraft identification, aircraft navigation, and human facial recognition. Concerning human memory, reasons are given why it is unlikely that long-term memory is stored in the synapses of the brain's neural networks. 
Reasons are given suggesting that long-term memory is stored in DNA or RNA. Neural networks are an important component of the human memory system, and their purpose is for information retrieval, not for information storage. The brain's neural networks are analog devices, subject to drift and unplanned change. Only with constant training is reliable action possible. Good training time is during sleep and while awake and making use of one's memory. A cognitive memory is a learning system. Learning involves storage of patterns or data in a cognitive memory. The learning process for cognitive memory is unsupervised, i.e. autonomous. Copyright © 2013 Elsevier Ltd. All rights reserved.
[Artificial intelligence meeting neuropsychology. Semantic memory in normal and pathological aging].
Aimé, Xavier; Charlet, Jean; Maillet, Didier; Belin, Catherine
2015-03-01
Artificial intelligence (AI) is the subject of much research, but also many fantasies. It aims to reproduce human intelligence in its learning capacity, knowledge storage and computation. In 2014, the Defense Advanced Research Projects Agency (DARPA) started the Restoring Active Memory (RAM) program, which attempts to develop implantable technology to bridge gaps in the injured brain and restore normal memory function to people with memory loss caused by injury or disease. In another field of AI, computational ontologies (formal and shared conceptualizations) try to model knowledge in order to represent a structured and unambiguous meaning of the concepts of a target domain. The aim of these structures is to ensure a consensual understanding of their meaning and a univariant use (the same concept is used by all to categorize the same individuals). The first representations of knowledge in the AI domain are largely based on model tests of semantic memory. Semantic memory, as a component of long-term memory, is the memory of words, ideas, and concepts. It is the only declarative memory system that resists the effects of age so remarkably. In contrast, non-specific cognitive changes may decrease the performance of the elderly in various events and instead report difficulties of access to semantic representations that affect the semantics stock itself. Some dementias, like semantic dementia and Alzheimer's disease, are linked to alteration of semantic memory. We propose in this paper, using the computational ontologies model, a formal and relatively lightweight model in the service of neuropsychology: 1) for the practitioner, with decision support systems; 2) for the patient, as an external cognitive prosthesis; and 3) for the researcher, to study semantic memory.
Sparse distributed memory overview
NASA Technical Reports Server (NTRS)
Raugh, Mike
1990-01-01
The Sparse Distributed Memory (SDM) project is investigating the theory and applications of massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
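The associative retrieval described above can be sketched with a small pure-Python model in the general style of Kanerva's sparse distributed memory: random hard locations, activation by Hamming radius, and per-bit counters. The sizes, radius, and class name are illustrative assumptions and do not reflect the project's actual parameters or implementation.

```python
import random


class SparseDistributedMemory:
    """Toy sparse distributed memory over n-bit words stored as Python ints."""

    def __init__(self, n_bits: int = 64, n_locations: int = 500,
                 radius: int = 30, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Fixed, randomly chosen hard-location addresses.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        # One signed counter per bit per hard location.
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address: int) -> list:
        """Indices of hard locations within the Hamming radius of `address`."""
        return [i for i, hard in enumerate(self.addresses)
                if bin(hard ^ address).count("1") <= self.radius]

    def write(self, address: int, data: int) -> None:
        """Add the data word into every activated location's counters."""
        for i in self._active(address):
            row = self.counters[i]
            for bit in range(self.n_bits):
                row[bit] += 1 if (data >> bit) & 1 else -1

    def read(self, address: int) -> int:
        """Sum counters over activated locations and threshold each bit at zero."""
        active = self._active(address)
        result = 0
        for bit in range(self.n_bits):
            if sum(self.counters[i][bit] for i in active) > 0:
                result |= 1 << bit
        return result


if __name__ == "__main__":
    memory = SparseDistributedMemory()
    pattern = 0x0123456789ABCDEF
    memory.write(pattern, pattern)      # autoassociative store
    cue = pattern ^ 0b10101             # cue with three bits flipped
    print(hex(memory.read(cue)))
```

Because a cue near the stored address activates many of the same hard locations, reading with a partially matching cue recovers the stored word, which is the partial-match retrieval property the abstract highlights.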
NASA Astrophysics Data System (ADS)
Onizawa, Naoya; Tamakoshi, Akira; Hanyu, Takahiro
2017-08-01
In this paper, reinitialization-free nonvolatile computer systems are designed and evaluated for energy-harvesting Internet of things (IoT) applications. In energy-harvesting applications, as power supplies generated from renewable power sources cause frequent power failures, data processed need to be backed up when power failures occur. Unless data are safely backed up before power supplies diminish, reinitialization processes are required when power supplies are recovered, which results in low energy efficiencies and slow operations. Using nonvolatile devices in processors and memories can realize a faster backup than a conventional volatile computer system, leading to a higher energy efficiency. To evaluate the energy efficiency upon frequent power failures, typical computer systems including processors and memories are designed using 90 nm CMOS or CMOS/magnetic tunnel junction (MTJ) technologies. Nonvolatile ARM Cortex-M0 processors with 4 kB MRAMs are evaluated using a typical computing benchmark program, Dhrystone, which shows a few order-of-magnitude reductions in energy in comparison with a volatile processor with SRAM.
Multicore Architecture-aware Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinivasa, Avinash
Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, like having strategies in place to override some of the default operating system policies, the main objective being to improve computational performance of the application. The current work proposes two such capabilities with respect to multi-threaded scientific applications, in particular a large scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilities, when included, were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times.
Global positioning system recorder and method
Hayes, D.W.; Hofstetter, K.J.; Eakle, R.F. Jr.; Reeves, G.E.
1998-12-22
A global positioning system recorder (GPSR) is disclosed in which operational parameters and recorded positional data are stored on a transferable memory element. Through this transferrable memory element, the user of the GPSR need have no knowledge of GPSR devices other than that the memory element needs to be inserted into the memory element slot and the GPSR must be activated. The use of the data element also allows for minimal downtime of the GPSR and the ability to reprogram the GPSR and download data therefrom, without having to physically attach it to another computer. 4 figs.
Global positioning system recorder and method government rights
Hayes, David W.; Hofstetter, Kenneth J.; Eakle, Jr., Robert F.; Reeves, George E.
1998-01-01
A global positioning system recorder (GPSR) is disclosed in which operational parameters and recorded positional data are stored on a transferable memory element. Through this transferrable memory element, the user of the GPSR need have no knowledge of GPSR devices other than that the memory element needs to be inserted into the memory element slot and the GPSR must be activated. The use of the data element also allows for minimal downtime of the GPSR and the ability to reprogram the GPSR and download data therefrom, without having to physically attach it to another computer.
Validation Test Report for the Automated Optical Processing System (AOPS) Version 4.8
2013-06-28
be familiar with UNIX; BASH shell programming; and remote sensing, particularly regarding computer processing of satellite data. The system memory ...and storage requirements are difficult to gauge. The amount of memory needed is dependent upon the amount and type of satellite data you wish to...process; the larger the area, the larger the memory requirement. For example, the entire Atlantic Ocean will require more processing power than the
Comparison of two paradigms for distributed shared memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.
1990-08-01
The paper compares two paradigms for Distributed Shared Memory on loosely coupled computing systems: the shared data-object model as used in Orca, a programming language specially designed for loosely coupled computing systems, and the Shared Virtual Memory model. For both paradigms the authors have implemented two systems, one using only point-to-point messages, the other using broadcasting as well. They briefly describe these two paradigms and their implementations. Then they compare their performance on four applications: the traveling salesman problem, alpha-beta search, matrix multiplication and the all pairs shortest paths problem. The measurements show that both paradigms can be used efficiently for programming large-grain parallel applications. Significant speedups were obtained on all applications. The unstructured Shared Virtual Memory paradigm achieves the best absolute performance, although this is largely due to the preliminary nature of the Orca compiler used. The structured shared data-object model achieves the highest speedups and is much easier to program and to debug.
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long term solution with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which leads to a more highly integrated system. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, needs to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The performance of 3D rendering on a Graphics Processing Unit, which converts a 3D vector stream into 2D frames with 3D image effects, significantly impacts users' gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPU for efficient 3D rendering.
Optical quantum memory based on electromagnetically induced transparency
Ma, Lijun; Slattery, Oliver
2017-01-01
Electromagnetically induced transparency (EIT) is a promising approach to implement quantum memory in quantum communication and quantum computing applications. In this paper, following a brief overview of the main approaches to quantum memory, we provide details of the physical principle and theory of quantum memory based specifically on EIT. We discuss the key technologies for implementing quantum memory based on EIT and review important milestones, from the first experimental demonstration to current applications in quantum information systems. PMID:28828172
Optical quantum memory based on electromagnetically induced transparency.
Ma, Lijun; Slattery, Oliver; Tang, Xiao
2017-04-01
Electromagnetically induced transparency (EIT) is a promising approach to implement quantum memory in quantum communication and quantum computing applications. In this paper, following a brief overview of the main approaches to quantum memory, we provide details of the physical principle and theory of quantum memory based specifically on EIT. We discuss the key technologies for implementing quantum memory based on EIT and review important milestones, from the first experimental demonstration to current applications in quantum information systems.
Scripting for Construction of a Transactive Memory System in Multidisciplinary CSCL Environments
ERIC Educational Resources Information Center
Noroozi, Omid; Biemans, Harm J. A.; Weinberger, Armin; Mulder, Martin; Chizari, Mohammad
2013-01-01
Establishing a Transactive Memory System (TMS) is essential for groups of learners, when they are multidisciplinary and collaborate online. Environments for Computer-Supported Collaborative Learning (CSCL) could be designed to facilitate the TMS. This study investigates how various aspects of a TMS (i.e., specialization, coordination, and trust)…
Peregrine System | High-Performance Computing | NREL
) and longer-term (/projects) storage. These file systems are mounted on all nodes. Peregrine has three -2670 Xeon processors and 64 GB of memory. In addition to mounting the /home, /nopt, /projects and # cores/node Memory/node Peak (DP) performance per node 88 Intel Xeon E5-2670 "Sandy Bridge" 8
SODR Memory Control Buffer Control ASIC
NASA Technical Reports Server (NTRS)
Hodson, Robert F.
1994-01-01
The Spacecraft Optical Disk Recorder (SODR) is a state of the art mass storage system for future NASA missions requiring high transmission rates and a large capacity storage system. This report covers the design and development of an SODR memory buffer control application specific integrated circuit (ASIC). The memory buffer control ASIC has two primary functions: (1) buffering data to prevent loss of data during disk access times, (2) converting data formats from a high performance parallel interface format to a small computer systems interface format. Ten 144 pin, 50 MHz CMOS ASICs were designed, fabricated and tested to implement the memory buffer control function.
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
Development of a fault-tolerant microprocessor based computer system for space flight
NASA Technical Reports Server (NTRS)
Montgomery, V. T.
1981-01-01
A methodology for the design of a tightly coupled, highly reliable microprocessor based computer system is described. The concept of triple modular redundancy with sparing is used. The notion of synchronizing by using a single crystal oscillator is examined. Decoders are also used in place of voters. The decoders not only isolate the failed module but also allow error identification to be accomplished. Each module is to have its own RAM memory. The necessary circuitry to select a correct memory and the corresponding DMA controller was designed.
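For readers unfamiliar with triple modular redundancy, the sketch below shows conventional bitwise majority voting with identification of the disagreeing module. Note that this is the standard voter approach, not the decoder-based scheme the report proposes as its replacement; the function name is an illustrative assumption.

```python
from typing import List, Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, List[int]]:
    """Bitwise majority vote over three module outputs.

    Returns the voted word and the indices (0, 1, 2) of modules whose output
    differs from it, so a single failed module can be isolated and a spare
    switched in.
    """
    # Each result bit is 1 when at least two of the three inputs have it set.
    voted = (a & b) | (a & c) | (b & c)
    disagreeing = [i for i, out in enumerate((a, b, c)) if out != voted]
    return voted, disagreeing


if __name__ == "__main__":
    print(tmr_vote(0b0101, 0b0111, 0b0101))  # module 1 has a flipped bit
```

With a single faulty module the voted word equals the two matching outputs and the fault is masked; if all three disagree, every module is flagged and the result cannot be trusted.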
Fault-tolerant computer study. [logic designs for building block circuits
NASA Technical Reports Server (NTRS)
Rennels, D. A.; Avizienis, A. A.; Ercegovac, M. D.
1981-01-01
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.
NASA Technical Reports Server (NTRS)
1973-01-01
Design and development efforts for a spaceborne modular computer system are reported. An initial baseline description is followed by an interface design that includes definition of the overall system response to all classes of failure. Final versions for the register level designs for all module types were completed. Packaging, support and control executive software, including memory utilization estimates and design verification plan, were formalized to insure a soundly integrated design of the digital computer system.
Parallel Computation of the Regional Ocean Modeling System (ROMS)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, P; Song, Y T; Chao, Y
2005-04-05
The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyna, David; Betty, Rita
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis, thus generating substantial impact on the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities. The purpose of this project was to computationally model the impact of neural population dynamics within the neurobiological memory system in order to examine how subareas in the brain enable pattern separation and completion of information in memory across time as associated experiences.
Memory Reconsolidation and Computational Learning
2010-03-01
Cooper and H.T. Siegelmann, "Memory Reconsolidation for Natural Language Processing," Cognitive Neurodynamics , 3, 2009: 365-372. M.M. Olsen, N...computerized memories and other state of the art cognitive architectures, our memory system has the ability to process on-line and in real-time as...on both continuous and binary inputs, unlike state of the art methods in case based reasoning and in cognitive architectures, which are bound to
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
Solitonic Josephson-based meminductive systems
Guarcello, Claudio; Solinas, Paolo; Di Ventra, Massimiliano; ...
2017-04-24
Memristors, memcapacitors, and meminductors represent an innovative generation of circuit elements whose properties depend on the state and history of the system. The hysteretic behavior of one of their constituent variables is their distinctive fingerprint. This feature endows them with the ability to store and process information on the same physical location, a property that is expected to benefit many applications ranging from unconventional computing to adaptive electronics to robotics. Therefore, it is important to find appropriate memory elements that combine a wide range of memory states, long memory retention times, and protection against unavoidable noise. Although several physical systems belong to the general class of memelements, few of them combine these important physical features in a single component. In this paper, we demonstrate theoretically a superconducting memory based on solitonic long Josephson junctions. Moreover, since solitons are at the core of its operation, this system provides an intrinsic topological protection against external perturbations. We show that the Josephson critical current behaves hysteretically as an external magnetic field is properly swept. Accordingly, long Josephson junctions can be used as multi-state memories, with a controllable number of available states, and in other emerging areas such as memcomputing, i.e., computing directly in/by the memory.
Solitonic Josephson-based meminductive systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guarcello, Claudio; Solinas, Paolo; Di Ventra, Massimiliano
Memristors, memcapacitors, and meminductors represent an innovative generation of circuit elements whose properties depend on the state and history of the system. The hysteretic behavior of one of their constituent variables is their distinctive fingerprint. This feature endows them with the ability to store and process information on the same physical location, a property that is expected to benefit many applications ranging from unconventional computing to adaptive electronics to robotics. Therefore, it is important to find appropriate memory elements that combine a wide range of memory states, long memory retention times, and protection against unavoidable noise. Although several physical systems belong to the general class of memelements, few of them combine these important physical features in a single component. In this paper, we demonstrate theoretically a superconducting memory based on solitonic long Josephson junctions. Moreover, since solitons are at the core of its operation, this system provides an intrinsic topological protection against external perturbations. We show that the Josephson critical current behaves hysteretically as an external magnetic field is properly swept. Accordingly, long Josephson junctions can be used as multi-state memories, with a controllable number of available states, and in other emerging areas such as memcomputing, i.e., computing directly in/by the memory.
ERIC Educational Resources Information Center
Oberauer, Klaus; Souza, Alessandra S.; Druey, Michel D.; Gade, Miriam
2013-01-01
The article investigates the mechanisms of selecting and updating representations in declarative and procedural working memory (WM). Declarative WM holds the objects of thought available, whereas procedural WM holds representations of what to do with these objects. Both systems consist of three embedded components: activated long-term memory, a…
Scaling to Nanotechnology Limits with the PIMS Computer Architecture and a new Scaling Rule
DOE Office of Scientific and Technical Information (OSTI.GOV)
Debenedictis, Erik P.
2015-02-01
We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. To keep power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests 80,000x improvement in cost per operation for the (arguably) general purpose function of emulating neurons in Deep Learning.
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional Steps methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitate calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional schemes for an Opteron-based shared memory system is presented. The memory bandwidth usage, the main bottleneck on modern computer architectures, is specifically addressed. High efficiency of above 2 GFlops per CPU is sustained for problems of 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to copying the used data in the memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000^3 unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
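The 1D sweeps that a Locally One-Dimensional scheme is built from can be illustrated with a minimal Python sketch. This is not the authors' optimized Opteron implementation: it assumes a small 2D grid, backward-Euler sweeps, zero Dirichlet boundaries, and a hypothetical dimensionless diffusion number `r` (diffusivity times timestep over grid spacing squared). The function names `thomas`, `implicit_sweep` and `lod_step` are illustrative only.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal, d: rhs)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def implicit_sweep(line, r):
    """One backward-Euler diffusion step along a single grid line.

    Assumes the values just outside both ends of the line are zero.
    """
    n = len(line)
    a = [-r] * n
    b = [1.0 + 2.0 * r] * n
    c = [-r] * n
    a[0] = 0.0   # no neighbour to the left of the first unknown
    c[-1] = 0.0  # no neighbour to the right of the last unknown
    return thomas(a, b, c, line)


def lod_step(u, r):
    """One LOD timestep on a 2D grid (list of rows): sweep along x, then along y."""
    u = [implicit_sweep(row, r) for row in u]
    cols = [implicit_sweep(list(col), r) for col in zip(*u)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0  # point source in the middle
    for row in lod_step(grid, r=0.5):
        print(" ".join(f"{v:.4f}" for v in row))
```

Each sweep touches one grid line at a time and needs only a few temporary vectors, which is consistent with the low memory requirements the abstract mentions; the cost per timestep is linear in the number of unknowns.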
Automatic Generation of Directive-Based Parallel Programs for Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Yan, Jerry; Frumkin, Michael
2000-01-01
The shared-memory programming model is a very effective way to achieve parallelism on shared memory parallel computers. As great progress was made in hardware and software technologies, performance of parallel programs with compiler directives has demonstrated large improvement. The introduction of OpenMP directives, the industrial standard for shared-memory programming, has minimized the issue of portability. Due to its ease of programming and its good performance, the technique has become very popular. In this study, we have extended CAPTools, a computer-aided parallelization toolkit, to automatically generate directive-based, OpenMP, parallel programs. We outline techniques used in the implementation of the tool and present test results on the NAS parallel benchmarks and ARC3D, a CFD application. This work demonstrates the great potential of using computer-aided tools to quickly port parallel programs and also achieve good performance.
Cryogenic Memories based on Spin-Singlet and Spin-Triplet Ferromagnetic Josephson Junctions
NASA Astrophysics Data System (ADS)
Gingrich, Eric
The last several decades have seen an explosion in the use and size of computers for scientific applications. The US Department of Energy has set an ExaScale computing goal for high performance computing that is projected to be unattainable by current CMOS computing designs. This has led to a renewed interest in superconducting computing as a means of beating these projections. One of the primary requirements of this thrust is the development of an efficient cryogenic memory. Estimates of power consumption of early Rapid Single Flux Quantum (RSFQ) memory designs are on the order of MW, far too steep for any real application. Therefore, other memory concepts are required. S/F/S Josephson Junctions, a class of device in which two superconductors (S) are separated by one or more ferromagnetic layers (F) has shown promise as a memory element. Several different systems have been proposed utilizing either the spin-singlet or spin-triplet superconducting states. This talk will discuss the concepts underpinning these devices, and the recent work done to demonstrate their feasibility. This research is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via U.S. Army Research Office Contract W911NF-14-C-0115.
Multiprocessor switch with selective pairing
Gara, Alan; Gschwind, Michael K; Salapura, Valentina
2014-03-11
System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each pair of microprocessor or processor cores that provides one highly reliable thread connects with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to the selective pairing facility via a switch or a bus.
MIDAS - ESO's new image processing system
NASA Astrophysics Data System (ADS)
Banse, K.; Crane, P.; Grosbol, P.; Middleburg, F.; Ounnas, C.; Ponz, D.; Waldthausen, H.
1983-03-01
The Munich Image Data Analysis System (MIDAS) is an image processing system whose heart is a pair of VAX 11/780 computers linked together via DECnet. One of these computers, VAX-A, is equipped with 3.5 Mbytes of memory, 1.2 Gbytes of disk storage, and two tape drives with 800/1600 bpi density. The other computer, VAX-B, has 4.0 Mbytes of memory, 688 Mbytes of disk storage, and one tape drive with 1600/6250 bpi density. MIDAS is a command-driven system geared toward the interactive user. The type and number of parameters in a command depends on the unique parameter invoked. MIDAS is a highly modular system that provides building blocks for the undertaking of more sophisticated applications. Presently, 175 commands are available. These include the modification of the color-lookup table interactively, to enhance various image features, and the interactive extraction of subimages.
Intelligent holographic databases
NASA Astrophysics Data System (ADS)
Barbastathis, George
Memory is a key component of intelligence. In the human brain, physical structure and functionality jointly provide diverse memory modalities at multiple time scales. How could we engineer artificial memories with similar faculties? In this thesis, we attack both hardware and algorithmic aspects of this problem. A good part is devoted to holographic memory architectures, because they meet high capacity and parallelism requirements. We develop and fully characterize shift multiplexing, a novel storage method that simplifies disk head design for holographic disks. We develop and optimize the design of compact refreshable holographic random access memories, showing several ways that 1 Tbit can be stored holographically in volume less than 1 m^3, with surface density more than 20 times higher than conventional silicon DRAM integrated circuits. To address the issue of photorefractive volatility, we further develop the two-lambda (dual wavelength) method for shift multiplexing, and combine electrical fixing with angle multiplexing to demonstrate 1,000 multiplexed fixed holograms. Finally, we propose a noise model and an information theoretic metric to optimize the imaging system of a holographic memory, in terms of storage density and error rate. Motivated by the problem of interfacing sensors and memories to a complex system with limited computational resources, we construct a computer game of Desert Survival, built as a high-dimensional non-stationary virtual environment in a competitive setting. The efficacy of episodic learning, implemented as a reinforced Nearest Neighbor scheme, and the probability of winning against a control opponent improve significantly by concentrating the algorithmic effort on the virtual desert neighborhood that emerges as most significant at any time.
The generalized computational model combines the autonomous neural network and von Neumann paradigms through a compact, dynamic central representation, which contains the most salient features of the sensory inputs, fused with relevant recollections, reminiscent of the hypothesized cognitive function of awareness. The Declarative Memory is searched both by content and address, suggesting a holographic implementation. The proposed computer architecture may lead to a novel paradigm that solves 'hard' cognitive problems at low cost.
Exploiting short-term memory in soft body dynamics as a computational resource
Nakajima, K.; Li, T.; Hauser, H.; Pfeifer, R.
2014-01-01
Soft materials are not only highly deformable, but they also possess rich and diverse body dynamics. Soft body dynamics exhibit a variety of properties, including nonlinearity, elasticity and potentially infinitely many degrees of freedom. Here, we demonstrate that such soft body dynamics can be employed to conduct certain types of computation. Using body dynamics generated from a soft silicone arm, we show that they can be exploited to emulate functions that require memory and to embed robust closed-loop control into the arm. Our results suggest that soft body dynamics have a short-term memory and can serve as a computational resource. This finding paves the way towards exploiting passive body dynamics for control of a large class of underactuated systems. PMID:25185579
NASA Astrophysics Data System (ADS)
Speidel, Steven
1992-08-01
Our ultimate goal is to develop neural-like cognitive sensory processing within non-neuronal systems. Toward this end, computational models are being developed for selectively attending the task-relevant parts of composite sensory excitations in an example sound processing application. Significant stimuli partials are selectively attended through the use of generalized neural adaptive beamformers. Computational components are being tested by experiment in the laboratory and also by use of recordings from sensor deployments in the ocean. Results will be presented. These computational components are being integrated into a comprehensive processing architecture that simultaneously attends memory according to stimuli, attends stimuli according to memory, and attends stimuli and memory according to an ongoing thought process. The proposed neural architecture is potentially very fast when implemented in special hardware.
NASA Astrophysics Data System (ADS)
Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola
2017-04-01
In this paper we study the Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demonstrated that GPUs can effectively speed up the computations for this problem. Our purpose is to optimize the Monte Carlo simulation specifically for the GPU memory architecture. Random numbers are organized to be stored in shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays (CTA) scheme. The experimental results show that the shared-memory-based strategy can speed up the computations by more than 3X in the best case.
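For readers unfamiliar with Monte Carlo solvers for linear systems, the following is a minimal CPU-side Python sketch of a random-walk estimator of the Neumann–Ulam type. It is an assumption that this family of estimators is the one meant; the abstract does not specify the variant, and nothing here reproduces the GPU shared-memory or CTA scheme. The sketch assumes the system is written in fixed-point form x = Hx + b with a convergent Neumann series, uses uniform transition probabilities, and truncates each walk at a fixed length. The name `mc_solve_component` and all parameters are hypothetical.

```python
import random


def mc_solve_component(H, b, i, n_walks=20000, walk_len=30, rng=None):
    """Estimate component i of the solution of x = H x + b by random walks.

    Each walk starts at state i and jumps to a uniformly chosen state; the
    running weight is corrected by H[state][next] / (1/n) so that the average
    score is an unbiased estimate of the truncated Neumann series
    sum_k (H^k b)_i for k = 0..walk_len.
    """
    rng = rng or random.Random(0)
    n = len(b)
    total = 0.0
    for _ in range(n_walks):
        state = i
        weight = 1.0
        score = b[i]
        for _ in range(walk_len):
            nxt = rng.randrange(n)          # uniform transition, probability 1/n
            weight *= H[state][nxt] * n     # importance weight H_ij / p_ij
            if weight == 0.0:
                break                       # walk can no longer contribute
            state = nxt
            score += weight * b[state]
        total += score
    return total / n_walks


if __name__ == "__main__":
    H = [[0.1, 0.2],
         [0.3, 0.1]]
    b = [1.0, 2.0]
    estimate = [mc_solve_component(H, b, i) for i in range(2)]
    print(estimate)  # statistical estimate of the solution of (I - H) x = b
```

Every walk is independent of the others, which is why the method parallelizes naturally across GPU threads; the random numbers consumed inside the inner loop are the data the abstract proposes to stage in shared memory.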
Chemical Memory Reactions Induced Bursting Dynamics in Gene Expression
Tian, Tianhai
2013-01-01
Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memorial phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions are a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool to study memory processes and conditional chemical reactions in a wide range of complex biological systems. PMID:23349679
Chemical memory reactions induced bursting dynamics in gene expression.
Tian, Tianhai
2013-01-01
Memory is a ubiquitous phenomenon in biological systems in which the present system state is not entirely determined by the current conditions but also depends on the time evolutionary path of the system. Specifically, many memorial phenomena are characterized by chemical memory reactions that may fire under particular system conditions. These conditional chemical reactions contradict the extant stochastic approaches for modeling chemical kinetics and have increasingly posed significant challenges to mathematical modeling and computer simulation. To tackle the challenge, I proposed a novel theory consisting of the memory chemical master equations and memory stochastic simulation algorithm. A stochastic model for single-gene expression was proposed to illustrate the key function of memory reactions in inducing bursting dynamics of gene expression that has been observed in experiments recently. The importance of memory reactions has been further validated by the stochastic model of the p53-MDM2 core module. Simulations showed that memory reactions are a major mechanism for realizing both sustained oscillations of p53 protein numbers in single cells and damped oscillations over a population of cells. These successful applications of the memory modeling framework suggested that this innovative theory is an effective and powerful tool to study memory processes and conditional chemical reactions in a wide range of complex biological systems.
NASA Astrophysics Data System (ADS)
Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua
2017-12-01
Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to the limited physical memory resources and the very slow disk transfer rate. In this paper, we proposed a stream tilling approach to surface area estimation that first decomposed a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process was broken. Then, we realized a streaming framework for scheduling the I/O processes and computing units. Herein, each computing unit encapsulated an identical copy of the estimation algorithm, and multiple asynchronous computing units could work individually in parallel. Finally, the experiment demonstrated that our stream tilling estimation can efficiently alleviate the heavy pressure from the I/O-bound work, and the measured speedup after optimization greatly outperformed the directly parallel versions in shared memory systems with multi-core processors.
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors
NASA Astrophysics Data System (ADS)
Sourouri, Mohammed; Birger Raknes, Espen
2017-04-01
In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also requires explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but requires the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required.
However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution for the following 3D model sizes in grid points: 128^3, 256^3 and 512^3. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
NASA Technical Reports Server (NTRS)
Hamilton, M. H.
1972-01-01
Erasable-memory programs designed for guidance computers used in command and lunar modules are presented. The purpose, functional description, assumptions, restrictions, and limitations are given for each program.
2014-09-01
not losing track of the original facts of the situation. However, hippocampal episodic memory also has limitations – it operates one memory at a...ability to strategically control the use of episodic memory . Specific areas of PFC are implicated as these episodic control structures, including...certainly start by encoding the problem into hippocampal episodic memory , so they can retrieve it when interference overtakes the system and they
Programmable Direct-Memory-Access Controller
NASA Technical Reports Server (NTRS)
Hendry, David F.
1990-01-01
Proposed programmable direct-memory-access controller (DMAC) operates with computer systems of 32000 series, which have 32-bit data buses and use addresses of 24 (or potentially 32) bits. Controller functions with or without help of central processing unit (CPU) and starts itself. Includes such advanced features as ability to compare two blocks of memory for equality and to search block of memory for specific value. Made as single very-large-scale integrated-circuit chip.
NASA Technical Reports Server (NTRS)
Rogers, David
1988-01-01
The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
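To make the associative-memory idea concrete, the following is a minimal pure-Python sketch of a Kanerva-style sparse distributed memory. It is not the Connection Machine implementation discussed in the report, and the sizes chosen (64-bit words, 500 hard locations, Hamming radius 26) are hypothetical values picked so that a small fraction of locations is activated by any given address. The class name `SparseDistributedMemory` is illustrative.

```python
import random


class SparseDistributedMemory:
    """Toy sparse distributed memory with random hard locations and bit counters."""

    def __init__(self, n_bits=64, n_locations=500, radius=26, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Fixed random addresses of the hard locations
        self.addresses = [[rng.randint(0, 1) for _ in range(n_bits)]
                          for _ in range(n_locations)]
        # One signed counter per bit per hard location
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        """Indices of hard locations within the Hamming radius of the address."""
        return [k for k, loc in enumerate(self.addresses)
                if sum(x != y for x, y in zip(loc, address)) <= self.radius]

    def write(self, address, data):
        """Add the data word (+1 for a 1 bit, -1 for a 0 bit) to every active location."""
        for k in self._active(address):
            row = self.counters[k]
            for j, bit in enumerate(data):
                row[j] += 1 if bit else -1

    def read(self, address):
        """Sum the counters of all active locations and threshold at zero."""
        sums = [0] * self.n_bits
        for k in self._active(address):
            for j, c in enumerate(self.counters[k]):
                sums[j] += c
        return [1 if s > 0 else 0 for s in sums]


if __name__ == "__main__":
    rng = random.Random(1)
    sdm = SparseDistributedMemory()
    pattern = [rng.randint(0, 1) for _ in range(64)]
    sdm.write(pattern, pattern)          # store the pattern at its own address
    noisy = list(pattern)
    for j in (0, 10, 20, 30):            # corrupt four address bits
        noisy[j] ^= 1
    print(sdm.read(noisy) == pattern)
```

Reading and writing both reduce to the same operation applied independently at every hard location (compute a Hamming distance, then update or sum counters), which suggests why a massively parallel machine is a natural fit for the algorithm.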
Hardware enabled performance counters with support for operating system context switching
Salapura, Valentina; Wisniewski, Robert W.
2015-06-30
A device for supporting hardware enabled performance counters with support for context switching includes a plurality of performance counters operable to collect information associated with one or more computer system related activities, a first register operable to store a memory address, a second register operable to store a mode indication, and a state machine operable to read the second register and cause the plurality of performance counters to copy the information to a memory area indicated by the memory address based on the mode indication.
Recursive computer architecture for VLSI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Treleaven, P.C.; Hopkins, R.P.
1982-01-01
A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems is termed fifth generation computers by the Japanese. 30 references.
Computer technologies and institutional memory
NASA Technical Reports Server (NTRS)
Bell, Christopher; Lachman, Roy
1989-01-01
NASA programs for manned space flight are in their 27th year. Scientists and engineers who worked continuously on the development of aerospace technology during that period are approaching retirement. The resulting loss to the organization will be considerable. Although this problem is general to the NASA community, the problem was explored in terms of the institutional memory and technical expertise of a single individual in the Man-Systems division. The main domain of the expert was spacecraft lighting, which became the subject area for analysis in these studies. The report starts with an analysis of the cumulative expertise and institutional memory of technical employees of organizations such as NASA. A set of solutions to this problem are examined and found inadequate. Two solutions were investigated at length: hypertext and expert systems. Illustrative examples were provided of hypertext and expert system representation of spacecraft lighting. These computer technologies can be used to ameliorate the problem of the loss of invaluable personnel.
Blurriness in Live Forensics: An Introduction
NASA Astrophysics Data System (ADS)
Savoldi, Antonio; Gubian, Paolo
The Live Forensics discipline aims at answering basic questions related to a digital crime, which usually involves a computer-based system. The investigation should be carried out with the goal of establishing which processes were running, when they were started and by whom, what specific activities those processes were doing and the state of active network connections. To do so, a set of tools needs to be launched on the running system, which, as a consequence of Locard’s exchange principle [2], alters the system’s memory. All the methodologies for the live forensics field proposed until now have a basic, albeit important, weakness, which is the inability to quantify the perturbation, or blurriness, of the system’s memory of the investigated computer. The ultimate goal of this paper is to provide a set of guidelines which can be effectively used for measuring the uncertainty of the collected volatile memory on a live system being investigated.
Li, Ji-Qing; Zhang, Yu-Shan; Ji, Chang-Ming; Wang, Ai-Jing; Lund, Jay R
2013-01-01
This paper examines long-term optimal operation using dynamic programming for a large hydropower system of 10 reservoirs in Northeast China. Besides considering flow and hydraulic head, the optimization explicitly includes time-varying electricity market prices to maximize benefit. Two techniques are used to reduce the 'curse of dimensionality' of dynamic programming with many reservoirs. Discrete differential dynamic programming (DDDP) reduces the search space and computer memory needed. Object-oriented programming (OOP) and the ability to dynamically allocate and release memory with the C++ language greatly reduce the cumulative effect of computer memory for solving multi-dimensional dynamic programming models. The case study shows that the model can reduce the 'curse of dimensionality' and achieve satisfactory results.
Robust uncertainty evaluation for system identification on distributed wireless platforms
NASA Astrophysics Data System (ADS)
Crinière, Antoine; Döhler, Michael; Le Cam, Vincent; Mevel, Laurent
2016-04-01
Health monitoring of civil structures by system identification procedures from automatic control is now accepted as a valid approach. These methods provide frequencies and mode shapes from the structure over time. For continuous monitoring, the excitation of a structure is usually ambient, thus unknown and assumed to be noise. Hence, all estimates from the vibration measurements are realizations of random variables with inherent uncertainty due to (unknown) process and measurement noise and finite data length. The underlying algorithms usually run under MATLAB under the assumption of a large memory pool and considerable computational power. Even under these premises, computational and memory usage are heavy and not realistic for embedding in on-site sensor platforms such as the PEGASE platform. Moreover, the current push for distributed wireless systems calls for algorithmic adaptation to lower data exchanges and maximize local processing. Finally, the recent breakthrough in system identification allows us to process both frequency information and its related uncertainty together from one and only one data sequence, at the expense of a computational and memory explosion that requires even more careful attention than before. The current approach will focus on presenting a system identification procedure called multi-setup subspace identification that allows both frequencies and their related variances to be processed from a set of interconnected wireless systems, with all computation running locally within the limited memory pool of each system before being merged on a host supervisor. Careful attention will be given to data exchanges and I/O satisfying OGC standards, as well as to minimizing memory footprints and maximizing computational efficiency. Those systems are built for autonomous operation in the field and could later be included in a wide distributed architecture such as the Cloud2SM project.
The usefulness of these strategies is illustrated on data from a progressive damage action on a prestressed concrete bridge.
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
This Is Your Brain: A Decision-Making Machine
2015-11-01
brain has vast computing power that performs a plethora of vital tasks. It regulates your bodily functions, movements and emotions. It processes and...system beneath the cerebrum and associated with long-term memory and emotions. In our “The brain is a wonderful organ. It starts working when you get...presence of perceived danger. Long-term memories and experiences also are stored here, often along with their emotional connections to pain or
Reinforcement Learning and Episodic Memory in Humans and Animals: An Integrative Framework.
Gershman, Samuel J; Daw, Nathaniel D
2017-01-03
We review the psychology and neuroscience of reinforcement learning (RL), which has experienced significant progress in the past two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, one challenge in the study of RL is computational: The simplicity of these tasks ignores important aspects of reinforcement learning in the real world: (a) State spaces are high-dimensional, continuous, and partially observable; this implies that (b) data are relatively sparse and, indeed, precisely the same situation may never be encountered twice; furthermore, (c) rewards depend on the long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, theories of RL have largely involved procedural and semantic memory, the way in which knowledge about action values or world models extracted gradually from many experiences can drive choice. This focus on semantic memory leaves out many aspects of memory, such as episodic memory, related to the traces of individual events. We suggest that these two challenges are related. The computational challenge can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (a) efficiently approximate value functions over complex state spaces, (b) learn with very little data, and (c) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system.
Collectively loading an application in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
Reservoir Computing Beyond Memory-Nonlinearity Trade-off.
Inubushi, Masanobu; Yoshimura, Kazuyuki
2017-08-31
Reservoir computing is a brain-inspired machine learning framework that employs a signal-driven dynamical system, in particular harnessing common-signal-induced synchronization which is a widely observed nonlinear phenomenon. Basic understanding of a working principle in reservoir computing can be expected to shed light on how information is stored and processed in nonlinear dynamical systems, potentially leading to progress in a broad range of nonlinear sciences. As a first step toward this goal, from the viewpoint of nonlinear physics and information theory, we study the memory-nonlinearity trade-off uncovered by Dambre et al. (2012). Focusing on a variational equation, we clarify a dynamical mechanism behind the trade-off, which illustrates why nonlinear dynamics degrades memory stored in dynamical system in general. Moreover, based on the trade-off, we propose a mixture reservoir endowed with both linear and nonlinear dynamics and show that it improves the performance of information processing. Interestingly, for some tasks, significant improvements are observed by adding a few linear dynamics to the nonlinear dynamical system. By employing the echo state network model, the effect of the mixture reservoir is numerically verified for a simple function approximation task and for more complex tasks.
NASA Astrophysics Data System (ADS)
Bernier, Jean D.
1991-09-01
The imaging in real time of infrared background scenes with the Naval Postgraduate School Infrared Search and Target Designation (NPS-IRSTD) System was achieved through extensive software developments in protected mode assembly language on an Intel 80386 33 MHz computer. The new software processes the 512 by 480 pixel images directly in the extended memory area of the computer where the DT-2861 frame grabber memory buffers are mapped. Direct interfacing, through a JDR-PR10 prototype card, between the frame grabber and the host computer AT bus enables each load of the frame grabber memory buffers to be effected under software control. The protected mode assembly language program can refresh the display of a six degree pseudo-color sector in the scanner rotation within the two second period of the scanner. A study of the imaging properties of the NPS-IRSTD is presented with preliminary work on image analysis and contrast enhancement of infrared background scenes.
Rainsford, M; Palmer, M A; Paine, G
2018-04-01
Despite numerous innovative studies, rates of replication in the field of music psychology are extremely low (Frieler et al., 2013). Two key methodological challenges affecting researchers wishing to administer and reproduce studies in music cognition are the difficulty of measuring musical responses, particularly when conducting free-recall studies, and access to a reliable set of novel stimuli unrestricted by copyright or licensing issues. In this article, we propose a solution for these challenges in computer-based administration. We present a computer-based application for testing memory for melodies. Created using the software Max/MSP (Cycling '74, 2014a), the MUSOS (Music Software System) Toolkit uses a simple modular framework configurable for testing common paradigms such as recall, old-new recognition, and stem completion. The program is accompanied by a stimulus set of 156 novel, copyright-free melodies, in audio and Max/MSP file formats. Two pilot tests were conducted to establish the properties of the accompanying stimulus set that are relevant to music cognition and general memory research. By using this software, a researcher without specialist musical training may administer and accurately measure responses from common paradigms used in the study of memory for music.
A malicious pattern detection engine for embedded security systems in the Internet of Things.
Oh, Doohwan; Kim, Deokho; Ro, Won Woo
2014-12-16
With the emergence of the Internet of Things (IoT), a large number of physical objects in daily life have been aggressively connected to the Internet. As the number of objects connected to networks increases, the security systems face a critical challenge due to the global connectivity and accessibility of the IoT. However, it is difficult to adapt traditional security systems to the objects in the IoT, because of their limited computing power and memory size. In light of this, we present a lightweight security system that uses a novel malicious pattern-matching engine. We limit the memory usage of the proposed system in order to make it work on resource-constrained devices. To mitigate performance degradation due to limitations of computation power and memory, we propose two novel techniques, auxiliary shifting and early decision. Through both techniques, we can efficiently reduce the number of matching operations on resource-constrained systems. Experiments and performance analyses show that our proposed system achieves a maximum speedup of 2.14 with an IoT object and provides scalable performance for a large number of patterns.
Integrating Cache Performance Modeling and Tuning Support in Parallelization Tools
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
With the resurgence of distributed shared memory (DSM) systems based on cache-coherent Non Uniform Memory Access (ccNUMA) architectures and increasing disparity between memory and processor speeds, data locality overheads are becoming the greatest bottlenecks in the way of realizing potential high performance of these systems. While parallelization tools and compilers help users port their sequential applications to a DSM system, a lot of time and effort is needed to tune the memory performance of these applications to achieve reasonable speedup. In this paper, we show that integrating cache performance modeling and tuning support within a parallelization environment can alleviate this problem. The Cache Performance Modeling and Prediction Tool (CPMP) employs trace-driven simulation techniques without the overhead of generating and managing detailed address traces. CPMP predicts the cache performance impact of source code level "what-if" modifications in a program to assist a user in the tuning process. CPMP is built on top of a customized version of the Computer Aided Parallelization Tools (CAPTools) environment. Finally, we demonstrate how CPMP can be applied to tune a real Computational Fluid Dynamics (CFD) application.
Error correcting code with chip kill capability and power saving enhancement
Gara, Alan G [Mount Kisco, NY; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Flynn, William T [Rochester, MN; Marcella, James A [Rochester, MN; Takken, Todd [Brewster, NY; Trager, Barry M [Yorktown Heights, NY; Winograd, Shmuel [Scarsdale, NY
2011-08-30
A method and system are disclosed for detecting memory chip failure in a computer memory system. The method comprises the steps of accessing user data from a set of user data chips, and testing the user data for errors using data from a set of system data chips. This testing is done by generating a sequence of check symbols from the user data, grouping the user data into a sequence of data symbols, and computing a specified sequence of syndromes. If all the syndromes are zero, the user data has no errors. If one of the syndromes is non-zero, then a set of discriminator expressions are computed, and used to determine whether a single or double symbol error has occurred. In the preferred embodiment, less than two full system data chips are used for testing and correcting the user data.
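Illustrative sketch (not the patented code): the "all syndromes zero means no error" test can be seen in a much simpler setting. The example below swaps in a textbook single-bit Hamming(7,4) code instead of the symbol-oriented code the abstract describes, so the function names and the bit layout are assumptions made purely for illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)    # 1-based positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)     # 1-based positions that carry check bits


def syndrome(codeword):
    """XOR together the 1-based positions of all set bits."""
    s = 0
    for position, bit in enumerate(codeword, start=1):
        if bit:
            s ^= position
    return s


def encode(data_bits):
    """Place four data bits, then set check bits so the syndrome is zero."""
    codeword = [0] * 7
    for position, bit in zip(DATA_POSITIONS, data_bits):
        codeword[position - 1] = bit
    s = syndrome(codeword)
    for position in PARITY_POSITIONS:
        if s & position:
            codeword[position - 1] = 1
    return codeword


def correct(codeword):
    """Return a copy with a single-bit error (if any) flipped back."""
    fixed = list(codeword)
    s = syndrome(fixed)
    if s:                        # non-zero syndrome names the bad position
        fixed[s - 1] ^= 1
    return fixed
```

A zero syndrome means the word passes the check; a non-zero syndrome in this toy code directly gives the position of a single flipped bit. Distinguishing single from double symbol errors, as the abstract describes, needs the additional discriminator expressions that this sketch does not attempt.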
Raster Scan Computer Image Generation (CIG) System Based On Refresh Memory
NASA Astrophysics Data System (ADS)
Dichter, W.; Doris, K.; Conkling, C.
1982-06-01
A full color, Computer Image Generation (CIG) raster visual system has been developed which provides a high level of training sophistication by utilizing advanced semiconductor technology and innovative hardware and firmware techniques. Double buffered refresh memory and efficient algorithms eliminate the problem of conventional raster line ordering by allowing the generated image to be stored in a random fashion. Modular design techniques and simplified architecture provide significant advantages in reduced system cost, standardization of parts, and high reliability. The major system components are a general purpose computer to perform interfacing and data base functions; a geometric processor to define the instantaneous scene image; a display generator to convert the image to a video signal; an illumination control unit which provides final image processing; and a CRT monitor for display of the completed image. Additional optional enhancements include texture generators, increased edge and occultation capability, curved surface shading, and data base extensions.
Nonvolatile reconfigurable sequential logic in a HfO2 resistive random access memory array.
Zhou, Ya-Xiong; Li, Yi; Su, Yu-Ting; Wang, Zhuo-Rui; Shih, Ling-Yi; Chang, Ting-Chang; Chang, Kuan-Chang; Long, Shi-Bing; Sze, Simon M; Miao, Xiang-Shui
2017-05-25
Resistive random access memory (RRAM) based reconfigurable logic provides a temporal programmable dimension to realize Boolean logic functions and is regarded as a promising route to build non-von Neumann computing architecture. In this work, a reconfigurable operation method is proposed to perform nonvolatile sequential logic in a HfO2-based RRAM array. Eight kinds of Boolean logic functions can be implemented within the same hardware fabrics. During the logic computing processes, the RRAM devices in an array are flexibly configured in a bipolar or complementary structure. The validity was demonstrated by experimentally implemented NAND and XOR logic functions and a theoretically designed 1-bit full adder. With the trade-off between temporal and spatial computing complexity, our method makes better use of limited computing resources, thus provides an attractive scheme for the construction of logic-in-memory systems.
Metal oxide resistive random access memory based synaptic devices for brain-inspired computing
NASA Astrophysics Data System (ADS)
Gao, Bin; Kang, Jinfeng; Zhou, Zheng; Chen, Zhe; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan
2016-04-01
The traditional Boolean computing paradigm based on the von Neumann architecture is facing great challenges for future information technology applications such as big data, the Internet of Things (IoT), and wearable devices, due to the limited processing capability issues such as binary data storage and computing, non-parallel data processing, and the buses requirement between memory units and logic units. The brain-inspired neuromorphic computing paradigm is believed to be one of the promising solutions for realizing more complex functions with a lower cost. To perform such brain-inspired computing with a low cost and low power consumption, novel devices for use as electronic synapses are needed. Metal oxide resistive random access memory (ReRAM) devices have emerged as the leading candidate for electronic synapses. This paper comprehensively addresses the recent work on the design and optimization of metal oxide ReRAM-based synaptic devices. A performance enhancement methodology and optimized operation scheme to achieve analog resistive switching and low-energy training behavior are provided. A three-dimensional vertical synapse network architecture is proposed for high-density integration and low-cost fabrication. The impacts of the ReRAM synaptic device features on the performances of neuromorphic systems are also discussed on the basis of a constructed neuromorphic visual system with a pattern recognition function. Possible solutions to achieve the high recognition accuracy and efficiency of neuromorphic systems are presented.
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
Accidental overwriting of files or of memory regions belonging to other programs, browsing of personal files by superusers, Trojan horses, and viruses are examples of breakdowns in workstations and personal computers that would be significantly reduced by memory protection. Memory protection is the capability of an operating system and supporting hardware to delimit segments of memory, to control whether segments can be read from or written into, and to confine accesses of a program to its segments alone. The absence of memory protection in many operating systems today is the result of a bias toward a narrow definition of performance as maximum instruction-execution rate. A broader definition, including the time to get the job done, makes clear that cost of recovery from memory interference errors reduces expected performance. The mechanisms of memory protection are well understood, powerful, efficient, and elegant. They add to performance in the broad sense without reducing instruction execution rate.
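Illustrative sketch (not from the report): the three capabilities named above (delimiting segments, controlling read and write permission, and confining a program to its own segments) can be modelled in a few lines. The `Segment` fields, the `ProtectionFault` exception, and `check_access` are hypothetical names; real systems perform this check in hardware on every access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int        # first address of the segment
    length: int      # number of addressable units in the segment
    readable: bool
    writable: bool


class ProtectionFault(Exception):
    """Raised when an access falls outside the program's rights."""


def check_access(segments, address, write):
    """Return the segment that permits this access, or raise ProtectionFault."""
    for segment in segments:
        if segment.base <= address < segment.base + segment.length:
            allowed = segment.writable if write else segment.readable
            if not allowed:
                kind = "write" if write else "read"
                raise ProtectionFault(f"{kind} not permitted at {address}")
            return segment
    # The address belongs to no segment owned by this program.
    raise ProtectionFault(f"address {address} is outside all segments")
```

A program holding a read-only segment at addresses 0 to 99 and a read-write segment at 100 to 149 could read address 10 and write address 120, while a write to address 10 or any access to address 500 would raise `ProtectionFault`.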
WinHPC System Configuration | High-Performance Computing | NREL
CPUs with 48GB of memory. Node 04 has dual Intel Xeon E5530 CPUs with 24GB of memory. Nodes 05-20 have dual AMD Opteron 2374 HE CPUs with 16GB of memory. Nodes 21-30 have been decommissioned. Nodes 31-35 have dual Intel Xeon X5675 CPUs with 48GB of memory. Nodes 36-37 have dual Intel Xeon E5-2680 CPUs with
NASA Astrophysics Data System (ADS)
Bjorklund, E.
1994-12-01
In the 1970s, when computers were memory limited, operating system designers created the concept of "virtual memory", which gave users the ability to address more memory than physically existed. In the 1990s, many large control systems have the potential of becoming data limited. We propose that many of the principles behind virtual memory systems (working sets, locality, caching and clustering) can also be applied to data-limited systems, creating, in effect, "virtual data systems". At the Los Alamos National Laboratory's Clinton P. Anderson Meson Physics Facility (LAMPF), we have applied these principles to a moderately sized (10 000 data points) data acquisition and control system. To test the principles, we measured the system's performance during tune-up, production, and maintenance periods. In this paper, we present a general discussion of the principles of a virtual data system along with some discussion of our own implementation and the results of our performance measurements.
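Illustrative sketch (not the LAMPF implementation): of the virtual-memory principles listed above, caching is the easiest to show in isolation. The example keeps a bounded working set of data points in memory and fetches the rest on demand. The `DataPointCache` class, the least-recently-used eviction policy, and the `fetch` callback are assumptions for illustration only.

```python
from collections import OrderedDict


class DataPointCache:
    """Keep a bounded working set of data points, evicting the least recently used."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # slow path: reads a point from the device
        self._capacity = capacity    # maximum number of resident data points
        self._resident = OrderedDict()
        self.misses = 0

    def read(self, name):
        if name in self._resident:
            # Hit: mark the point as most recently used.
            self._resident.move_to_end(name)
            return self._resident[name]
        # Miss: fetch from the slow source and make the point resident.
        self.misses += 1
        value = self._fetch(name)
        self._resident[name] = value
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)   # evict least recently used
        return value
```

If accesses show locality, most reads are served from the resident set and `misses` grows slowly, which is the effect a working-set measurement such as the one described in the abstract would look for.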
ERIC Educational Resources Information Center
Yilmaz, Ramazan; Karaoglan Yilmaz, Fatma Gizem; Kilic Cakmak, Ebru
2017-01-01
The purpose of this study is to examine the impacts of transactive memory system (TMS) and interaction platforms in computer-supported collaborative learning (CSCL) on social presence perceptions and self-regulation skills of learners. Within the scope of the study, social presence perceptions and self-regulation skills of students in…
Multistate Memristive Tantalum Oxide Devices for Ternary Arithmetic
Kim, Wonjoo; Chattopadhyay, Anupam; Siemon, Anne; Linn, Eike; Waser, Rainer; Rana, Vikas
2016-01-01
Redox-based resistive switching random access memory (ReRAM) offers excellent properties to implement future non-volatile memory arrays. Recently, the capability of two-state ReRAMs to implement Boolean logic functionality gained wide interest. Here, we report on seven-state tantalum oxide devices, which enable the realization of an intrinsic modular arithmetic using a ternary number system. Modular arithmetic, a fundamental system for operating on numbers within the limit of a modulus, has been known to mathematicians since the days of Euclid and finds applications in diverse areas ranging from e-commerce to musical notations. We demonstrate that multistate devices not only reduce the storage area consumption drastically, but also enable novel in-memory operations, such as computing using high-radix number systems, which could not be implemented using two-state devices. The use of a high-radix number system reduces the computational complexity by reducing the number of needed digits. Thus the number of calculation operations in an addition and the number of logic devices can be reduced. PMID:27834352
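Illustrative sketch (not the authors' device-level scheme): the claim that a higher radix needs fewer digits, and therefore fewer digit-wise steps in an addition, can be checked with ordinary integer arithmetic. The helper names `to_digits` and `ripple_add` are hypothetical, and the step count is simply the number of digit positions processed.

```python
def to_digits(value, radix):
    """Return the digits of a non-negative integer, most significant first."""
    if value < 0 or radix < 2:
        raise ValueError("value must be >= 0 and radix >= 2")
    digits = []
    while True:
        value, digit = divmod(value, radix)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def ripple_add(a, b, radix):
    """Add digit by digit with carry; return (sum_digits, digit_steps)."""
    xs = to_digits(a, radix)[::-1]   # least significant digit first
    ys = to_digits(b, radix)[::-1]
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))
    out, carry = [], 0
    for x, y in zip(xs, ys):
        # Each position is one modular addition: digit = (x + y + carry) mod radix.
        carry, digit = divmod(x + y + carry, radix)
        out.append(digit)
    if carry:
        out.append(carry)
    return out[::-1], width
```

For example, the value 1000 takes ten binary digits but seven ternary digits, so adding two numbers of that size takes ten digit steps in radix 2 and seven in radix 3.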
Time-resolved EPR spectroscopy in a Unix environment.
Lacoff, N M; Franke, J E; Warden, J T
1990-02-01
A computer-aided time-resolved electron paramagnetic resonance (EPR) spectrometer implemented under version 2.9 BSD Unix was developed by interfacing a Varian E-9 EPR spectrometer and a Biomation 805 waveform recorder to a PDP-11/23A minicomputer having MINC A/D and D/A capabilities. Special problems with real-time data acquisition in a multiuser, multitasking Unix environment, addressing of computer main memory for the control of hardware devices, and limitation of computer main memory were resolved, and their solutions are presented. The time-resolved EPR system and the data acquisition and analysis programs, written entirely in C, are described. Furthermore, the benefits of utilizing the Unix operating system and the C language are discussed, and system performance is illustrated with time-resolved EPR spectra of the reaction center cation in photosystem 1 of green plant photosynthesis.
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, and CPU frequency scaling are some of the hardware features of modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller, which is part of common NGS workflows and consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by tuning various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer, and (iv) over-clocking the default 'on-demand' CPU frequency mode by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
Ackermann, Hans D.; Pankratz, Leroy W.; Dansereau, Danny A.
1983-01-01
The computer programs published in Open-File Report 82-1065, A comprehensive system for interpreting seismic-refraction arrival-time data using interactive computer methods (Ackermann, Pankratz, and Dansereau, 1982), have been modified to run on a mini-computer. The new version uses approximately 1/10 of the memory of the initial version, is more efficient and gives the same results.
NASA Technical Reports Server (NTRS)
Tuccillo, J. J.
1984-01-01
Numerical Weather Prediction (NWP), for both operational and research purposes, requires not only fast computational speed but also large memory. A technique for solving the Primitive Equations for atmospheric motion on the CYBER 205, as implemented in the Mesoscale Atmospheric Simulation System, which is fully vectorized and requires substantially less memory than other techniques such as the Leapfrog or Adams-Bashforth schemes, is discussed. The technique presented uses the Euler-Backward time marching scheme. Also discussed are several techniques for reducing the computational time of the model by replacing slow intrinsic routines with faster algorithms which use only hardware vector instructions.
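As an illustration of the time marching scheme named in the abstract above, the following is a minimal sketch of the Euler-backward (Matsuno) predictor-corrector step applied to a scalar oscillation equation. The test equation, step size, and function names are assumptions chosen for illustration; they are not taken from the report, and the sketch does not reproduce the vectorized CYBER 205 implementation.

```python
def euler_backward_step(f, u, dt):
    """One Euler-backward (Matsuno) step for du/dt = f(u).

    A forward-Euler predictor gives a provisional state; the corrector
    then advances the ORIGINAL state using the tendency evaluated at
    that provisional state.
    """
    u_star = u + dt * f(u)       # predictor (provisional state)
    return u + dt * f(u_star)    # corrector


def integrate(f, u0, dt, n_steps):
    """Advance u0 by n_steps Euler-backward steps."""
    u = u0
    for _ in range(n_steps):
        u = euler_backward_step(f, u, dt)
    return u


if __name__ == "__main__":
    # Hypothetical test problem: du/dt = i*omega*u (pure oscillation).
    omega = 1.0
    u_end = integrate(lambda u: 1j * omega * u, 1.0 + 0.0j, 0.1, 100)
    # The scheme damps the oscillation slightly, so |u_end| < 1.
    print(abs(u_end))
```

Each step uses only the current state and one provisional state, whereas a leapfrog step also needs the state from the previous time level.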
1991-07-31
INTELLIGENT SCSI DMV-719 MAS MIL CONTROLLER DY-4 SYSTEMS BYTE-WIDE MEMORY CARD DMV-536 MEM MIL DY-4 SYSTEMS POWER SUPPLY UNIT DMV-870 PWR MIL P age No. 5 06/10...FORCE COMPUTERS PROCESSOR CPU-386 SERIES SBC COM FORCE COMPUTERS ADVANCED SYSTEM CONTROL ASCU -1/2 SBC COM UNITI FORCE COMPUTERS GRAPHICS CONTROLLER AGC...RECORD VENDOR: JANZ COMPUTER AG DIVISION: VENDOR ADDRESS: Im Doerener Feld 3 D-4790 Paderborn Germany MARKETING: Johannes Kunz TECHNICAL: Arnulf
Programs for Testing Processor-in-Memory Computing Systems
NASA Technical Reports Server (NTRS)
Katz, Daniel S.
2006-01-01
The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.]
Vera, Javier
2018-01-01
What is the influence of short-term memory enhancement on the emergence of grammatical agreement systems in multi-agent language games? Agreement systems suppose that at least two words share some features with each other, such as gender, number, or case. Previous work, within the multi-agent language-game framework, has recently proposed models stressing the hypothesis that the emergence of a grammatical agreement system arises from the minimization of semantic ambiguity. On the other hand, neurobiological evidence argues for the hypothesis that language evolution has been mainly related to an increase in short-term memory capacity, which has allowed the online manipulation of words and meanings participating particularly in grammatical agreement systems. Here, the main aim is to propose a multi-agent language game for the emergence of a grammatical agreement system, under measurable long-range relations depending on the short-term memory capacity. Computer simulations, based on a parameter that measures the amount of short-term memory capacity, suggest that agreement marker systems arise in a population of agents equipped with at least a critical short-term memory capacity.
The Computer and Its Functions; How to Communicate with the Computer.
ERIC Educational Resources Information Center
Ward, Peggy M.
A brief discussion of why it is important for students to be familiar with computers and their functions and a list of some practical applications introduce this two-part paper. Focusing on how the computer works, the first part explains the various components of the computer, different kinds of memory storage devices, disk operating systems, and…
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
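The static dependence analysis mentioned in the abstract above can be illustrated with level scheduling, a common way to expose parallelism in a sparse triangular solve. This is a minimal sketch under stated assumptions: the row-dictionary storage format, the function names, and the small example matrix are hypothetical, and the abstract does not say which of its three scheduling methods (if any) corresponds to this one.

```python
def level_schedule(rows):
    """Group the rows of a sparse lower-triangular matrix into levels.

    rows[i] maps column index j -> L[i][j] (with j <= i). Row i depends
    on every row j < i that has a nonzero in row i. Rows in the same
    level have no dependences on each other and could be solved
    concurrently.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j in row if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


def solve_lower(rows, b, schedule):
    """Forward substitution, visiting rows level by level."""
    x = [0.0] * len(b)
    for group in schedule:
        # Rows inside one group are independent; a parallel version
        # would hand them to different processors at this point.
        for i in group:
            s = sum(v * x[j] for j, v in rows[i].items() if j < i)
            x[i] = (b[i] - s) / rows[i][i]
    return x


if __name__ == "__main__":
    rows = [
        {0: 2.0},
        {1: 1.0},
        {0: 1.0, 2: 4.0},
        {1: 2.0, 2: 1.0, 3: 1.0},
    ]
    schedule = level_schedule(rows)
    print(schedule)
    print(solve_lower(rows, [2.0, 3.0, 9.0, 11.0], schedule))
```

The schedule is computed once from the sparsity pattern and can be reused for every triangular solve in the iteration, which is why a static analysis can pay off.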
Event parallelism: Distributed memory parallel computing for high energy physics experiments
NASA Astrophysics Data System (ADS)
Nash, Thomas
1989-12-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on a powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Memory Network For Distributed Data Processors
NASA Technical Reports Server (NTRS)
Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George
1992-01-01
Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.
Methods for operating parallel computing systems employing sequenced communications
Benner, R.E.; Gustafson, J.L.; Montry, G.R.
1999-08-10
A parallel computing system and method are disclosed having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system. 15 figs.
Methods for operating parallel computing systems employing sequenced communications
Benner, Robert E.; Gustafson, John L.; Montry, Gary R.
1999-01-01
A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Computer Exercises in Systems and Fields Experiments
ERIC Educational Resources Information Center
Bacon, C. M.; McDougal, J. R.
1971-01-01
Laboratory activities give students an opportunity to interact with computers in modes ranging from remote terminal use in laboratory experimentation to the direct hands-on use of a small digital computer with disk memory and on-line plotter, and finally to the use of a large computer under closed-shop operation. (Author/TS)
Computer Sciences and Data Systems, volume 1
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.
Wolinski, Christophe Czeslaw [Los Alamos, NM; Gokhale, Maya B [Los Alamos, NM; McCabe, Kevin Peter [Los Alamos, NM
2011-01-18
Fabric-based computing systems and methods are disclosed. A fabric-based computing system can include a polymorphous computing fabric that can be customized on a per application basis and a host processor in communication with said polymorphous computing fabric. The polymorphous computing fabric includes a cellular architecture that can be highly parameterized to enable a customized synthesis of fabric instances for a variety of enhanced application performances thereof. A global memory concept can also be included that provides the host processor random access to all variables and instructions associated with the polymorphous computing fabric.
Signal and noise extraction from analog memory elements for neuromorphic computing.
Gong, N; Idé, T; Kim, S; Boybat, I; Sebastian, A; Narayanan, V; Ando, T
2018-05-29
Dense crossbar arrays of non-volatile memory (NVM) can potentially enable massively parallel and highly energy-efficient neuromorphic computing systems. The key requirements for the NVM elements are continuous (analog-like) conductance tuning capability and switching symmetry with acceptable noise levels. However, most NVM devices show non-linear and asymmetric switching behaviors. Such non-linear behaviors render separation of signal and noise extremely difficult with conventional characterization techniques. In this study, we establish a practical methodology based on Gaussian process regression to address this issue. The methodology is agnostic to switching mechanisms and applicable to various NVM devices. We show tradeoff between switching symmetry and signal-to-noise ratio for HfO2-based resistive random access memory. Then, we characterize 1000 phase-change memory devices based on Ge2Sb2Te5 and separate total variability into device-to-device variability and inherent randomness from individual devices. These results highlight the usefulness of our methodology to realize ideal NVM devices for neuromorphic computing.
How to Program the Principal's Office for the Computer Age.
ERIC Educational Resources Information Center
Frankel, Steven
1983-01-01
Explains why principals' offices need computers and discusses the characteristics of inexpensive personal business computers, including their operating systems, disk drives, memory, and compactness. Reviews software available for word processing, accounting, database management, and communications, and compares the Kaypro II, Morrow, and Osborne I…
1978-05-01
navigation computer (SNC), separate alterable memory units for the computer, a control/display unit (CDU), a computer control unit (CCU), and a non ...AND SYSTEM Advisory Group for Aerospace Research and Development, Paris (France). Presented at the 15th Meeting of the Guidance and Control Panel of... Group, Redondo Beach, Calif.) American Institute of Aeronautics and Astronautics, Guidance and Control Conference, Key Biscayne, Fla., August 20-22
Combining neural networks and signed particles to simulate quantum systems more efficiently
NASA Astrophysics Data System (ADS)
Sellier, Jean Michel
2018-04-01
Recently a new formulation of quantum mechanics has been suggested which describes systems by means of ensembles of classical particles provided with a sign. This novel approach mainly consists of two steps: the computation of the Wigner kernel, a multi-dimensional function describing the effects of the potential over the system, and the field-less evolution of the particles which eventually create new signed particles in the process. Although this method has proved to be extremely advantageous in terms of computational resources - as a matter of fact it is able to simulate in a time-dependent fashion many-body systems on relatively small machines - the Wigner kernel can represent the bottleneck of simulations of certain systems. Moreover, storing the kernel can be another issue as the amount of memory needed is cursed by the dimensionality of the system. In this work, we introduce a new technique, based on an appropriately tailored neural network combined with the signed particle formalism, which drastically reduces the computation time and memory requirement to simulate time-dependent quantum systems. In particular, the suggested neural network is able to compute efficiently and reliably the Wigner kernel without any training as its entire set of weights and biases is specified by analytical formulas. As a consequence, the amount of memory for quantum simulations radically drops since the kernel does not need to be stored anymore as it is now computed by the neural network itself, only on the cells of the (discretized) phase-space which are occupied by particles. As is clearly shown in the final part of this paper, not only does this novel approach drastically reduce the computational time, it also remains accurate. The author believes this work opens the way towards effective design of quantum devices, with incredible practical implications.
Work stealing for GPU-accelerated parallel programs in a global address space framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arafat, Humayun; Dinan, James; Krishnamoorthy, Sriram
Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
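The basic work-stealing idea referred to in the abstract above can be sketched as a single-process toy simulation. Everything here is an assumption made for illustration: the function name, the round-robin stepping of workers, and the policy of stealing one task at a time are hypothetical, and the sketch ignores GPUs, distributed memory, and data movement costs, which are the actual subject of the paper.

```python
from collections import deque
import random


def run_work_stealing(task_costs, n_workers, seed=0):
    """Simulate work stealing over per-worker double-ended queues.

    All tasks start on worker 0. In each round every worker either
    runs the newest task from its own queue (LIFO end) or, if its
    queue is empty, steals the oldest task (FIFO end) from a randomly
    chosen non-empty victim and runs it immediately.
    Returns (work done per worker, number of steals).
    """
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    queues[0].extend(task_costs)
    done = [0] * n_workers
    steals = 0
    while any(queues):
        for w in range(n_workers):
            if queues[w]:
                done[w] += queues[w].pop()            # owner end
            else:
                victims = [v for v in range(n_workers)
                           if v != w and queues[v]]
                if victims:
                    victim = rng.choice(victims)
                    done[w] += queues[victim].popleft()   # thief end
                    steals += 1
    return done, steals


if __name__ == "__main__":
    done, steals = run_work_stealing([1] * 8, n_workers=2)
    print(done, steals)
```

Owners and thieves take tasks from opposite ends of the queue, a common design choice in work-stealing schedulers that reduces contention between them.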
Parallel and distributed computation for fault-tolerant object recognition
NASA Technical Reports Server (NTRS)
Wechsler, Harry
1988-01-01
The distributed associative memory (DAM) model is suggested for distributed and fault-tolerant computation as it relates to object recognition tasks. The fault-tolerance is with respect to geometrical distortions (scale and rotation), noisy inputs, occlusion/overlap, and memory faults. An experimental system was developed for fault-tolerant structure recognition which shows the feasibility of such an approach. The approach is further extended to the problem of multisensory data integration and applied successfully to the recognition of colored polyhedral objects.
Using a Cray Y-MP as an array processor for a RISC Workstation
NASA Technical Reports Server (NTRS)
Lamaster, Hugh; Rogallo, Sarah J.
1992-01-01
As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980's, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate to use in a Remote Procedure Call (RPC)-based environment. Matrix multiplication is an example of an operation that can have a sufficient number of arithmetic operations to amortize the cost of an RPC call. An experiment is described which demonstrates that matrix multiplication can be executed remotely on a large system faster than it executes on a workstation.
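The amortization argument in the abstract above can be made concrete with a simple cost model: multiplying two n x n matrices takes about 2n^3 floating-point operations but moves only about 3n^2 words, so for large enough n the remote arithmetic savings outweigh the fixed RPC cost. The sketch below is an assumption-laden illustration: the model, the function names, and all machine parameters are hypothetical and are not measurements from the report.

```python
def local_time(n, flops_local):
    """Seconds to multiply two n x n matrices on the workstation."""
    return 2 * n**3 / flops_local


def remote_time(n, flops_remote, bandwidth, latency, bytes_per_word=8):
    """Seconds for the same product done through an RPC call.

    Assumes two input matrices are sent and one result is returned
    (3 * n^2 words in total), plus a fixed per-call latency.
    """
    transfer = 3 * n * n * bytes_per_word / bandwidth
    return latency + transfer + 2 * n**3 / flops_remote


def break_even(flops_local, flops_remote, bandwidth, latency, n_max=10000):
    """Smallest n for which the remote call is faster, or None."""
    for n in range(1, n_max + 1):
        if (remote_time(n, flops_remote, bandwidth, latency)
                < local_time(n, flops_local)):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical parameters: 10 Mflop/s workstation, 1 Gflop/s remote
    # machine, 1 MB/s network, 10 ms call latency.
    print(break_even(1e7, 1e9, 1e6, 0.01))
```

Below the break-even size the transfer and latency terms dominate, which is why only operations with a high ratio of arithmetic to data are good candidates for remote execution.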
The force on the flex: Global parallelism and portability
NASA Technical Reports Server (NTRS)
Jordan, H. F.
1986-01-01
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were made to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporation's implementation of a Fortran system in the parallel environment.
The Aging Navigational System.
Lester, Adam W; Moffat, Scott D; Wiener, Jan M; Barnes, Carol A; Wolbers, Thomas
2017-08-30
The discovery of neuronal systems dedicated to computing spatial information, composed of functionally distinct cell types such as place and grid cells, combined with an extensive body of human-based behavioral and neuroimaging research has provided us with a detailed understanding of the brain's navigation circuit. In this review, we discuss emerging evidence from rodents, non-human primates, and humans that demonstrates how cognitive aging affects the navigational computations supported by these systems. Critically, we show 1) that navigational deficits cannot solely be explained by general deficits in learning and memory, 2) that there is no uniform decline across different navigational computations, and 3) that navigational deficits might be sensitive markers for impending pathological decline. Following an introduction to the mechanisms underlying spatial navigation and how they relate to general processes of learning and memory, the review discusses how aging affects the perception and integration of spatial information, the creation and storage of memory traces for spatial information, and the use of spatial information during navigational behavior. The closing section highlights the clinical potential of behavioral and neural markers of spatial navigation, with a particular emphasis on neurodegenerative disorders. Copyright © 2017 Elsevier Inc. All rights reserved.
Brain-Based Devices for Neuromorphic Computer Systems
2013-07-01
and Deco, G. (2012). Effective Visual Working Memory Capacity: An Emergent Effect from the Neural Dynamics in an Attractor Network. PLoS ONE 7, e42719...models, apply them to a recognition task, and to demonstrate a working memory. In the course of this work a new analytical method for spiking data was...4 3.4 Spiking Neural Model Simulation of Working Memory ..................................... 5 3.5 A Novel Method for Analysis
An adaptive replacement algorithm for paged-memory computer systems.
NASA Technical Reports Server (NTRS)
Thorington, J. M., Jr.; Irwin, J. D.
1972-01-01
A general class of adaptive replacement schemes for use in paged memories is developed. One such algorithm, called SIM, is simulated using a probability model that generates memory traces, and the results of the simulation of this adaptive scheme are compared with those obtained using the best nonlookahead algorithms. A technique for implementing this type of adaptive replacement algorithm with state of the art digital hardware is also presented.
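The evaluation approach in the abstract above, driving a replacement policy with a synthetic memory trace, can be sketched as follows. The SIM algorithm itself is not described in the abstract, so this sketch substitutes plain LRU as an example of a nonlookahead policy; the locality model, parameter values, and function names are likewise assumptions made for illustration.

```python
import random
from collections import OrderedDict


def generate_trace(n_refs, n_pages, locality=0.9, window=4, seed=0):
    """Generate page references from a simple locality model.

    With probability `locality` the next reference stays inside the
    current window of pages; otherwise the window jumps elsewhere.
    """
    rng = random.Random(seed)
    base = 0
    trace = []
    for _ in range(n_refs):
        if rng.random() > locality:
            base = rng.randrange(n_pages)
        trace.append((base + rng.randrange(window)) % n_pages)
    return trace


def lru_faults(trace, n_frames):
    """Count page faults under least-recently-used replacement."""
    frames = OrderedDict()          # least recently used entry first
    faults = 0
    for page in trace:
        if page in frames:
            frames.move_to_end(page)
        else:
            faults += 1
            if len(frames) >= n_frames:
                frames.popitem(last=False)   # evict the LRU page
            frames[page] = True
    return faults


if __name__ == "__main__":
    trace = generate_trace(1000, n_pages=32)
    for n_frames in (2, 4, 8, 16):
        print(n_frames, lru_faults(trace, n_frames))
```

An adaptive scheme would be compared against baselines like this one by running both on the same generated traces and comparing fault counts.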
Unconditional room-temperature quantum memory
NASA Astrophysics Data System (ADS)
Hosseini, M.; Campbell, G.; Sparkes, B. M.; Lam, P. K.; Buchler, B. C.
2011-10-01
Just as classical information systems require buffers and memory, the same is true for quantum information systems. The potential that optical quantum information processing holds for revolutionizing computation and communication is therefore driving significant research into developing optical quantum memory. A practical optical quantum memory must be able to store and recall quantum states on demand with high efficiency and low noise. Ideally, the platform for the memory would also be simple and inexpensive. Here, we present a complete tomographic reconstruction of quantum states that have been stored in the ground states of rubidium in a vapour cell operating at around 80°C. Without conditional measurements, we show recall fidelity up to 98% for coherent pulses containing around one photon. To unambiguously verify that our memory beats the quantum no-cloning limit we employ state-independent verification using conditional variance and signal-transfer coefficients.
Exploiting short-term memory in soft body dynamics as a computational resource.
Nakajima, K; Li, T; Hauser, H; Pfeifer, R
2014-11-06
Soft materials are not only highly deformable, but they also possess rich and diverse body dynamics. Soft body dynamics exhibit a variety of properties, including nonlinearity, elasticity and potentially infinitely many degrees of freedom. Here, we demonstrate that such soft body dynamics can be employed to conduct certain types of computation. Using body dynamics generated from a soft silicone arm, we show that they can be exploited to emulate functions that require memory and to embed robust closed-loop control into the arm. Our results suggest that soft body dynamics have a short-term memory and can serve as a computational resource. This finding paves the way towards exploiting passive body dynamics for control of a large class of underactuated systems. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Biegel, Bryan A. (Technical Monitor); Jost, G.; Jin, H.; Labarta, J.; Gimenez, J.; Caubet, J.
2003-01-01
Parallel programming paradigms include process level parallelism, thread level parallelization, and multilevel parallelism. This viewgraph presentation describes a detailed performance analysis of these paradigms for Shared Memory Architecture (SMA). This analysis uses the Paraver Performance Analysis System. The presentation includes diagrams of a flow of useful computations.
ERIC Educational Resources Information Center
Mayer, Richard E.; Moreno, Roxana
1998-01-01
Multimedia learners (n=146 college students) were able to integrate words and computer-presented pictures more easily when the words were presented aurally rather than visually. This split-attention effect is consistent with a dual-processing model of working memory. (SLD)
Logical Access Control Mechanisms in Computer Systems.
ERIC Educational Resources Information Center
Hsiao, David K.
The subject of access control mechanisms in computer systems is concerned with effective means to protect the anonymity of private information on the one hand, and to regulate the access to shareable information on the other hand. Effective means for access control may be considered on three levels: memory, process and logical. This report is a…
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
JPRS Report Science & Technology Europe.
1992-10-22
Potatoes for More Sugar [Frankfurt/Main FRANKFURTER ALLGEMEINE, 12 Aug 92] 26 COMPUTERS French Devise Operating System for Parallel, Failure...Tolerant and Real-Time Systems [Munich COMPUTER WOCHE, 5 Jun 92] 27 Germany Markets External Mass Memory for IBM-Compatible Parallel Interfaces...Infrared Detection System [Thierry Lucas; Paris L’USINE NOUVELLE TECHNOLOGIES, 16 Jul 92] 28 Streamlined ACE Fighter Airplane Approved [Paris AFP
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
NASA Technical Reports Server (NTRS)
Hendry, David F. (Inventor)
1993-01-01
In a data system having a memory, plural input/output (I/O) devices and a bus connecting each of the I/O devices to the memory, a direct memory access (DMA) controller regulating access of each of the I/O devices to the bus, including a priority register storing priorities of bus access requests from the I/O devices, an interrupt register storing bus access requests of the I/O devices, a resolver for selecting one of the I/O devices to have access to the bus, a pointer register storing addresses of locations in the memory for communication with the one I/O device via the bus, a sequence register storing an address of a location in the memory containing a channel program instruction which is to be executed next, and an ALU for incrementing and decrementing addresses stored in the pointer register, computing the next address to be stored in the sequence register, and computing the initial contents of each of the registers. The memory contains a sequence of channel program instructions defining a set up operation wherein the contents of each of the registers in the channel register are initialized in accordance with the initial contents computed by the ALU and an access operation wherein data is transferred on the bus between a location in the memory whose address is currently stored in the pointer register and the one I/O device enabled by the resolver.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jared Stimson
FORENSIC ANALYSIS OF WINDOW’S® VIRTUAL MEMORY INCORPORATING THE SYSTEM’S PAGEFILE Computer Forensics is concerned with the use of computer investigation and analysis techniques in order to collect evidence suitable for presentation in court. The examination of volatile memory is a relatively new but important area in computer forensics. More recently criminals are becoming more forensically aware and are now able to compromise computers without accessing the hard disk of the target computer. This means that traditional incident response practice of pulling the plug will destroy the only evidence of the crime. While some techniques are available for acquiring the contents of main memory, few exist which can analyze these data in a meaningful way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed arbitrarily across physical memory or the hard disk, making it very difficult to recover useful information. This report will focus on how these disparate sources of information can be combined to give a single, contiguous address space for each process. Using address translation, a tool is developed to reconstruct the virtual address space of a process by combining a physical memory dump with the page-file on the hard disk. COUNTERINTELLIGENCE THROUGH MALICIOUS CODE ANALYSIS As computer network technology continues to grow so does the reliance on this technology for everyday business functionality. To appeal to customers and employees alike, businesses are seeking an increased online presence, and to increase productivity the same businesses are computerizing their day-to-day operations. The combination of a publicly accessible interface to the business's network, and the increase in the amount of intellectual property present on these networks presents serious risks.
All of this intellectual property now faces constant attacks from a wide variety of malicious software that is intended to uncover company and government secrets. Every year billions of dollars are invested in preventing and recovering from the introduction of malicious code into a system. However, there is little research being done on leveraging these attacks for counterintelligence opportunities. With the ever-increasing number of vulnerable computers on the Internet, the task of attributing these attacks to an organization or a single person is a daunting one. This thesis will demonstrate the idea of intentionally running a piece of malicious code in a secure environment in order to gain counterintelligence on an attacker.
Computer vision camera with embedded FPGA processing
NASA Astrophysics Data System (ADS)
Lecerf, Antoine; Ouellet, Denis; Arias-Estrada, Miguel
2000-03-01
Traditional computer vision is based on a camera-computer system in which the image understanding algorithms are embedded in the computer. To circumvent the computational load of vision algorithms, low-level processing and imaging hardware can be integrated in a single compact module where a dedicated architecture is implemented. This paper presents a Computer Vision Camera based on an open architecture implemented in an FPGA. The system is targeted to real-time computer vision tasks where low level processing and feature extraction tasks can be implemented in the FPGA device. The camera integrates a CMOS image sensor, an FPGA device, two memory banks, and an embedded PC for communication and control tasks. The FPGA device is a medium size one equivalent to 25,000 logic gates. The device is connected to two high speed memory banks, an IS interface, and an imager interface. The camera can be accessed for architecture programming, data transfer, and control through an Ethernet link from a remote computer. A hardware architecture can be defined in a Hardware Description Language (like VHDL), simulated and synthesized into digital structures that can be programmed into the FPGA and tested on the camera. The architecture of a classical multi-scale edge detection algorithm based on a Laplacian of Gaussian convolution has been developed to show the capabilities of the system.
A Malicious Pattern Detection Engine for Embedded Security Systems in the Internet of Things
Oh, Doohwan; Kim, Deokho; Ro, Won Woo
2014-01-01
With the emergence of the Internet of Things (IoT), a large number of physical objects in daily life have been aggressively connected to the Internet. As the number of objects connected to networks increases, the security systems face a critical challenge due to the global connectivity and accessibility of the IoT. However, it is difficult to adapt traditional security systems to the objects in the IoT, because of their limited computing power and memory size. In light of this, we present a lightweight security system that uses a novel malicious pattern-matching engine. We limit the memory usage of the proposed system in order to make it work on resource-constrained devices. To mitigate performance degradation due to limitations of computation power and memory, we propose two novel techniques, auxiliary shifting and early decision. Through both techniques, we can efficiently reduce the number of matching operations on resource-constrained systems. Experiments and performance analyses show that our proposed system achieves a maximum speedup of 2.14 with an IoT object and provides scalable performance for a large number of patterns. PMID:25521382
Holographic memory system based on projection recording of computer-generated 1D Fourier holograms.
Betin, A Yu; Bobrinev, V I; Donchenko, S S; Odinokov, S B; Evtikhiev, N N; Starikov, R S; Starikov, S N; Zlokazov, E Yu
2014-10-01
Utilization of computer generation of holographic structures significantly simplifies the optical scheme that is used to record the microholograms in a holographic memory record system. Digital holographic synthesis also makes it possible to account for the nonlinear errors of the record system and so improve the quality of the microholograms. The multiplexed record of holograms is a widespread technique to increase the data record density. In this article we present a holographic memory system based on digital synthesis of amplitude one-dimensional (1D) Fourier transform holograms and the multiplexed record of these holograms onto the holographic carrier using an optical projection scheme. 1D Fourier transform holograms are very sensitive to the orientation of the anamorphic optical element (cylindrical lens) that is required for encoded data object reconstruction. The multiplex record of several holograms with different orientations in an optical projection scheme allowed reconstruction of the data object from each hologram by rotating the cylindrical lens to the corresponding angle. Also, we discuss two optical schemes for the recorded holograms readout: a full-page readout system and a line-by-line readout system. We consider the benefits of both systems and present the results of experimental modeling of 1D Fourier holograms nonmultiplex and multiplex record and reconstruction.
NASA Astrophysics Data System (ADS)
Saputro, Dewi Retno Sari; Widyaningsih, Purnami
2017-08-01
In general, the parameter estimation of the GWOLR model uses the maximum likelihood method, but this yields a system of nonlinear equations that is difficult to solve. Therefore, an approximate solution is needed. There are two popular numerical methods: Newton's method and the Quasi-Newton (QN) method. Newton's method requires considerable time to execute since it involves the Jacobian matrix (derivatives). The QN method overcomes this drawback of Newton's method by replacing the derivative computation with a directly computed function. The QN method uses a Hessian matrix approximation based on the Davidon-Fletcher-Powell (DFP) formula. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a QN method that shares with the DFP formula the property of having a positive definite Hessian matrix. The BFGS method requires large memory in executing the program, so another algorithm to decrease memory usage is needed, namely Low Memory BFGS (LBFGS). The purpose of this research is to compute the efficiency of the LBFGS method in the iterative and recursive computation of the Hessian matrix and its inverse for the GWOLR parameter estimation. The research findings show that the BFGS and LBFGS methods have arithmetic operation costs of O(n^2) and O(nm), respectively.
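The O(nm) cost quoted above comes from the way LBFGS applies the inverse Hessian approximation without ever storing an n-by-n matrix. The following is a minimal sketch of the standard two-loop recursion, not the authors' GWOLR code; the names `lbfgs_direction`, `s_hist` and `y_hist` are chosen here for illustration, and the initial scaling by the most recent curvature pair is one common choice among several.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_hist, y_hist):
    """Approximate (inverse Hessian) * grad with the L-BFGS two-loop recursion.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, oldest first.
    Only the last m pairs are stored, so memory and work are O(n * m).
    """
    q = list(grad)
    saved = []  # (rho, alpha) per pair, newest first
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((rho, alpha))

    # Scale by gamma = s'y / y'y from the most recent pair (identity if none).
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]

    # Second loop runs oldest to newest, so the saved pairs are reversed.
    for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(saved)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


if __name__ == "__main__":
    # Quadratic with Hessian diag(2, 10): two exact curvature pairs.
    s_hist = [[1.0, 0.0], [0.0, 1.0]]
    y_hist = [[2.0, 0.0], [0.0, 10.0]]
    print(lbfgs_direction([2.0, 10.0], s_hist, y_hist))
```

The search direction for minimization is the negative of the returned vector. For the diagonal quadratic in the usage example, the recursion reproduces the exact Newton step.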
Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo
2018-02-01
Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such a scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Exascale Hardware Architectures Working Group
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, S; Ang, J; Chiang, P
2011-03-15
The ASC Exascale Hardware Architecture working group is challenged to provide input on the following areas impacting the future use and usability of potential exascale computer systems: processor, memory, and interconnect architectures, as well as the power and resilience of these systems. Going forward, there are many challenging issues that will need to be addressed. First, power constraints in processor technologies will lead to steady increases in parallelism within a socket. Additionally, all cores may not be fully independent nor fully general purpose. Second, there is a clear trend toward less balanced machines, in terms of compute capability compared to memory and interconnect performance. In order to mitigate the memory issues, memory technologies will introduce 3D stacking, eventually moving on-socket and likely on-die, providing greatly increased bandwidth but unfortunately also likely providing smaller memory capacity per core. Off-socket memory, possibly in the form of non-volatile memory, will create a complex memory hierarchy. Third, communication energy will dominate the energy required to compute, such that interconnect power and bandwidth will have a significant impact. All of the above changes are driven by the need for greatly increased energy efficiency, as current technology will prove unsuitable for exascale, due to unsustainable power requirements of such a system. These changes will have the most significant impact on programming models and algorithms, but they will be felt across all layers of the machine. There is a clear need to engage all ASC working groups in planning for how to deal with technological changes of this magnitude. The primary function of the Hardware Architecture Working Group is to facilitate codesign with hardware vendors to ensure future exascale platforms are capable of efficiently supporting the ASC applications, which in turn need to meet the mission needs of the NNSA Stockpile Stewardship Program.
This issue is relatively immediate, as there is only a small window of opportunity to influence hardware design for 2018 machines. Given the short timeline, a firm co-design methodology with vendors is of prime importance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different numbers of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
Multiprocessor shared-memory information exchange
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santoline, L.L.; Bowers, M.D.; Crew, A.W.
1989-02-01
In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
Injecting Artificial Memory Errors Into a Running Computer Program
NASA Technical Reports Server (NTRS)
Bornstein, Benjamin J.; Granat, Robert A.; Wagstaff, Kiri L.
2008-01-01
Single-event upsets (SEUs) or bitflips are computer memory errors caused by radiation. BITFLIPS (Basic Instrumentation Tool for Fault Localized Injection of Probabilistic SEUs) is a computer program that deliberately injects SEUs into another computer program, while the latter is running, for the purpose of evaluating the fault tolerance of that program. BITFLIPS was written as a plug-in extension of the open-source Valgrind debugging and profiling software. BITFLIPS can inject SEUs into any program that can be run on the Linux operating system, without needing to modify the program's source code. Further, if access to the original program source code is available, BITFLIPS offers fine-grained control over exactly when and which areas of memory (as specified via program variables) will be subjected to SEUs. The rate of injection of SEUs is controlled by specifying either a fault probability or a fault rate based on memory size and radiation exposure time, in units of SEUs per byte per second. BITFLIPS can also log each SEU that it injects and, if program source code is available, report the magnitude of effect of the SEU on a floating-point value or other program variable.
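The idea of a probabilistic single-bit upset on a floating-point value can be sketched in a few lines. This is not the BITFLIPS interface, which works as a Valgrind plug-in on a running binary; the function names `flip_bit` and `inject_seus` and the per-element fault probability are assumptions made here for illustration.

```python
import random
import struct


def flip_bit(value, bit):
    """Return `value` with one bit of its IEEE 754 double encoding inverted."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be in 0..63")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def inject_seus(values, fault_probability, rng):
    """Flip one random bit in each element with the given probability.

    Returns a new list plus a log of (index, bit, before, after) tuples,
    loosely mirroring the per-SEU logging described in the abstract.
    """
    out = list(values)
    log = []
    for i, before in enumerate(out):
        if rng.random() < fault_probability:
            bit = rng.randrange(64)
            out[i] = flip_bit(before, bit)
            log.append((i, bit, before, out[i]))
    return out, log


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    corrupted, log = inject_seus([1.0, 2.0, 3.0], 0.5, rng)
    for index, bit, before, after in log:
        print(f"element {index}: bit {bit} flipped, {before} -> {after}")
```

The effect of an upset depends strongly on which bit is hit: a flip in the low mantissa bits is nearly invisible, while a flip in the sign or exponent bits changes the value drastically, which is why logging the magnitude of each effect is useful.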
Expert system shell to reason on large amounts of data
NASA Technical Reports Server (NTRS)
Giuffrida, Gionanni
1994-01-01
The current data base management systems (DBMS's) do not provide a sophisticated environment to develop rule based expert systems applications. Some of the new DBMS's come with some sort of rule mechanism; these are active and deductive database systems. However, neither of these is full-featured enough to support a complete rule-based implementation. On the other hand, current expert system shells do not provide any link with external databases. That is, all the data are kept in the system working memory. Such working memory is maintained in main memory. For some applications the limited size of the available working memory could represent a constraint for the development. Typically these are applications which require reasoning on huge amounts of data. All these data do not fit into the computer main memory. Moreover, in some cases these data can be already available in some database systems and continuously updated while the expert system is running. This paper proposes an architecture which employs knowledge discovery techniques to reduce the amount of data to be stored in the main memory; in this architecture a standard DBMS is coupled with a rule-based language. The data are stored in the DBMS. An interface between the two systems is responsible for inducing knowledge from the set of relations. Such induced knowledge is then transferred to the rule-based language working memory.
Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levy, Scott N.
2016-05-01
High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely-used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, we examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, we evaluate the potential impact of rollback avoidance on these systems. We then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, we examine the feasibility of using this technique to protect against memory faults in kernel memory.
Challenges of Future High-End Computing
NASA Technical Reports Server (NTRS)
Bailey, David; Kutler, Paul (Technical Monitor)
1998-01-01
The next major milestone in high performance computing is a sustained rate of one Pflop/s (also written one petaflops, or 10^15 floating-point operations per second). In addition to prodigiously high computational performance, such systems must of necessity feature very large main memories, as well as comparably high I/O bandwidth and huge mass storage facilities. The current consensus of scientists who have studied these issues is that "affordable" petaflops systems may be feasible by the year 2010, assuming that certain key technologies continue to progress at current rates. One important question is whether applications can be structured to perform efficiently on such systems, which are expected to incorporate many thousands of processors and deeply hierarchical memory systems. To answer these questions, advanced performance modeling techniques, including simulation of future architectures and applications, may be required. It may also be necessary to formulate "latency tolerant algorithms" and other completely new algorithmic approaches for certain applications. This talk will give an overview of these challenges.
Helicopter In-Flight Monitoring System Second Generation (HIMS II).
1983-08-01
acquisition cycle. B. Computer Chassis CPU (DEC LSI-II/2) -- Executes instructions contained in the memory. 32K memory (DEC MSVII-DD) --Contains program...when the operator executes command #2, 3, or 5 (display data). New cartridges can be inserted as required for truly unlimited, continuous data...is called bootstrapping. The software, which is stored on a tape cartridge, is loaded into memory by execution of a small program stored in read-only
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment
NASA Astrophysics Data System (ADS)
Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.
2017-11-01
The ultimate goal of Data Mining is to uncover hidden information that is useful in making decisions, using the large databases collected by an organization. Data Mining involves many tasks that are performed during the process. Mining frequent itemsets is one of the most important tasks in the case of transactional databases. These transactional databases contain data on a very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. With these points in mind, in this thesis we propose a system that mines frequent itemsets in an optimized way in terms of memory and time, using cloud computing to parallelize the process and providing the application as a service. The complete framework uses a proven efficient algorithm called FIN, which works on Nodesets and a POC (pre-order coding) tree. In order to evaluate the performance of the system, we conduct experiments to compare the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real-time data set, the traffic accidents data set. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.
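To make the term "frequent itemset" concrete, the sketch below counts the support of small itemsets by brute-force enumeration. It is deliberately not the FIN algorithm (it uses no Nodesets or POC tree) and makes no attempt at the memory or time efficiency the abstract is concerned with; the function name `frequent_itemsets` and its parameters are assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: count} for itemsets seen in at least min_support transactions.

    Brute force: every subset of up to max_size items of every transaction
    is counted, so cost grows quickly with transaction length.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
    for itemset, n in sorted(frequent_itemsets(baskets, min_support=2).items()):
        print(itemset, n)
```

Algorithms such as the one named in the abstract exist precisely because this kind of exhaustive counting does not scale to large transactional databases.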
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Internode data communications in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.
2013-09-03
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Internode data communications in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
System, methods and apparatus for program optimization for multi-threaded processor architectures
Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E
2015-01-06
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Bermuda Triangle: a subsystem of the 168/E interfacing scheme used by Group B at SLAC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oxoby, G.J.; Levinson, L.J.; Trang, Q.H.
1979-12-01
The Bermuda Triangle system is a method of interfacing several 168/E microprocessors to a central system for control of the processors and overlaying their memories. The system is a three-way interface with I/O ports to a large buffer memory, a PDP11 Unibus and a bus to the 168/E processors. Data may be transferred bidirectionally between any two ports. Two Bermuda Triangles are used, one for the program memory and one for the data memory. The program buffer memory stores the overlay programs for the 168/E, and the data buffer memory stores the incoming raw data, the data portion of the overlays, and the outgoing processed events. This buffering is necessary since the memories of 168/E microprocessors are small compared to the main program and the amount of data being processed. The link to the computer facility is via a Unibus to IBM channel interface. A PDP11/04 controls the data flow. 7 figures, 4 tables. (RWR)
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, D. H.
1985-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
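The Monte Carlo side of such an analysis can be sketched with a toy model. The model below is an assumption made for illustration, not the one used in the report: a single processor issues one memory access per tick to a uniformly random bank, each access reserves its bank for `busy_ticks` ticks, and the processor stalls whenever the requested bank is still reserved. The function name and parameters are hypothetical.

```python
import random


def simulate_bank_contention(n_banks, busy_ticks, n_accesses, rng):
    """Return accesses issued per clock tick (1.0 means no stalls at all)."""
    free_at = [0] * n_banks  # first tick at which each bank is free again
    tick = 0
    for _ in range(n_accesses):
        bank = rng.randrange(n_banks)
        # Stall until the requested bank's reservation has expired.
        tick = max(tick, free_at[bank])
        free_at[bank] = tick + busy_ticks
        tick += 1  # issuing the access occupies one tick
    return n_accesses / tick


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    for banks in (16, 64, 256):
        rate = simulate_bank_contention(banks, busy_ticks=8, n_accesses=100_000, rng=rng)
        print(f"{banks:4d} banks: {rate:.3f} accesses per tick")
```

Even this crude model shows the two remedies named in the abstract: the issue rate rises when the reservation time shrinks (faster memory chips) or when the number of independent banks grows.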
Vector computer memory bank contention
NASA Technical Reports Server (NTRS)
Bailey, David H.
1987-01-01
A number of vector supercomputers feature very large memories. Unfortunately the large capacity memory chips that are used in these computers are much slower than the fast central processing unit (CPU) circuitry. As a result, memory bank reservation times (in CPU ticks) are much longer than on previous generations of computers. A consequence of these long reservation times is that memory bank contention is sharply increased, resulting in significantly lowered performance rates. The phenomenon of memory bank contention in vector computers is analyzed using both a Markov chain model and a Monte Carlo simulation program. The results of this analysis indicate that future generations of supercomputers must either employ much faster memory chips or else feature very large numbers of independent memory banks.
Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan
2012-12-11
A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
Enhancing an appointment diary on a pocket computer for use by people after brain injury.
Wright, P; Rogers, N; Hall, C; Wilson, B; Evans, J; Emslie, H
2001-12-01
People with memory loss resulting from brain injury benefit from purpose-designed memory aids such as appointment diaries on pocket computers. The present study explores the effects of extending the range of memory aids and including games. For 2 months, 12 people who had sustained brain injury were loaned a pocket computer containing three purpose-designed memory aids: diary, notebook and to-do list. A month later they were given another computer with the same memory aids but a different method of text entry (physical keyboard or touch-screen keyboard). Machine order was counterbalanced across participants. Assessment was by interviews during the loan periods, rating scales, performance tests and computer log files. All participants could use the memory aids and ten people (83%) found them very useful. Correlations among the three memory aids were not significant, suggesting individual variation in how they were used. Games did not increase use of the memory aids, nor did loan of the preferred pocket computer (with physical keyboard). Significantly more diary entries were made by people who had previously used other memory aids, suggesting that a better understanding of how to use a range of memory aids could benefit some people with brain injury.
Programmable fuzzy associative memory processor
NASA Astrophysics Data System (ADS)
Shao, Lan; Liu, Liren; Li, Guoqiang
1996-02-01
An optical system based on the method of spatial area-coding and multiple image scheme is proposed for fuzzy associative memory processing. Fuzzy maximum operation is accomplished by a ferroelectric liquid crystal PROM instead of a computer-based approach. A relative subsethood is introduced here to be used as a criterion for the recall evaluation.
ERIC Educational Resources Information Center
Zambon, Franco
This study sought to determine a useful frequency for refreshing students' memories of complex procedures that involved a formal computer language. Students were required to execute the Microsoft Disc Operating System (MS-DOS) commands for "copy," "backup," and "restore." A total of 126 college students enrolled in six…
NASA Technical Reports Server (NTRS)
Stehle, Roy H.; Ogier, Richard G.
1993-01-01
Alternatives for realizing a packet-based network switch for use on a frequency division multiple access/time division multiplexed (FDMA/TDM) geostationary communication satellite were investigated. Each of the eight downlink beams supports eight directed dwells. The design needed to accommodate multicast packets with very low probability of loss due to contention. Three switch architectures were designed and analyzed. An output-queued, shared bus system yielded a functionally simple system, utilizing a first-in, first-out (FIFO) memory per downlink dwell, but at the expense of a large total memory requirement. A shared memory architecture offered the most efficiency in memory requirements, requiring about half the memory of the shared bus design. The processing requirement for the shared-memory system adds system complexity that may offset the benefits of the smaller memory. An alternative design using a shared memory buffer per downlink beam decreases circuit complexity through a distributed design, and requires at most 1000 packets of memory more than the completely shared memory design. Modifications to the basic packet switch designs were proposed to accommodate circuit-switched traffic, which must be served on a periodic basis with minimal delay. Methods for dynamically controlling the downlink dwell lengths were developed and analyzed. These methods adapt quickly to changing traffic demands, and do not add significant complexity or cost to the satellite and ground station designs. Methods for reducing the memory requirement by not requiring the satellite to store full packets were also proposed and analyzed. In addition, optimal packet and dwell lengths were computed as functions of memory size for the three switch architectures.
Contribution of the Cholinergic System to Verbal Memory Performance in Mild Cognitive Impairment.
Peter, Jessica; Lahr, Jacob; Minkova, Lora; Lauer, Eliza; Grothe, Michel J; Teipel, Stefan; Köstering, Lena; Kaller, Christoph P; Heimbach, Bernhard; Hüll, Michael; Normann, Claus; Nissen, Christoph; Reis, Janine; Klöppel, Stefan
2016-06-18
Acetylcholine is critically involved in modulating learning and memory function, which both decline in neurodegeneration. It remains unclear to what extent structural and functional changes in the cholinergic system contribute to episodic memory dysfunction in mild cognitive impairment (MCI), in addition to hippocampal degeneration. A better understanding is critical, given that the cholinergic system is the main target of current symptomatic treatment in mild to moderate Alzheimer's disease. We simultaneously assessed the structural and functional integrity of the cholinergic system in 20 patients with MCI and 20 matched healthy controls and examined their effect on verbal episodic memory via multivariate regression analyses. Mediating effects of either cholinergic function or hippocampal volume on the relationship between cholinergic structure and episodic memory were computed. In MCI, a less intact structure and function of the cholinergic system was found. A smaller cholinergic structure was significantly correlated with a functionally more active cholinergic system in patients, but not in controls. This association was not modulated by age or disease severity, arguing against compensational processes. Further analyses indicated that neither functional nor structural changes in the cholinergic system influence verbal episodic memory at the MCI stage. In fact, those associations were fully mediated by hippocampal volume. Although the cholinergic system is structurally and functionally altered in MCI, episodic memory dysfunction results primarily from hippocampal neurodegeneration, which may explain the inefficiency of cholinergic treatment at this disease stage.
JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX
NASA Astrophysics Data System (ADS)
Omidian, Hossein; Lemieux, Guy G. F.
2018-04-01
Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient; it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size as well as using kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g. from handheld to datacenter, that use minimal resources and power to reach the performance target.
Estimating Performance of Single Bus, Shared Memory Multiprocessors
1987-05-01
Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol10 , no 3...Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models", Computing Sur- veys, vol10 , no 3, September 1978, pp 225-261
Knowledge representation and user interface concepts to support mixed-initiative diagnosis
NASA Technical Reports Server (NTRS)
Sobelman, Beverly H.; Holtzblatt, Lester J.
1989-01-01
The Remote Maintenance Monitoring System (RMMS) provides automated support for the maintenance and repair of ModComp computer systems used in the Launch Processing System (LPS) at Kennedy Space Center. RMMS supports manual and automated diagnosis of intermittent hardware failures, providing an efficient means for accessing and analyzing the data generated by catastrophic failure recovery procedures. This paper describes the design and functionality of the user interface for interactive analysis of memory dump data, relating it to the underlying declarative representation of memory dumps.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study
NASA Technical Reports Server (NTRS)
Simon, Tyler; McGalliard, James
2009-01-01
Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
NASA Astrophysics Data System (ADS)
Liu, Jianming; Grant, Steven L.; Benesty, Jacob
2015-12-01
A new reweighted proportionate affine projection algorithm (RPAPA) with memory and row action projection (MRAP) is proposed in this paper. The reweighted PAPA is derived from a family of sparseness measures, which demonstrate performance similar to mu-law and the l 0 norm PAPA but with lower computational complexity. The sparseness of the channel is taken into account to improve the performance for dispersive system identification. Meanwhile, the memory of the filter's coefficients is combined with row action projections (RAP) to significantly reduce computational complexity. Simulation results demonstrate that the proposed RPAPA MRAP algorithm outperforms both the affine projection algorithm (APA) and PAPA, and has performance similar to l 0 PAPA and mu-law PAPA, in terms of convergence speed and tracking ability. Meanwhile, the proposed RPAPA MRAP has much lower computational complexity than PAPA, mu-law PAPA, and l 0 PAPA, etc., which makes it very appealing for real-time implementation.
Modeling aspects of human memory for scientific study.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Caudell, Thomas P.; Watson, Patrick; McDaniel, Mark A.
Working with leading experts in the field of cognitive neuroscience and computational intelligence, SNL has developed a computational architecture that represents neurocognitive mechanisms associated with how humans remember experiences in their past. The architecture represents how knowledge is organized and updated through information from individual experiences (episodes) via the cortical-hippocampal declarative memory system. We compared the simulated behavioral characteristics with those of humans measured under well established experimental standards, controlling for unmodeled aspects of human processing, such as perception. We used this knowledge to create robust simulations of human memory behaviors that should help move the scientific community closer to understanding how humans remember information. These behaviors were experimentally validated against actual human subjects, which was published. An important outcome of the validation process will be the joining of specific experimental testing procedures from the field of neuroscience with computational representations from the field of cognitive modeling and simulation.
Advanced information processing system: Local system services
NASA Technical Reports Server (NTRS)
Burkhardt, Laura; Alger, Linda; Whittredge, Roy; Stasiowski, Peter
1989-01-01
The Advanced Information Processing System (AIPS) is a multi-computer architecture composed of hardware and software building blocks that can be configured to meet a broad range of application requirements. The hardware building blocks are fault-tolerant, general-purpose computers, fault- and damage-tolerant networks (both computer and input/output), and interfaces between the networks and the computers. The software building blocks are the major software functions: local system services, input/output, system services, inter-computer system services, and the system manager. The foundation of the local system services is an operating system with the functions required for a traditional real-time multi-tasking computer, such as task scheduling, inter-task communication, memory management, interrupt handling, and time maintenance. Resting on this foundation are the redundancy management functions necessary in a redundant computer and the status reporting functions required for an operator interface. The functional requirements, functional design and detailed specifications for all the local system services are documented.
Systems Suitable for Information Professionals.
ERIC Educational Resources Information Center
Blair, John C., Jr.
1983-01-01
Describes computer operating systems applicable to microcomputers, noting hardware components, advantages and disadvantages of each system, local area networks, distributed processing, and a fully configured system. Lists of hardware components (disk drives, solid state disk emulators, input/output and memory components, and processors) and…
2016-11-01
Feasibility of using Shape Memory Alloys for Gas Turbine Blade Actuation by Kathryn Esham, Luis Bravo, Anindya Ghoshal, Muthuvel Murugan, and Michael...Computational Study on the Feasibility of using Shape Memory Alloys for Gas Turbine Blade Actuation by Luis Bravo, Anindya Ghoshal, Muthuvel...High Performance Computing (HPC)-Enabled Computational Study on the Feasibility of using Shape Memory Alloys for Gas Turbine Blade Actuation 5a
Protecting solid-state spins from a strongly coupled environment
NASA Astrophysics Data System (ADS)
Chen, Mo; Calvin Sun, Won Kyu; Saha, Kasturi; Jaskula, Jean-Christophe; Cappellaro, Paola
2018-06-01
Quantum memories are critical for solid-state quantum computing devices and a good quantum memory requires both long storage time and fast read/write operations. A promising system is the nitrogen-vacancy (NV) center in diamond, where the NV electronic spin serves as the computing qubit and a nearby nuclear spin as the memory qubit. Previous works used remote, weakly coupled 13C nuclear spins, trading read/write speed for long storage time. Here we focus instead on the intrinsic strongly coupled 14N nuclear spin. We first quantitatively understand its decoherence mechanism, identifying as its source the electronic spin that acts as a quantum fluctuator. We then propose a scheme to protect the quantum memory from the fluctuating noise by applying dynamical decoupling on the environment itself. We demonstrate a factor of 3 enhancement of the storage time in a proof-of-principle experiment, showing the potential for a quantum memory that combines fast operation with long coherence time.
NASA Astrophysics Data System (ADS)
Ohene-Kwofie, Daniel; Otoo, Ekow
2015-10-01
The ATLAS detector, operated at the Large Hadron Collider (LHC), records proton-proton collisions at CERN every 50 ns, resulting in a sustained data flow up to PB/s. The upgraded Tile Calorimeter of the ATLAS experiment will sustain about 5 PB/s of digital throughput. These massive data rates require extremely fast data capture and processing. Although there has been a steady increase in the processing speed of CPU/GPGPU assembled for high performance computing, the rate of data input and output, even under parallel I/O, has not kept up with the general increase in computing speeds. The problem then is whether one can implement an I/O subsystem infrastructure capable of meeting the computational speeds of the advanced computing systems at the petascale and exascale level. We propose a system architecture that leverages the Partitioned Global Address Space (PGAS) model of computing to maintain an in-memory data-store for the Processing Unit (PU) of the upgraded electronics of the Tile Calorimeter, which is proposed to be used as a high throughput general purpose co-processor to the sROD of the upgraded Tile Calorimeter. The physical memory of the PUs is aggregated into a large global logical address space using RDMA-capable interconnects such as PCI-Express to enhance data processing throughput.
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing.
Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J
2015-08-14
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Implementation of relational data base management systems on micro-computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, C.L.
1982-01-01
This dissertation describes an implementation of a Relational Data Base Management System on a microcomputer. A specific floppy disk based hardware called TERAK is being used, and a high level query interface which is similar to a subset of the SEQUEL language is provided. The system contains sub-systems such as I/O, file management, virtual memory management, query system, B-tree management, scanner, command interpreter, expression compiler, garbage collection, linked list manipulation, disk space management, etc. The software has been implemented to fulfill the following goals: (1) It is highly modularized. (2) The system is physically segmented into 16 logically independent, overlayable segments, in a way such that a minimal amount of memory is needed at execution time. (3) A virtual memory system is simulated that provides the system with seemingly unlimited memory space. (4) A language translator is applied to recognize user requests in the query language. The code generation of this translator generates compact code for the execution of UPDATE, DELETE, and QUERY commands. (5) A complete set of basic functions needed for on-line data base manipulations is provided through the use of a friendly query interface. (6) To eliminate the dependency on the environment (both software and hardware) as much as possible, so that it would be easy to transplant the system to other computers. (7) To simulate each relation as a sequential file. It is intended to be a highly efficient, single user system suited to be used by small or medium sized organizations for, say, administrative purposes. Experiments show that quite satisfying results have indeed been achieved.
New double-byte error-correcting codes for memory systems
NASA Technical Reports Server (NTRS)
Feng, Gui-Liang; Wu, Xinen; Rao, T. R. N.
1996-01-01
Error-correcting or error-detecting codes have been used in the computer industry to increase reliability, reduce service costs, and maintain data integrity. The single-byte error-correcting and double-byte error-detecting (SbEC-DbED) codes have been successfully used in computer memory subsystems. There are many methods to construct double-byte error-correcting (DBEC) codes. In the present paper we construct a class of double-byte error-correcting codes, which are more efficient than those known to be optimum, and a decoding procedure for our codes is also considered.
NRAM: a disruptive carbon-nanotube resistance-change memory.
Gilmer, D C; Rueckes, T; Cleveland, L
2018-04-03
Advanced memory technology based on carbon nanotubes (CNTs) (NRAM) possesses desired properties for implementation in a host of integrated systems due to demonstrated advantages of its operation including high speed (nanotubes can switch state in picoseconds), high endurance (over a trillion), and low power (with essentially zero standby power). The applicable integrated systems for NRAM have markets that will see compound annual growth rates (CAGR) of over 62% between 2018 and 2023, with an embedded systems CAGR of 115% in 2018-2023 (http://bccresearch.com/pressroom/smc/bcc-research-predicts:-nram-(finally)-to-revolutionize-computer-memory). These opportunities are helping drive the realization of a shift from silicon-based to carbon-based (NRAM) memories. NRAM is a memory cell made up of an interlocking matrix of CNTs, either touching or slightly separated, leading to low or higher resistance states respectively. The small movement of atoms, as opposed to moving electrons for traditional silicon-based memories, renders NRAM with a more robust endurance and high temperature retention/operation which, along with high speed/low power, is expected to blossom in this memory technology to be a disruptive replacement for the current status quo of DRAM (dynamic RAM), SRAM (static RAM), and NAND flash memories.
NRAM: a disruptive carbon-nanotube resistance-change memory
NASA Astrophysics Data System (ADS)
Gilmer, D. C.; Rueckes, T.; Cleveland, L.
2018-04-01
Advanced memory technology based on carbon nanotubes (CNTs) (NRAM) possesses desired properties for implementation in a host of integrated systems due to demonstrated advantages of its operation including high speed (nanotubes can switch state in picoseconds), high endurance (over a trillion), and low power (with essentially zero standby power). The applicable integrated systems for NRAM have markets that will see compound annual growth rates (CAGR) of over 62% between 2018 and 2023, with an embedded systems CAGR of 115% in 2018-2023 (http://bccresearch.com/pressroom/smc/bcc-research-predicts:-nram-(finally)-to-revolutionize-computer-memory). These opportunities are helping drive the realization of a shift from silicon-based to carbon-based (NRAM) memories. NRAM is a memory cell made up of an interlocking matrix of CNTs, either touching or slightly separated, leading to low or higher resistance states respectively. The small movement of atoms, as opposed to moving electrons for traditional silicon-based memories, renders NRAM with a more robust endurance and high temperature retention/operation which, along with high speed/low power, is expected to blossom in this memory technology to be a disruptive replacement for the current status quo of DRAM (dynamic RAM), SRAM (static RAM), and NAND flash memories.
40 CFR 1033.112 - Emission diagnostics for SCR systems.
Code of Federal Regulations, 2013 CFR
2013-07-01
.... This section does not apply for SCR systems using the engine's fuel as the reductant. (a) The... computer memory all incidents of engine operation with inadequate reductant injection or reductant quality...
Artificial Intelligence Support for Computational Chemistry
NASA Astrophysics Data System (ADS)
Duch, Wlodzislaw
Possible forms of artificial intelligence (AI) support for quantum chemistry are discussed. Questions addressed include: what kind of support is desirable, what kind of support is feasible, what can we expect in the coming years. Advantages and disadvantages of current AI techniques are presented and it is argued that at present the memory-based systems are the most effective for large scale applications. Such systems may be used to predict the accuracy of calculations and to select the least expensive methods and basis sets belonging to the same accuracy class. Advantages of the Feature Space Mapping as an improvement on the memory based systems are outlined and some results obtained in classification problems given. Relevance of such classification systems to computational chemistry is illustrated with two examples showing similarity of results obtained by different methods that take electron correlation into account.
A User Oriented Microcomputer and Monitor System.
1981-02-15
inhibit signal is generated by the Monitor to (1) prevent microcomputer bus timeout, and (2) suspend the microcomputer interval timers while the...PDP11 is prevented until the user sets the BIT flag for the associated buffer memory. Completion of a buffer memory transfer generates monitor source... [Figure 15. Microcomputer major areas]
pyCTQW: A continuous-time quantum walk simulator on distributed memory computers
NASA Astrophysics Data System (ADS)
Izaac, Josh A.; Wang, Jingbo B.
2015-01-01
In the general field of quantum information and computation, quantum walks are playing an increasingly important role in constructing physical models and quantum algorithms. We have recently developed a distributed memory software package pyCTQW, with an object-oriented Python interface, that allows efficient simulation of large multi-particle CTQW (continuous-time quantum walk)-based systems. In this paper, we present an introduction to the Python and Fortran interfaces of pyCTQW, discuss various numerical methods of calculating the matrix exponential, and demonstrate the performance behavior of pyCTQW on a distributed memory cluster. In particular, the Chebyshev and Krylov-subspace methods for calculating the quantum walk propagation are provided, as well as methods for visualization and data analysis.
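As a rough illustration of what propagating a continuous-time quantum walk involves, the sketch below applies exp(-iHt) to an initial state on a small path graph. It deliberately uses a truncated Taylor series in pure Python, which is a simpler stand-in and not the Chebyshev or Krylov-subspace methods the abstract describes. It also does not use the pyCTQW API; the function names, the choice of the adjacency matrix as Hamiltonian, and the 11-node graph are all assumptions made for this example.

```python
def path_graph(n):
    """Adjacency lists of an n-node path graph (assumed example graph)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def apply_h(adj, psi):
    """Multiply the Hamiltonian (taken here to be the adjacency matrix) by psi."""
    return [sum(psi[j] for j in adj[i]) for i in range(len(psi))]


def ctqw_propagate(adj, psi0, t, terms=40):
    """Approximate exp(-i H t) psi0 with a truncated Taylor series.

    Each iteration turns the k-1 term into the k term via (-i t / k) * H,
    so the sum is built without forming any matrix powers explicitly.
    """
    result = list(psi0)
    term = list(psi0)
    for k in range(1, terms):
        term = [(-1j * t / k) * x for x in apply_h(adj, term)]
        result = [r + x for r, x in zip(result, term)]
    return result


n = 11
adj = path_graph(n)
psi0 = [0j] * n
psi0[n // 2] = 1 + 0j  # walker starts on the middle node
psi = ctqw_propagate(adj, psi0, t=1.5)
probs = [abs(a) ** 2 for a in psi]
```

A Taylor series is adequate only while the product of the evolution time and the spectral norm of H stays small; for large systems or long times, methods such as the Chebyshev and Krylov-subspace approaches mentioned above are generally the better choice.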
Bad data packet capture device
Chen, Dong; Gara, Alan; Heidelberger, Philip; Vranas, Pavlos
2010-04-20
An apparatus and method for capturing data packets for analysis on a network computing system includes a sending node and a receiving node connected by a bi-directional communication link. The sending node sends a data transmission to the receiving node on the bi-directional communication link, and the receiving node receives the data transmission and verifies the data transmission to determine valid data and invalid data and verify retransmissions of invalid data as corresponding valid data. A memory device communicates with the receiving node for storing the invalid data and the corresponding valid data. A computing node communicates with the memory device and receives and performs an analysis of the invalid data and the corresponding valid data received from the memory device.
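The capture step can be sketched as follows: the receiver checks each packet, holds on to any invalid copy, and pairs it with the valid retransmission when that arrives. The use of CRC-32 as the validity check, the sequence-number field, and all names below are assumptions for illustration; the abstract does not specify them.

```python
import zlib


def make_packet(seq, payload):
    """Build a packet with a CRC-32 checksum (the checksum choice is an assumption)."""
    return {"seq": seq, "payload": payload, "crc": zlib.crc32(payload)}


class Receiver:
    """Verifies packets and stores (invalid, valid) pairs for later analysis."""

    def __init__(self):
        self.delivered = {}    # seq -> valid payload
        self.pending_bad = {}  # seq -> invalid payload awaiting retransmission
        self.capture = []      # captured (invalid payload, valid payload) pairs

    def receive(self, packet):
        seq = packet["seq"]
        if zlib.crc32(packet["payload"]) != packet["crc"]:
            self.pending_bad[seq] = packet["payload"]
            return False       # sender is expected to retransmit
        if seq in self.pending_bad:
            # The retransmission verified: keep both copies side by side.
            self.capture.append((self.pending_bad.pop(seq), packet["payload"]))
        self.delivered[seq] = packet["payload"]
        return True


good = make_packet(1, b"hello")
corrupted = dict(good, payload=b"hellp")  # simulate corruption on the link

rx = Receiver()
first_ok = rx.receive(corrupted)
second_ok = rx.receive(good)
```

A separate analysis step could then compare each captured pair byte by byte to look for patterns in where the link corrupts data.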
NASA Technical Reports Server (NTRS)
Hamilton, M. H.
1972-01-01
Erasable-memory programs (EMPs) designed for the guidance computers used in the command (CMC) and lunar modules (LGC) are described. CMC programs are designated COLOSSUS 3, and the associated EMPs are identified by a three-digit number beginning with 5. LGC programs are designated LUMINARY 1E, and the associated EMPs are identified, with one exception, by a three-digit number beginning with 1. The exception is EMP 99. The EMPs vary in complexity from a simple flagbit setting to a long and intricate logical structure. They all, however, cause the computer to behave in a way not intended in the original design of the programs; they accomplish this off-nominal behavior by some alteration of erasable memory to interface with existing fixed-memory programs to effect a desired result.
A Comprehensive Study on Energy Efficiency and Performance of Flash-based SSD
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Seon-Yeon; Kim, Youngjae; Urgaonkar, Bhuvan
2011-01-01
Use of flash memory as a storage medium is becoming popular in diverse computing environments. However, because of differences in interface, flash memory requires a hard-disk-emulation layer, called FTL (flash translation layer). Although the FTL enables flash memory storages to replace conventional hard disks, it induces significant computational and space overhead. Despite the low power consumption of flash memory, this overhead leads to significant power consumption in an overall storage system. In this paper, we analyze the characteristics of flash-based storage devices from the viewpoint of power consumption and energy efficiency by using various methodologies. First, we utilize simulation to investigate the interior operation of flash-based storages. Subsequently, we measure the performance and energy efficiency of commodity flash-based SSDs by using microbenchmarks to identify the block-device level characteristics and macrobenchmarks to reveal their filesystem level characteristics.
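To make the role of the FTL concrete, here is a toy page-mapped translation layer. It is a sketch under simplifying assumptions (page-level mapping, append-only allocation, no garbage collection or wear leveling) and does not represent any FTL studied in the paper; the class and method names are hypothetical.

```python
class PageMappedFTL:
    """Toy page-level flash translation layer (hypothetical, no garbage collection)."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages   # contents of physical pages
        self.valid = [False] * num_pages  # which physical pages hold live data
        self.mapping = {}                 # logical page number -> physical page number
        self.next_free = 0                # next never-written physical page

    def write(self, lpn, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; a real FTL would garbage-collect here")
        old = self.mapping.get(lpn)
        if old is not None:
            # Flash pages cannot be overwritten in place, so the old copy
            # is only marked stale and the data goes to a fresh page.
            self.valid[old] = False
        ppn = self.next_free
        self.next_free += 1
        self.flash[ppn] = data
        self.valid[ppn] = True
        self.mapping[lpn] = ppn

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]


ftl = PageMappedFTL(4)
ftl.write(7, "a")
ftl.write(7, "b")  # update of the same logical page lands on a new physical page
```

The mapping table and the stale pages left behind by updates hint at where the space and computational overhead mentioned in the abstract comes from.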
Support for non-locking parallel reception of packets belonging to a single memory reception FIFO
Chen, Dong [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Salapura, Valentina [Yorktown Heights, NY; Senger, Robert M [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugawara, Yutaka [Yorktown Heights, NY
2011-01-27
A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.
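The completion rule described above can be sketched as a small single-threaded simulation: engines may finish storing in any order, but a packet is reported complete only once every earlier packet for the same FIFO is also stored. The class, its fields, and the omission of FIFO wrap-around are simplifying assumptions for this example, not details from the patent.

```python
class ReceptionFifo:
    """Toy model of in-order completion for packets stored in parallel."""

    def __init__(self, size):
        self.slots = [None] * size
        self.next_slot = 0   # next location handed to an arriving packet
        self.tail = 0        # packets below this index are visible as complete
        self.stored = set()  # slots already stored but not yet visible

    def reserve(self):
        """Assign a predetermined location to a packet, in arrival order."""
        slot = self.next_slot
        self.next_slot += 1
        return slot

    def finish_store(self, slot, packet):
        """Called when one engine finishes its store; returns the visible tail."""
        self.slots[slot] = packet
        self.stored.add(slot)
        # Advance the tail only across a contiguous run of finished stores,
        # so a later packet never becomes visible before an earlier one.
        while self.tail in self.stored:
            self.stored.remove(self.tail)
            self.tail += 1
        return self.tail


fifo = ReceptionFifo(4)
a, b, c = fifo.reserve(), fifo.reserve(), fifo.reserve()
tails = [
    fifo.finish_store(c, "C"),  # finishes first, but must wait for a and b
    fifo.finish_store(a, "A"),
    fifo.finish_store(b, "B"),
]
```

Because each packet has its own reserved location, the stores themselves never contend for the same memory; only the completion indication is ordered.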
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subhash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piyush
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the POWER5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core POWER6 based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems
NASA Astrophysics Data System (ADS)
Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.
2017-07-01
The Continuum-Generalized Method of Moments (C-GMM) addresses a shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the maximum likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, the computation takes a very long time because the regularization parameter must be optimized. These calculations are usually processed sequentially, even though modern computers are supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. The parallel algorithm is then implemented with a standard shared-memory application programming interface, Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
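The outer-loop strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say what the two loops iterate over, so the sketch assumes an outer loop over candidate regularization parameters and an inner loop over observations. The function `objective` is a placeholder criterion, not the C-GMM objective, and a Python thread pool stands in for the OpenMP parallel loop used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def objective(alpha, observations):
    # Inner loop: placeholder criterion, NOT the C-GMM objective.
    return sum((x - alpha) ** 2 for x in observations)


def best_alpha(candidates, observations, workers=4):
    # Outer loop: every candidate is evaluated independently of the others,
    # so the iterations can be handed to separate threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda a: objective(a, observations), candidates))
    # Pick the candidate with the smallest score.
    return min(zip(scores, candidates))[1]


observations = [1.0, 2.0, 3.0, 4.0]
candidates = [0.5 * k for k in range(11)]   # 0.0, 0.5, ..., 5.0
chosen = best_alpha(candidates, observations)
```

Parallelizing the outer loop gives each thread one large, independent unit of work, which typically keeps scheduling overhead low compared with splitting the inner loop. In standard CPython builds the global interpreter lock limits the speedup of CPU-bound threads, so this sketch shows the structure rather than the performance of an OpenMP implementation.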
NASA Astrophysics Data System (ADS)
Mikaelian, Andrei L.
Attention is given to data storage, devices, architectures, and implementations of optical memory and neural networks; holographic optical elements and computer-generated holograms; holographic display and materials; systems, pattern recognition, interferometry, and applications in optical information processing; and special measurements and devices. Topics discussed include optical immersion as a new way to increase information recording density, systems for data reading from optical disks on the basis of diffractive lenses, a new real-time optical associative memory system, an optical pattern recognition system based on a WTA model of neural networks, phase diffraction grating for the integral transforms of coherent light fields, holographic recording with operated sensitivity and stability in chalcogenide glass layers, a compact optical logic processor, a hybrid optical system for computing invariant moments of images, optical fiber holographic interferometry, and image transmission through random media in single pass via optical phase conjugation.
Continuing challenges for computer-based neuropsychological tests.
Letz, Richard
2003-08-01
A number of issues critical to the development of computer-based neuropsychological testing systems that remain continuing challenges to their widespread use in occupational and environmental health are reviewed. Several computer-based neuropsychological testing systems have been developed over the last 20 years, and they have contributed substantially to the study of neurologic effects of a number of environmental exposures. However, many are no longer supported and do not run on contemporary personal computer operating systems. Issues that are continuing challenges for development of computer-based neuropsychological tests in environmental and occupational health are discussed: (1) some current technological trends that generally make test development more difficult; (2) lack of availability of usable speech recognition of the type required for computer-based testing systems; (3) implementing computer-based procedures and tasks that are improvements over, not just adaptations of, their manually-administered predecessors; (4) implementing tests of a wider range of memory functions than the limited range now available; (5) paying more attention to motivational influences that affect the reliability and validity of computer-based measurements; and (6) increasing the usability of and audience for computer-based systems. Partial solutions to some of these challenges are offered. The challenges posed by current technological trends are substantial and generally beyond the control of testing system developers. Widespread acceptance of the "tablet PC" and implementation of accurate small vocabulary, discrete, speaker-independent speech recognition would enable revolutionary improvements to computer-based testing systems, particularly for testing memory functions not covered in existing systems. 
Dynamic, adaptive procedures, particularly ones based on item-response theory (IRT) and computerized-adaptive testing (CAT) methods, will be implemented in new tests that will be more efficient, reliable, and valid than existing test procedures. These additional developments, along with implementation of innovative reporting formats, are necessary for more widespread acceptance of the testing systems.
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. The shared memory model is much easier to use, but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in modern computers, this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing [Singh], however at the cost of compromising the ease-of-use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability, but they compromise the ease of use. In this context, the message-passing model is sometimes referred to as "assembly programming for the scientific computing". The Global Arrays toolkit [GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model and the capabilities of the toolkit, and discusses its evolution.
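The get/compute/put pattern described in this abstract can be shown with a toy, single-process model. This is a sketch under stated assumptions: `ToyGlobalArray` and its methods are hypothetical names invented for illustration and are not the Global Arrays toolkit API. The "processes" are simulated by plain Python lists.

```python
class ToyGlobalArray:
    """Hypothetical stand-in for a distributed array; not the GA toolkit API."""

    def __init__(self, length, nprocs):
        self.block = -(-length // nprocs)   # block size, by ceiling division
        self.length = length
        # One local block per simulated process.
        self.blocks = [[0.0] * min(self.block, max(0, length - p * self.block))
                       for p in range(nprocs)]

    def owner(self, index):
        """Locality is visible: the caller can ask which process holds an element."""
        return index // self.block

    def get(self, lo, hi):
        """Copy global elements [lo, hi) into a local list."""
        return [self.blocks[self.owner(i)][i % self.block] for i in range(lo, hi)]

    def put(self, lo, values):
        """Copy a local list into the global index range starting at lo."""
        for offset, value in enumerate(values):
            i = lo + offset
            self.blocks[self.owner(i)][i % self.block] = value


ga = ToyGlobalArray(length=10, nprocs=3)
ga.put(2, [1.0, 2.0, 3.0, 4.0])      # spans the blocks of processes 0 and 1
local = ga.get(3, 6)                 # explicit transfer into local storage
local = [2 * x for x in local]       # compute on the local copy
ga.put(3, local)                     # explicit transfer back
```

The caller addresses the array with global indices, as in a shared-memory model, but every transfer is an explicit call, and `owner` lets the caller see where the data lives. That combination is the trade-off the abstract describes.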
PIMS: Memristor-Based Processing-in-Memory-and-Storage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cook, Jeanine
Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy efficiency for high performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (instead of memory), the system could address both in-memory computation and applications that accessed larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direction, the most important being that a PIM that implements a standard von Neumann-type architecture results in significant energy efficiency improvement, but only about an O(10) performance improvement. In addition to this, the emergence of new memory technologies moved us to proposing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM technology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.
A general valuation of the various types of phototropic (i.e., reversible, light induced, color producing) phenomenon is given regarding the...application of phototropic material to bioptic high density storage media for computer memories. The inorganic "F" center type phototropic systems were
Method for simultaneous overlapped communications between neighboring processors in a multiple
Benner, Robert E.; Gustafson, John L.; Montry, Gary R.
1991-01-01
A parallel computing system and method having improved performance where a program is concurrently run on a plurality of nodes for reducing total processing time, each node having a processor, a memory, and a predetermined number of communication channels connected to the node and independently connected directly to other nodes. The present invention improves performance of the parallel computing system by providing a system which can provide efficient communication between the processors and between the system and input and output devices. A method is also disclosed which can locate defective nodes with the computing system.
Cycle accurate and cycle reproducible memory for an FPGA based hardware accelerator
Asaad, Sameh W.; Kapur, Mohit
2016-03-15
A method, system and computer program product are disclosed for using a Field Programmable Gate Array (FPGA) to simulate operations of a device under test (DUT). The DUT includes a device memory having a first number of input ports, and the FPGA is associated with a target memory having a second number of input ports, the second number being less than the first number. In one embodiment, a given set of inputs is applied to the device memory at a frequency Fd and in a defined cycle of time, and the given set of inputs is applied to the target memory at a frequency Ft. Ft is greater than Fd and cycle accuracy is maintained between the device memory and the target memory. In an embodiment, a cycle accurate model of the DUT memory is created by separating the DUT memory interface protocol from the target memory storage array.
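One way to read the relationship between port counts and frequencies is as time multiplexing: the inputs that a many-ported device memory accepts in one DUT cycle are replayed over several faster cycles of a target memory with fewer ports. The sketch below illustrates that reading only. The scheduling scheme and the function names are assumptions made here, not details taken from the patent.

```python
import math


def target_cycles_per_dut_cycle(dut_ports, target_ports):
    # Minimum number of target-memory cycles needed to serve every DUT port once.
    return math.ceil(dut_ports / target_ports)


def replay_dut_cycle(memory, writes, target_ports):
    """Apply one DUT cycle's writes to `memory`, `target_ports` at a time.

    `writes` is a list of (address, value) pairs, one per active DUT port.
    Returns the number of target cycles consumed.
    """
    cycles = 0
    for start in range(0, len(writes), target_ports):
        for address, value in writes[start:start + target_ports]:
            memory[address] = value
        cycles += 1
    return cycles


memory = {}
dut_writes = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]   # hypothetical 4-port DUT memory
used = replay_dut_cycle(memory, dut_writes, target_ports=1)
ratio = target_cycles_per_dut_cycle(dut_ports=4, target_ports=1)
```

Under this assumed scheme, the target memory would have to run at no less than `ratio` times the DUT frequency (Ft ≥ ratio × Fd) for all inputs of a DUT cycle to land before the next DUT cycle begins.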
Peregrine System Configuration | High-Performance Computing | NREL
nodes and storage are connected by a high speed InfiniBand network. Compute nodes are diskless with an directories are mounted on all nodes, along with a file system dedicated to shared projects. A brief processors with 64 GB of memory. All nodes are connected to the high speed Infiniband network and and a
Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2012-01-10
The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
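The three steps named in this abstract (a one-dimensional transform along the first dimension, an "all-to-all" redistribution in random order, and a one-dimensional transform along the second dimension) can be checked on a small example. The sketch below is a single-process simulation under stated assumptions: nodes are modeled as dictionaries, a naive DFT stands in for a real FFT routine, and the round-robin assignment of rows and columns to nodes is chosen here for simplicity. It is not the patented implementation.

```python
import cmath
import random


def dft(values):
    """Naive O(n^2) one-dimensional DFT, standing in for a real FFT routine."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(values))
            for k in range(n)]


def distributed_fft2(array, nodes, rng):
    rows, cols = len(array), len(array[0])
    # Step 1: rows are dealt out to nodes; each node transforms its own rows.
    local = {p: {} for p in range(nodes)}
    for r, row in enumerate(array):
        local[r % nodes][r] = dft(row)
    # Step 2: "all-to-all" in random order; element (r, c) goes to the node owning column c.
    messages = [(r, c, value) for p in local for r, row in local[p].items()
                for c, value in enumerate(row)]
    rng.shuffle(messages)
    columns = {p: {} for p in range(nodes)}
    for r, c, value in messages:
        columns[c % nodes].setdefault(c, [0j] * rows)[r] = value
    # Step 3: each node transforms the columns it now owns.
    result = [[0j] * cols for _ in range(rows)]
    for p in columns:
        for c, column in columns[p].items():
            for k, value in enumerate(dft(column)):
                result[k][c] = value
    return result


def direct_dft2(array):
    """Reference two-dimensional DFT computed straight from the definition."""
    rows, cols = len(array), len(array[0])
    return [[sum(array[r][c] * cmath.exp(-2j * cmath.pi * (k1 * r / rows + k2 * c / cols))
                 for r in range(rows) for c in range(cols))
             for k2 in range(cols)] for k1 in range(rows)]


data = [[float(4 * r + c) for c in range(4)] for r in range(4)]
spectrum = distributed_fft2(data, nodes=2, rng=random.Random(0))
reference = direct_dft2(data)
max_error = max(abs(spectrum[r][c] - reference[r][c]) for r in range(4) for c in range(4))
```

The shuffle shows that the result does not depend on the order in which elements are exchanged. The network-utilization benefit that the abstract attributes to random ordering is a property of the real interconnect and is not modeled here.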
Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2008-01-01
The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
Overview of emerging nonvolatile memory technologies
Meena, Jagan Singh; Sze, Simon Min; Chand, Umesh; Tseng, Tseung-Yuen
2014-01-01
Nonvolatile memory technologies in Si-based electronics date back to the 1990s. The ferroelectric field-effect transistor (FeFET) was one of the most promising devices for replacing the conventional Flash memory, which was facing physical scaling limitations at that time. A variant of charge storage memory referred to as Flash memory is widely used in consumer electronic products such as cell phones and music players, while NAND Flash-based solid-state disks (SSDs) are increasingly displacing hard disk drives as the primary storage device in laptops, desktops, and even data centers. The integration limit of Flash memories is approaching, and many new types of memory to replace conventional Flash memories have been proposed. Emerging memory technologies promise new memories that store more data at less cost than the expensive-to-build silicon chips used by popular consumer gadgets including digital cameras, cell phones and portable music players. They are being investigated as potential alternatives to existing memories in future computing systems. Emerging nonvolatile memory technologies such as magnetic random-access memory (MRAM), spin-transfer torque random-access memory (STT-RAM), ferroelectric random-access memory (FeRAM), phase-change memory (PCM), and resistive random-access memory (RRAM) combine the speed of static random-access memory (SRAM), the density of dynamic random-access memory (DRAM), and the nonvolatility of Flash memory, and so become very attractive as another possibility for future memory hierarchies. Many other new classes of emerging memory technologies such as transparent and plastic, three-dimensional (3-D), and quantum dot memory technologies have also gained tremendous popularity in recent years. It is therefore not an exaggeration to say that computer memory could soon earn the ultimate commercial validation for scale-up and production: the cheap plastic knockoff. This review is devoted to the rapidly developing new class of memory technologies and scaling of scientific procedures based on an investigation of recent progress in advanced Flash memory devices. PMID:25278820
NASA Technical Reports Server (NTRS)
Phyne, J. R.; Nelson, M. D.
1975-01-01
The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
NASA Astrophysics Data System (ADS)
Strotov, Valery V.; Taganov, Alexander I.; Konkin, Yuriy V.; Kolesenkov, Aleksandr N.
2017-10-01
Processing and analyzing Earth remote sensing data on board ultra-small spacecraft is a pressing task, given the significant energy cost of data transfer and the low performance of onboard computers. This raises the problem of effectively and reliably storing the general information flow obtained from onboard data collection systems, including Earth remote sensing data, in a specialized database. The paper considers the peculiarities of operating a database management system with a multilevel memory structure. For storing data in the database, a format has been developed that describes the physical structure of the database and contains the parameters required for loading information. Such a structure reduces the memory occupied by the database because key values need not be stored separately. The paper presents the architecture of a relational database management system designed to be embedded in the onboard software of ultra-small spacecraft. With such a database management system, a database can be developed for storing different information, including Earth remote sensing data, for subsequent processing. The suggested database management system architecture places low demands on the computing power and memory resources on board the ultra-small spacecraft. Data integrity is ensured during input and modification of structured information.
TMS communications hardware. Volume 1: Computer interfaces
NASA Technical Reports Server (NTRS)
Brown, J. S.; Weinrich, S. S.
1979-01-01
A prototype coaxial cable bus communications system was designed to be used in the Trend Monitoring System (TMS) to connect intelligent graphics terminals (based around a Data General NOVA/3 computer) to a MODCOMP IV host minicomputer. The direct memory access (DMA) interfaces which were utilized for each of these computers are identified. It is shown that for the MODCOMP, an off-the-shelf board was suitable, while for the NOVAs, custom interface circuitry was designed and implemented.
Simulation of n-qubit quantum systems. I. Quantum registers and quantum gates
NASA Astrophysics Data System (ADS)
Radtke, T.; Fritzsche, S.
2005-12-01
During recent years, quantum computations and the study of n-qubit quantum systems have attracted a lot of interest, both in theory and experiment. Apart from the promise of performing quantum computations, however, these investigations also revealed a great deal of difficulties which still need to be solved in practice. In quantum computing, unitary and non-unitary quantum operations act on a given set of qubits to form (entangled) states, in which the information is encoded by the overall system, often referred to as quantum registers. To facilitate the simulation of such n-qubit quantum systems, we present the FEYNMAN program to provide all necessary tools in order to define and to deal with quantum registers and quantum operations. Although the present version of the program is restricted to unitary transformations, it equally supports, whenever possible, the representation of the quantum registers both in terms of their state vectors and density matrices. In addition to the composition of two or more quantum registers, moreover, the program also supports their decomposition into various parts by applying the partial trace operation and the concept of the reduced density matrix. Using an interactive design within the framework of MAPLE, therefore, we expect the FEYNMAN program to be helpful not only for teaching the basic elements of quantum computing but also for studying their physical realization in the future.
Program summary
Title of program: FEYNMAN
Catalogue number: ADWE
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADWE
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: None
Computers for which the program is designed: All computers with a license of the computer algebra system MAPLE [Maple is a registered trademark of Waterloo Maple Inc.]
Operating systems or monitors under which the program has been tested: Linux, MS Windows XP
Programming language used: MAPLE 9.5 (but should be compatible with 9.0 and 8.0, too)
Memory and time required to execute with typical data: Storage and time requirements critically depend on the number of qubits, n, in the quantum registers due to the exponential increase of the associated Hilbert space. In particular, complex algebraic operations may require large amounts of memory even for small qubit numbers. However, most of the standard commands (see Section 4 for simple examples) react promptly for up to five qubits on a normal single-processor machine (⩾1 GHz with 512 MB memory) and use less than 10 MB memory.
No. of lines in distributed program, including test data, etc.: 8864
No. of bytes in distributed program, including test data, etc.: 493 182
Distribution format: tar.gz
Nature of the physical problem: During the last decade, quantum computing has been found to provide a revolutionary new form of computation. The algorithms by Shor [P.W. Shor, SIAM J. Sci. Statist. Comput. 26 (1997) 1484] and Grover [L.K. Grover, Phys. Rev. Lett. 79 (1997) 325. [2
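The concepts named in this entry (state vectors, unitary gates, density matrices, and the partial trace) can be illustrated with a two-qubit example. The sketch below is plain Python written for this listing. It does not reproduce the FEYNMAN program or its MAPLE interface, and all function names are hypothetical.

```python
import math


def apply_gate(gate, state):
    """Multiply a square matrix (list of rows) by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def density_matrix(state):
    """Outer product |psi><psi| of a state vector with itself."""
    return [[a * complex(b).conjugate() for b in state] for a in state]


def trace_out_second_qubit(rho):
    """Partial trace over the second qubit of a two-qubit density matrix."""
    return [[sum(rho[2 * i + k][2 * j + k] for k in range(2)) for j in range(2)]
            for i in range(2)]


h = 1 / math.sqrt(2)
HADAMARD = [[h, h], [h, -h]]
IDENTITY = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

register = [1, 0, 0, 0]                                     # |00>
register = apply_gate(kron(HADAMARD, IDENTITY), register)   # (|00> + |10>)/sqrt(2)
register = apply_gate(CNOT, register)                       # (|00> + |11>)/sqrt(2)
reduced = trace_out_second_qubit(density_matrix(register))
```

The reduced density matrix of the first qubit comes out as half the identity, which is the signature of a maximally entangled pair. The state vector has 2^n entries and the density matrix 4^n, which is the exponential growth in storage that the program summary warns about.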
Working Memory Contributions to Reinforcement Learning Impairments in Schizophrenia
Brown, Jaime K.; Gold, James M.; Waltz, James A.; Frank, Michael J.
2014-01-01
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia. PMID:25297101
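The idea of separating a slow, incremental RL process from a fast, capacity-limited WM process can be sketched as a mixture of two policies. The code below is an illustration only: the abstract does not give the model's equations, so the mixture weight, the parameter names (`capacity`, `reliability`, `alpha`, `beta`), and all numeric values are assumptions made here. It is not the authors' fitted model.

```python
import math


def softmax(values, beta):
    """Turn action values into choice probabilities."""
    weights = [math.exp(beta * v) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def mixture_policy(q_values, wm_values, set_size, capacity, reliability, beta=5.0):
    """Blend a fast working-memory policy with a slow RL policy.

    The WM weight shrinks once the set size exceeds capacity
    (an assumed form, chosen for illustration only).
    """
    w = reliability * min(1.0, capacity / set_size)
    p_rl = softmax(q_values, beta)
    p_wm = softmax(wm_values, beta)
    return [w * a + (1 - w) * b for a, b in zip(p_wm, p_rl)]


def rl_update(q_values, action, reward, alpha=0.1):
    # Incremental delta-rule update: slow, cumulative learning.
    q_values[action] += alpha * (reward - q_values[action])


q = [1 / 3, 1 / 3, 1 / 3]          # three actions, as in the task
wm = [0.0, 1.0, 0.0]               # WM holds the last rewarded action exactly
rl_update(q, action=1, reward=1.0)
small_set = mixture_policy(q, wm, set_size=2, capacity=3, reliability=0.9)
large_set = mixture_policy(q, wm, set_size=6, capacity=3, reliability=0.9)
```

With identical RL values, the probability of the correct action is higher for the small set than for the large one, because WM contributes less once the set size exceeds capacity. Lowering `capacity` or `reliability` in this sketch reduces accuracy without touching the RL update, which mirrors the kind of dissociation the abstract reports.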
Dynamism in Electronic Performance Support Systems.
ERIC Educational Resources Information Center
Laffey, James
1995-01-01
Describes a model for dynamic electronic performance support systems based on NNAble, a system developed by the training group at Apple Computer. Principles for designing dynamic performance support are discussed, including a systems approach, performer-centered design, awareness of situated cognition, organizational memory, and technology use.…
Energy efficient hybrid computing systems using spin devices
NASA Astrophysics Data System (ADS)
Sharad, Mrigank
Emerging spin devices like magnetic tunnel junctions (MTJs), spin valves, and domain wall magnets (DWM) have opened new avenues for spin-based logic design. This work explored potential computing applications that can exploit such devices for higher energy efficiency and performance. The proposed applications involve hybrid design schemes, where charge-based devices supplement the spin devices, to gain large benefits at the system level. As an example, lateral spin valves (LSV) involve switching of nanomagnets using spin-polarized current injection through a metallic channel such as Cu. Such spin-torque based devices possess several interesting properties that can be exploited for ultra-low power computation. The analog characteristics of spin current facilitate non-Boolean computation, such as majority evaluation, that can be used to model a neuron. The magneto-metallic neurons can operate at an ultra-low terminal voltage of ~20 mV, thereby resulting in small computation power. Moreover, since nanomagnets inherently act as memory elements, these devices can facilitate integration of logic and memory in interesting ways. The spin-based neurons can be integrated with CMOS and other emerging devices, leading to different classes of neuromorphic/non-von-Neumann architectures. The spin-based designs involve "mixed-mode" processing and hence can provide very compact and ultra-low energy solutions for complex computation blocks, both digital and analog. Such low-power, hybrid designs can be suitable for various data processing applications like cognitive computing, associative memory, and current-mode on-chip global interconnects. Simulation results for these applications, based on a device-circuit co-simulation framework, predict a more than ~100x improvement in computation energy compared to state-of-the-art CMOS design, for optimal spin-device parameters.
Multi-port, optically addressed RAM
NASA Technical Reports Server (NTRS)
Johnston, Alan R. (Inventor); Nixon, Robert H. (Inventor); Bergman, Larry A. (Inventor); Esener, Sadik (Inventor)
1989-01-01
A random access memory addressing system utilizing optical links between memory and the read/write logic circuits comprises addressing circuits including a plurality of light signal sources, a plurality of optical gates including optical detectors associated with the memory cells, and a holographic optical element adapted to reflect and direct the light signals to the desired memory cell locations. More particularly, it is a multi-port, binary computer memory for interfacing with a plurality of computers. There are a plurality of storage cells for containing bits of binary information, the storage cells being disposed at the intersections of a plurality of row conductors and a plurality of column conductors. There is interfacing logic for receiving information from the computers directing access to ones of the storage cells. There are first light sources associated with the interfacing logic for transmitting a first light beam with the access information modulated thereon. First light detectors are associated with the storage cells for receiving the first light beam, for generating an electrical signal containing the access information, and for conducting the electrical signal to the one of the storage cells to which it is directed. There are holographic optical elements for reflecting the first light beam from the first light sources to the first light detectors.
Data Movement Dominates: Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacob, Bruce L.
Over the past three years in this project, what we have observed is that the primary reason for data movement in large-scale systems is that the per-node capacity is not large enough; i.e., one of the solutions to the data-movement problem (certainly not the only solution that is required, but a significant one nonetheless) is to increase per-node capacity so that inter-node traffic is reduced. This unfortunately is not as simple as it sounds. Today’s main memory systems for datacenters, enterprise computing systems, and supercomputers fail to provide high per-socket capacity [Dirik & Jacob 2009; Cooper-Balis et al. 2012], except at extremely high price points (factors of 10–100x the cost/bit of consumer main-memory systems) [Stokes 2008]. The reason is that our choice of technology for today’s main memory systems (DRAM, which we have used as a main-memory technology since the 1970s [Jacob et al. 2007]) can no longer keep up with our needs for density and price per bit. Main memory systems have always been built from the cheapest, densest, lowest-power memory technology available, and DRAM is no longer the cheapest, the densest, nor the lowest-power storage technology out there. It is now time for DRAM to go the way that SRAM went: move out of the way for a cheaper, slower, denser storage technology, and become a cache instead. This inflection point has happened before, in the context of SRAM yielding to DRAM. There was once a time that SRAM was the storage technology of choice for all main memories [Tomasulo 1967; Thornton 1970; Kidder 1981]. However, once DRAM hit volume production in the 1970s and 80s, it supplanted SRAM as a main memory technology because it was cheaper, and it was denser. It also happened to be lower power, but that was not the primary consideration of the day. 
At the time, it was recognized that DRAM was much slower than SRAM, but it was only at the supercomputer level (for instance, the Cray X-MP in the 1980s and its follow-on, the Cray Y-MP, in the 1990s) that one could afford to build ever-larger main memories out of SRAM. The reasoning for moving to DRAM was that an appropriately designed memory hierarchy, built of DRAM as main memory and SRAM as a cache, would approach the performance of SRAM at the price-per-bit of DRAM [Mashey 1999]. Today it is quite clear that, were one to build an entire multi-gigabyte main memory out of SRAM instead of DRAM, one could improve the performance of almost any computer system by up to an order of magnitude, but this option is not even considered, because to build that system would be prohibitively expensive. It is now time to revisit the same design choice in the context of modern technologies and modern systems. For reasons both technical and economic, we can no longer afford to build ever-larger main memory systems out of DRAM. Flash memory, on the other hand, is significantly cheaper and denser than DRAM and therefore should take its place. While it is true that flash is significantly slower than DRAM, one can afford to build much larger main memories out of flash than out of DRAM, and we show that an appropriately designed memory hierarchy, built of flash as main memory and DRAM as a cache, will approach the performance of DRAM at the price-per-bit of flash. In our studies as part of this project, we have investigated Non-Volatile Main Memory (NVMM), a new main-memory architecture for large-scale computing systems, one that is specifically designed to address the weaknesses described previously. In particular, it provides the following features:
(1) Non-volatility: the bulk of the storage is composed of NAND flash, and in this organization DRAM is used only as a cache, not as main memory. Furthermore, the flash is journaled, which means that operations such as checkpoint/restore are already built into the system.
(2) 1+ terabytes of storage per socket: SSDs and DRAM DIMMs have roughly the same form factor (several square inches of PCB surface area), and terabyte SSDs are now commonplace.
(3) Performance approaching that of DRAM: DRAM is used as a cache to the flash system.
(4) Price-per-bit approaching that of NAND: flash is currently well under $0.50 per gigabyte; DDR3 SDRAM is currently just over $10 per gigabyte [Newegg 2014].
Even today, one can build an easily affordable main memory system with a terabyte or more of NAND storage per CPU socket (which would be extremely expensive were one to use DRAM), and our cycle-accurate, full-system experiments show that this can be done at a performance point that lies within a factor of two of DRAM.
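The cost and performance argument in this abstract can be made concrete with a little arithmetic. The sketch below is illustrative only: the two prices are the bounds quoted in the abstract, while the function names, the simple two-level latency model, and the DRAM and flash latencies used in the usage note are assumptions, not figures from the report.

```python
# Prices are the bounds quoted in the abstract [Newegg 2014].
FLASH_USD_PER_GB = 0.50  # "well under $0.50 per gigabyte" (upper bound)
DRAM_USD_PER_GB = 10.0   # "just over $10 per gigabyte" (lower bound)


def capacity_cost(gigabytes, usd_per_gb):
    """Cost in dollars of a main memory of the given capacity."""
    return gigabytes * usd_per_gb


def average_access_ns(hit_rate, dram_ns, flash_ns):
    """Mean access time when DRAM caches flash.

    Assumed model: a hit costs the DRAM latency, a miss costs the
    flash latency.
    """
    return hit_rate * dram_ns + (1.0 - hit_rate) * flash_ns


def hit_rate_for_slowdown(slowdown, dram_ns, flash_ns):
    """DRAM-cache hit rate at which the hierarchy is `slowdown` times
    slower than a pure-DRAM main memory (solves the formula above)."""
    return (flash_ns - slowdown * dram_ns) / (flash_ns - dram_ns)
```

With these prices, `capacity_cost(1024, FLASH_USD_PER_GB)` is 512.0 dollars per terabyte against 10240.0 for DRAM. Under assumed latencies of 50 ns for DRAM and 25 microseconds for flash, `hit_rate_for_slowdown(2.0, 50.0, 25000.0)` is about 0.998, so in this simple model the DRAM cache must absorb roughly 99.8% of accesses to stay within the factor of two mentioned in the abstract.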
Stream-based Hebbian eigenfilter for real-time neuronal spike discrimination
2012-01-01
Background: Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. The general Hebbian algorithm (GHA) was proposed for calculating PCs of neuronal spikes in our previous work; it eliminates the need for the computationally expensive covariance analysis and eigenvalue decomposition of conventional PCA algorithms. However, large memory resources are still inherently required for storing a large volume of aligned spikes for training PCs. This large memory consumes substantial hardware resources and contributes significant power dissipation, which makes GHA difficult to implement in portable or implantable multi-channel recording micro-systems. Method: In this paper, we present a new algorithm for PCA-based spike sorting based on GHA, namely the stream-based Hebbian eigenfilter, which eliminates the inherent memory requirements of GHA while keeping the accuracy of spike sorting by utilizing the pseudo-stationarity of neuronal spikes. Because it removes the large hardware storage requirements, the proposed algorithm can lead to ultra-low hardware resource usage and power consumption in hardware implementations, which is critical for future multi-channel micro-systems. Both clinical and synthetic neural recording data sets were employed for evaluating the accuracy of the stream-based Hebbian eigenfilter. The performance of spike sorting using the stream-based eigenfilter and the computational complexity of the eigenfilter were rigorously evaluated and compared with conventional PCA algorithms.
Field programmable gate arrays (FPGAs) were employed to implement the proposed algorithm, evaluate the hardware implementations, and demonstrate the reduction in both power consumption and hardware memory achieved by streaming computation. Results and discussion: Results demonstrate that the stream-based eigenfilter can achieve the same accuracy and is 10 times more computationally efficient when compared with conventional PCA algorithms. Hardware evaluations show that 90.3% of logic resources, 95.1% of power consumption and 86.8% of computing latency can be saved by the stream-based eigenfilter when compared with PCA hardware. By utilizing the streaming method, 92% of memory resources and 67% of power consumption can be saved when compared with the direct implementation of GHA. Conclusion: The stream-based Hebbian eigenfilter presents a novel approach to enable real-time spike sorting with reduced computational complexity and hardware costs. This new design can be further utilized for multi-channel neuro-physiological experiments or chronic implants. PMID:22490725
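To illustrate the kind of update GHA relies on, here is a minimal sketch of the textbook rule (Sanger's rule) applied one sample at a time, so that no buffer of past samples is kept. It is not the authors' stream-based eigenfilter, whose details the abstract does not give. The function names, the synthetic three-dimensional "spikes", the learning rate, and the sample count are all assumptions chosen for illustration.

```python
import random


def gha_update(weights, x, lr):
    """One GHA (Sanger's rule) step on a single sample; updates in place.

    weights: list of rows, one row per principal component estimate.
    Row i moves by lr * y_i * (x - sum over j <= i of y_j * w_j).
    """
    outputs = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
    residual = list(x)
    for w, y in zip(weights, outputs):
        # Subtract this row's reconstruction (using its old weights)
        # before updating it; later rows see the deflated residual.
        residual = [r - y * w_k for r, w_k in zip(residual, w)]
        for k in range(len(w)):
            w[k] += lr * y * residual[k]
    return outputs


def synthetic_sample(rng, direction, signal_sd=3.0, noise_sd=0.3):
    """Hypothetical data: strong variance along `direction` plus noise."""
    amplitude = rng.gauss(0.0, signal_sd)
    return [amplitude * d + rng.gauss(0.0, noise_sd) for d in direction]


def demo(seed=0, samples=4000, lr=0.005):
    """Train two components on a stream; report alignment of the first."""
    rng = random.Random(seed)
    direction = [0.6, 0.8, 0.0]  # unit vector, the true first PC
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
    for _ in range(samples):
        gha_update(weights, synthetic_sample(rng, direction), lr)
    first = weights[0]
    norm = sum(v * v for v in first) ** 0.5
    cosine = abs(sum(a * b for a, b in zip(first, direction))) / norm
    return cosine, norm
```

Calling `demo()` returns the absolute cosine between the first learned row and the true direction, together with that row's norm. The only state carried between samples is the weight matrix itself, which is the property that makes a streaming hardware implementation plausible.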
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory, and which communicates with a common global data memory. A new graph theoretic model called ATAMM which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
Performing an allreduce operation using shared memory
Archer, Charles J [Rochester, MN]; Dozsa, Gabor [Ardsley, NY]; Ratterman, Joseph D [Rochester, MN]; Smith, Brian E [Rochester, MN]
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
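The claimed steps can be pictured with a small sketch in which threads stand in for processing cores. Everything named here (`JobStatus`, `next_unit`, `shared_memory_allreduce`, the unit size, and the choice of summation as the reduction) is hypothetical; the abstract does not specify a data layout or an API.

```python
import threading


class JobStatus:
    """Hypothetical job status object: a list of work units (index
    ranges) that available cores claim one at a time."""

    def __init__(self, length, unit_size):
        self._units = [(start, min(start + unit_size, length))
                       for start in range(0, length, unit_size)]
        self._next = 0
        self._lock = threading.Lock()

    def next_unit(self):
        """Return the next unclaimed work unit, or None when all are taken."""
        with self._lock:
            if self._next >= len(self._units):
                return None
            unit = self._units[self._next]
            self._next += 1
            return unit


def shared_memory_allreduce(contributions, unit_size=4):
    """Sum-allreduce over per-core buffers held in shared memory.

    Each worker thread (standing in for a core) repeatedly claims the
    next work unit and reduces that slice; every core receives a copy
    of the full result.
    """
    length = len(contributions[0])
    result = [0] * length
    job = JobStatus(length, unit_size)

    def core():
        while True:
            unit = job.next_unit()
            if unit is None:
                return
            start, stop = unit
            for i in range(start, stop):
                result[i] = sum(buf[i] for buf in contributions)

    workers = [threading.Thread(target=core) for _ in contributions]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [list(result) for _ in contributions]
```

Because work units cover disjoint index ranges, the workers never write the same element, and only the claim of the next unit needs a lock.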
Performing an allreduce operation using shared memory
Archer, Charles J; Dozsa, Gabor; Ratterman, Joseph D; Smith, Brian E
2014-06-10
Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
Mechanisms for widespread hippocampal involvement in cognition
Shohamy, Daphna; Turk-Browne, Nicholas B.
2014-01-01
The quintessential memory system in the human brain — the hippocampus and surrounding medial temporal lobe (MTL) — is often treated as a module for the formation of conscious, or declarative memories. However, growing evidence suggests that the hippocampus plays a broader role in memory and cognition and that theories organizing memory into strictly dedicated systems may need to be updated. We first consider the historical evidence for the specialized role of the hippocampus in declarative memory. Then, we describe the serendipitous encounter that motivated this special section, based on parallel research from our labs that suggested a more pervasive contribution of the hippocampus to cognition beyond declarative memory. Finally, we develop a theoretical framework that describes two general mechanisms for how the hippocampus interacts with other brain systems and cognitive processes: the Memory Modulation Hypothesis, in which mnemonic representations in the hippocampus modulate the operation of other systems, and the Adaptive Function Hypothesis, in which specialized computations in the hippocampus are recruited as a component of both mnemonic and non-mnemonic functions. This framework is consistent with an emerging view that the most fertile ground for discovery in cognitive psychology and neuroscience lies at the interface between parts of the mind and brain that have traditionally been studied in isolation. PMID:24246058
Memory effects in nanoparticle dynamics and transport
NASA Astrophysics Data System (ADS)
Sanghi, Tarun; Bhadauria, Ravi; Aluru, N. R.
2016-10-01
In this work, we use the generalized Langevin equation (GLE) to characterize and understand memory effects in nanoparticle dynamics and transport. Using the GLE formulation, we compute the memory function and investigate its scaling with the mass, shape, and size of the nanoparticle. It is observed that changing the mass of the nanoparticle leads to a rescaling of the memory function with the reduced mass of the system. Further, we show that for different mass nanoparticles it is the initial value of the memory function and not its relaxation time that determines the "memory" or "memoryless" dynamics. The size and the shape of the nanoparticle are found to influence both the functional-form and the initial value of the memory function. For a fixed mass nanoparticle, increasing its size enhances the memory effects. Using GLE simulations we also investigate and highlight the role of memory in nanoparticle dynamics and transport.
The ASSIST: Bringing Information and Software Together for Scientists
NASA Technical Reports Server (NTRS)
Mandel, Eric
1997-01-01
The ASSIST was developed as a step toward overcoming the problems faced by researchers when trying to utilize complex and often conflicting astronomical data analysis systems. It implements a uniform graphical interface to analysis systems, documentation, data, and organizational memory. It is layered on top of the Answer Garden Substrate (AGS), a system specially designed to facilitate the collection and dissemination of organizational memory. Under the AISRP program, we further developed the ASSIST to make it even easier for researchers to overcome the difficulties of accessing software and information in a complex computer environment.
Study on advanced information processing system
NASA Technical Reports Server (NTRS)
Shin, Kang G.; Liu, Jyh-Charn
1992-01-01
Issues related to the reliability of a redundant system with large main memory are addressed. In particular, the Fault-Tolerant Processor (FTP) for Advanced Launch System (ALS) is used as a basis for our presentation. When the system is free of latent faults, the probability of system crash due to nearly-coincident channel faults is shown to be insignificant even when the outputs of computing channels are infrequently voted on. In particular, using channel error maskers (CEMs) is shown to improve reliability more effectively than increasing the number of channels for applications with long mission times. Even without using a voter, most memory errors can be immediately corrected by CEMs implemented with conventional coding techniques. In addition to their ability to enhance system reliability, CEMs--with a low hardware overhead--can be used to reduce not only the need of memory realignment, but also the time required to realign channel memories in case, albeit rare, such a need arises. Using CEMs, we have developed two schemes, called Scheme 1 and Scheme 2, to solve the memory realignment problem. In both schemes, most errors are corrected by CEMs, and the remaining errors are masked by a voter.
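As a rough illustration of the voter's role, the sketch below masks a single-channel memory error by bitwise majority voting and then rewrites every channel with the voted value. It does not model the channel error maskers or the report's Scheme 1 and Scheme 2, which the abstract does not describe; the function names and the 8-bit word width are assumptions.

```python
def majority_vote(words, width=8):
    """Bitwise majority over the words read from an odd number of channels."""
    voted = 0
    for bit in range(width):
        ones = sum((word >> bit) & 1 for word in words)
        if 2 * ones > len(words):
            voted |= 1 << bit
    return voted


def realign(channels, width=8):
    """Overwrite each channel memory with the voted word at every address."""
    for address in range(len(channels[0])):
        voted = majority_vote([ch[address] for ch in channels], width)
        for ch in channels:
            ch[address] = voted
```

Realignment in this naive form touches every address of every channel, which suggests why reducing how often it is needed, and how long it takes, matters for a system with a large main memory.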
Study on fault-tolerant processors for advanced launch system
NASA Technical Reports Server (NTRS)
Shin, Kang G.; Liu, Jyh-Charn
1990-01-01
Issues related to the reliability of a redundant system with large main memory are addressed. The Fault-Tolerant Processor (FTP) for the Advanced Launch System (ALS) is used as a basis for the presentation. When the system is free of latent faults, the probability of system crash due to multiple channel faults is shown to be insignificant even when voting on the outputs of computing channels is infrequent. Using channel error maskers (CEMs) is shown to improve reliability more effectively than increasing redundancy or the number of channels for applications with long mission times. Even without using a voter, most memory errors can be immediately corrected by those CEMs implemented with conventional coding techniques. In addition to their ability to enhance system reliability, CEMs (with a very low hardware overhead) can be used to dramatically reduce not only the need of memory realignment, but also the time required to realign channel memories in case, albeit rare, such a need arises. Using CEMs, two different schemes were developed to solve the memory realignment problem. In both schemes, most errors are corrected by CEMs, and the remaining errors are masked by a voter.
Implementation of real-time digital signal processing systems
NASA Technical Reports Server (NTRS)
Narasimha, M.; Peterson, A.; Narayan, S.
1978-01-01
Special purpose hardware implementation of DFT computers and digital filters is considered in the light of newly introduced algorithms and IC devices. Recent work by Winograd on high-speed convolution techniques for computing short-length DFTs has motivated the development of more efficient algorithms, compared to the FFT, for evaluating the transform of longer sequences. Among these, prime factor algorithms appear suitable for special purpose hardware implementations. Architectural considerations in designing DFT computers based on these algorithms are discussed. With the availability of monolithic multiplier-accumulators, a direct implementation of IIR and FIR filters, using random access memories in place of shift registers, appears attractive. The memory addressing scheme involved in such implementations is discussed. A simple counter set-up to address the data memory in the realization of FIR filters is also described. The combination of a set of simple filters (weighting network) and a DFT computer is shown to realize a bank of uniform bandpass filters. The usefulness of this concept in arriving at a modular design for a million-channel spectrum analyzer, based on microprocessors, is discussed.
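One common way to replace a shift register with a random access memory is a circular buffer addressed by a wrapping counter. The sketch below shows that idea for an FIR filter. It is an assumed reading of the "simple counter set-up" mentioned in the abstract, not the paper's actual addressing scheme, and the class and attribute names are hypothetical.

```python
class RamFir:
    """FIR filter whose delay line is a RAM addressed by a modulo counter."""

    def __init__(self, coefficients):
        self.coefficients = list(coefficients)
        self.ram = [0.0] * len(self.coefficients)  # data memory
        self.counter = 0  # write address; wraps at the filter length

    def step(self, sample):
        """Store one input sample and return one output sample."""
        n = len(self.ram)
        self.ram[self.counter] = sample  # overwrite the oldest sample
        acc = 0.0
        for k, coefficient in enumerate(self.coefficients):
            # Coefficient k pairs with the sample written k steps ago.
            acc += coefficient * self.ram[(self.counter - k) % n]
        self.counter = (self.counter + 1) % n
        return acc
```

No data is moved between samples; only the counter advances, and the inner loop corresponds to one pass of a multiplier-accumulator over the memory.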
Numerical arc segmentation algorithm for a radio conference-NASARC (version 2.0) technical manual
NASA Technical Reports Server (NTRS)
Whyte, Wayne A., Jr.; Heyward, Ann O.; Ponchak, Denise S.; Spence, Rodney L.; Zuzek, John E.
1987-01-01
The information contained in the NASARC (Version 2.0) Technical Manual (NASA TM-100160) and NASARC (Version 2.0) User's Manual (NASA TM-100161) relates to the state of NASARC software development through October 16, 1987. The Technical Manual describes the Numerical Arc Segmentation Algorithm for a Radio Conference (NASARC) concept and the algorithms used to implement the concept. The User's Manual provides information on computer system considerations, installation instructions, description of input files, and program operating instructions. Significant revisions have been incorporated in the Version 2.0 software. These revisions have enhanced the modeling capabilities of the NASARC procedure while greatly reducing the computer run time and memory requirements. Array dimensions within the software have been structured to fit within the currently available 6-megabyte memory capacity of the International Frequency Registration Board (IFRB) computer facility. A piecewise approach to predetermined arc generation in NASARC (Version 2.0) allows worldwide scenarios to be accommodated within these memory constraints while at the same time effecting an overall reduction in computer run time.
Numerical Arc Segmentation Algorithm for a Radio Conference-NASARC, Version 2.0: User's Manual
NASA Technical Reports Server (NTRS)
Whyte, Wayne A., Jr.; Heyward, Ann O.; Ponchak, Denise S.; Spence, Rodney L.; Zuzek, John E.
1987-01-01
The information contained in the NASARC (Version 2.0) Technical Manual (NASA TM-100160) and the NASARC (Version 2.0) User's Manual (NASA TM-100161) relates to the state of the Numerical Arc Segmentation Algorithm for a Radio Conference (NASARC) software development through October 16, 1987. The technical manual describes the NASARC concept and the algorithms which are used to implement it. The User's Manual provides information on computer system considerations, installation instructions, description of input files, and program operation instructions. Significant revisions have been incorporated in the Version 2.0 software over prior versions. These revisions have enhanced the modeling capabilities of the NASARC procedure while greatly reducing the computer run time and memory requirements. Array dimensions within the software have been structured to fit into the currently available 6-megabyte memory capacity of the International Frequency Registration Board (IFRB) computer facility. A piecewise approach to predetermined arc generation in NASARC (Version 2.0) allows worldwide scenarios to be accommodated within these memory constraints while at the same time reducing computer run time.
NASA Technical Reports Server (NTRS)
Denning, P. J.
1986-01-01
Virtual memory was conceived as a way to automate overlaying of program segments. Modern computers have very large main memories, but need automatic solutions to the relocation and protection problems. Virtual memory serves this need as well and is thus useful in computers of all sizes. The history of the idea is traced, showing how it has become a widespread, little noticed feature of computers today.
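The relocation and protection roles mentioned here can be sketched with a toy page table. The page size, the table layout (page number mapped to a frame number and a writable flag), and the exception names are assumptions made for illustration and are not taken from the report.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """The virtual page has no mapping."""


class ProtectionFault(Exception):
    """The access violates the page's permissions."""


def translate(page_table, virtual_address, write=False):
    """Map a virtual address to a physical one.

    page_table maps page number -> (frame number, writable flag).
    Relocation: the frame number replaces the page number.
    Protection: writes to read-only pages are refused.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page)
    if entry is None:
        raise PageFault(page)
    frame, writable = entry
    if write and not writable:
        raise ProtectionFault(page)
    return frame * PAGE_SIZE + offset
```

A program addressed through such a table can be placed anywhere in physical memory without being modified, and it cannot reach frames that its table does not map.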
Fully integrated sub 100ps photon counting platform
NASA Astrophysics Data System (ADS)
Buckley, S. J.; Bellis, S. J.; Rosinger, P.; Jackson, J. C.
2007-02-01
Current state-of-the-art high-resolution counting modules, specifically designed for high timing resolution applications, are largely based on a computer card format. This has tended to result in a costly solution that is restricted to the computer it resides in. We describe a four-channel timing module that interfaces to a computer via a USB port and operates with a resolution of less than 100 picoseconds. The core design of the system is an advanced field programmable gate array (FPGA) interfacing to a precision time interval measurement module, a mass memory block and a high speed USB 2.0 serial data port. The FPGA design allows the module to operate in a number of modes allowing both continuous recording of photon events (time-tagging) and repetitive time binning. In time-tag mode the system reports, for each photon event, the high resolution time along with the chronological time (macro time) and the channel ID. The time-tags are uploaded in real time to a host computer via a high speed USB port allowing continuous storage to computer memory of up to 4 million photons per second. In time-bin mode, binning is carried out with count rates up to 10 million photons per second. Each curve resides in a block of 128,000 time-bins, each with a resolution programmable down to less than 100 picoseconds. Each bin has a limit of 65535 hits, allowing autonomous curve recording until a bin reaches the maximum count or the system is commanded to halt. Due to the large memory storage, several curves/experiments can be stored in the system prior to uploading to the host computer for analysis. This makes this module ideal for integration into high timing resolution specific applications such as laser ranging and fluorescence lifetime imaging using techniques such as time correlated single photon counting (TCSPC).
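The time-bin mode described above amounts to histogramming with a per-bin ceiling. The sketch below models that behaviour in software under stated assumptions: the bin count and the 65535-hit limit come from the abstract, while the class name, the handling of out-of-range delays, and the halt flag are hypothetical and say nothing about the actual FPGA design.

```python
BIN_LIMIT = 65535         # per-bin hit limit stated in the abstract
BINS_PER_CURVE = 128_000  # bins per curve stated in the abstract


class TimeBinCurve:
    """Software model of one time-binned curve with saturating bins."""

    def __init__(self, bin_width_ps, n_bins=BINS_PER_CURVE):
        self.bin_width_ps = bin_width_ps
        self.bins = [0] * n_bins
        self.halted = False

    def record(self, delay_ps):
        """Add one photon event; return False once recording has halted."""
        if self.halted:
            return False
        index = int(delay_ps // self.bin_width_ps)
        if not 0 <= index < len(self.bins):
            return True  # outside the curve: ignored in this sketch
        self.bins[index] += 1
        if self.bins[index] >= BIN_LIMIT:
            self.halted = True  # a bin reached the maximum count
        return True
```

A limit of 65535 is the largest value of an unsigned 16-bit counter, so a curve of 128,000 such bins would occupy 256,000 bytes if each bin is stored in two bytes, which is an inference rather than a figure from the abstract.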
Stability of discrete memory states to stochastic fluctuations in neuronal systems
Miller, Paul; Wang, Xiao-Jing
2014-01-01
Noise can degrade memories by causing transitions from one memory state to another. For any biological memory system to be useful, the time scale of such noise-induced transitions must be much longer than the required duration for memory retention. Using biophysically realistic modeling, we consider two types of memory in the brain: short-term memories maintained by reverberating neuronal activity for a few seconds, and long-term memories maintained by a molecular switch for years. Both systems require persistence of (neuronal or molecular) activity self-sustained by an autocatalytic process and, we argue, both have limited memory lifetimes because of significant fluctuations. We will first discuss a strongly recurrent cortical network model endowed with feedback loops, for short-term memory. Fluctuations are due to highly irregular spike firing, a salient characteristic of cortical neurons. Then, we will analyze a model for long-term memory, based on an autophosphorylation mechanism of calcium/calmodulin-dependent protein kinase II (CaMKII) molecules. There, fluctuations arise from the fact that there are only a small number of CaMKII molecules at each postsynaptic density (putative synaptic memory unit). Our results are twofold. First, we demonstrate analytically and computationally the exponential dependence of stability on the number of neurons in a self-excitatory network, and on the number of CaMKII proteins in a molecular switch. Second, for each of the two systems, we implement graded memory consisting of a group of bistable switches. For the neuronal network we report interesting ramping temporal dynamics as a result of sequentially switching an increasing number of discrete, bistable units.
The general observation of an exponential increase in memory stability with the system size leads to a trade-off between the robustness of memories (which increases with the size of each bistable unit) and the total amount of information storage (which decreases with increasing unit size), which may be optimized in the brain through biological evolution. PMID:16822041
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
This paper presents our experiences of parallelizing the sequential implementation of the NAS benchmarks using compiler directives on the SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on the SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing the sequential implementation of the NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
1990-04-23
developed Ada Real-Time Operating System (ARTOS) for bare machine environments (Target), ACW 1.1I0. Subject terms: Ada programming language, Ada... configuration) Operating System: CSC developed Ada Real-Time Operating System (ARTOS) for bare machine environments. Memory Size: 4MB. 2.2... Test Method: Testing of the MC Ada V1.2.beta/Concurrent Computer Corporation compiler and the CSC developed Ada Real-Time Operating System (ARTOS) for
Advanced computer architecture specification for automated weld systems
NASA Technical Reports Server (NTRS)
Katsinis, Constantine
1994-01-01
This report describes the requirements for an advanced automated weld system and the associated computer architecture, and defines the overall system specification from a broad perspective. According to the requirements of welding procedures as they relate to an integrated multiaxis motion control and sensor architecture, the computer system requirements are developed based on a proven multiple-processor architecture with an expandable, distributed-memory, single global bus architecture, containing individual processors which are assigned to specific tasks that support sensor or control processes. The specified architecture is sufficiently flexible to integrate previously developed equipment, be upgradable and allow on-site modifications.
NASA Astrophysics Data System (ADS)
Thompson, Kyle Bonner
An algorithm is described to efficiently compute aerothermodynamic design sensitivities using a decoupled variable set. In a conventional approach to computing design sensitivities for reacting flows, the species continuity equations are fully coupled to the conservation laws for momentum and energy. In this algorithm, the species continuity equations are solved separately from the mixture continuity, momentum, and total energy equations. This decoupling simplifies the implicit system, so that the flow solver can be made significantly more efficient, with very little penalty on overall scheme robustness. Most importantly, the computational cost of the point implicit relaxation is shown to scale linearly with the number of species for the decoupled system, whereas the fully coupled approach scales quadratically. Also, the decoupled method significantly reduces the cost in wall time and memory in comparison to the fully coupled approach. This decoupled approach for computing design sensitivities with the adjoint system is demonstrated for inviscid flow in chemical non-equilibrium around a re-entry vehicle with a retro-firing annular nozzle. The sensitivities of the surface temperature and mass flow rate through the nozzle plenum are computed with respect to plenum conditions and verified against sensitivities computed using a complex-variable finite-difference approach. The decoupled scheme significantly reduces the computational time and memory required to complete the optimization, making this an attractive method for high-fidelity design of hypersonic vehicles.
Biomorphic Multi-Agent Architecture for Persistent Computing
NASA Technical Reports Server (NTRS)
Lodding, Kenneth N.; Brewster, Paul
2009-01-01
A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.
Final Report: Correctness Tools for Petascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
2014-10-27
In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, University of Maryland, the University of Wisconsin-Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoring of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.
Zhang, Wei; Ding, Dong-Sheng; Dong, Ming-Xin; Shi, Shuai; Wang, Kai; Liu, Shi-Long; Li, Yan; Zhou, Zhi-Yuan; Shi, Bao-Sen; Guo, Guang-Can
2016-11-14
Entanglement in multiple degrees of freedom has many benefits over entanglement in a single one. The former enables quantum communication with higher channel capacity and more efficient quantum information processing and is compatible with diverse quantum networks. Establishing multi-degree-of-freedom entangled memories is not only vital for high-capacity quantum communication and computing, but also promising for enhanced violations of nonlocality in quantum systems. However, there have not yet been any reports of the experimental realization of multi-degree-of-freedom entangled memories. Here we experimentally established hyper- and hybrid entanglement in multiple degrees of freedom, including path (K-vector) and orbital angular momentum, between two separated atomic ensembles by using quantum storage. The results are promising for achieving quantum communication and computing with many degrees of freedom.
Wilhelm, Jan; Seewald, Patrick; Del Ben, Mauro; Hutter, Jürg
2016-12-13
We present an algorithm for computing the correlation energy in the random phase approximation (RPA) in a Gaussian basis requiring [Formula: see text] operations and [Formula: see text] memory. The method is based on the resolution of the identity (RI) with the overlap metric, a reformulation of RI-RPA in the Gaussian basis, imaginary time, and imaginary frequency integration techniques, and the use of sparse linear algebra. Additional memory reduction without extra computations can be achieved by an iterative scheme that overcomes the memory bottleneck of canonical RPA implementations. We report a massively parallel implementation that is the key for the application to large systems. Finally, cubic-scaling RPA is applied to a thousand water molecules using a correlation-consistent triple-ζ quality basis.
A FPGA-based Measurement System for Nonvolatile Semiconductor Memory Characterization
NASA Astrophysics Data System (ADS)
Bu, Jiankang; White, Marvin
2002-03-01
Low voltage, long retention, high density SONOS nonvolatile semiconductor memory (NVSM) devices are ideally suited for PCMCIA, FLASH and 'smart' cards. The SONOS memory transistor requires characterization with an accurate, rapid measurement system with minimum disturbance to the device. The FPGA-based measurement system includes three parts: 1) a pattern generator implemented with XILINX FPGAs and corresponding software, 2) a high-speed, constant-current, threshold voltage detection circuit, and 3) a data evaluation program implemented in LABVIEW. Fig. 1 shows the general block diagram of the FPGA-based measurement system. The function generator is designed and simulated with XILINX Foundation Software. Under the control of the specific erase/write/read pulses, the analog detection circuit applies operational modes to the SONOS device under test (DUT) and determines the change of the memory state of the SONOS nonvolatile memory transistor. The TEK460 digitizes the analog threshold voltage output and sends it to the PC. The data is filtered and averaged with a LABVIEW™ program running on the PC and displayed on the monitor in real time. We have implemented the pattern generator with XILINX FPGAs. Fig. 2 shows the block diagram of the pattern generator. We realized the logic control with a state machine design. Fig. 3 shows a small part of the state machine. The flexibility of the FPGAs enhances the capabilities of this system and allows measurement variations without hardware changes. The characterization of the nonvolatile memory transistor device under test (DUT), as a function of programming voltage and time, is achieved by a high-speed, constant-current threshold voltage detection circuit. The analog detection circuit incorporates fast analog switches controlled digitally with the FPGAs. The schematic circuit diagram is shown in Fig. 4.
The various operational modes for the DUT are realized with control signals applied to the analog switches (SW) as shown in Fig. 5. A LABVIEW™ program, on a PC platform, collects and processes the data. The data is displayed on the monitor in real time. This time-domain filtering reduces the digitizing error. Fig. 6 shows the data processing. SONOS nonvolatile semiconductor memories are characterized by erase/write, retention and endurance measurements. Fig. 7 shows the erase/write characteristics of an n-channel, 5 V programmable SONOS memory transistor. Fig. 8 shows the retention characteristic of the same SONOS transistor. We have used this system to characterize SONOS nonvolatile semiconductor memory transistors. The attractive features of the test system design lie in the cost-effectiveness and flexibility of the test pattern implementation, fast read-out of memory state, low power, high precision determination of the device threshold voltage, and perhaps most importantly, minimum disturbance, which is indispensable for nonvolatile memory characterization.
2006-07-01
4 Abbreviations AI Artificial Intelligence AM Artificial Memory CAD Computer Aided...memory (AM), artificial intelligence (AI), and embedded knowledge systems it is possible to expand the “effective span of competence” of...Technology J Joint J2 Joint Intelligence J3 Joint Operations NATO North Atlantic Treaty Organisation NCW Network Centric Warfare NHS National Health
ERIC Educational Resources Information Center
Colom, Roberto; Shih, Pei Chun
2004-01-01
A study was conducted in which 226 participants performed 12 tests, 6 thought to reflect verbal, quantitative, and spatial working memory (WM), and 6 of crystallized (Gc), fluid (Gf), and spatial (Gv) cognitive abilities. Confirmatory factor analyses (CFAs) were computed to test the unitary nature of the WM system. Six primary latent factors were…
Computer hardware for radiologists: Part I
Indrajit, IK; Alam, A
2010-01-01
Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology Information System (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processor unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called “buses”. The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute “programs”. A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration. PMID:21042437
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
Song, Fengguang; Dongarra, Jack
2014-10-01
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
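To make the notion of a tile-algorithm task graph concrete, the following is a minimal sketch that enumerates the tasks of the standard (homogeneous) right-looking tile Cholesky factorization. It is not the authors' heterogeneous algorithm or their scheduler; the function name tile_cholesky_tasks and the tuple layout are hypothetical, and the kernel names POTRF, TRSM, SYRK, and GEMM follow the usual LAPACK/BLAS naming.

```python
def tile_cholesky_tasks(nt):
    """List the tasks of a standard tile Cholesky on an nt x nt tile grid.

    Illustrative only: a dynamic runtime would derive dependencies from the
    tiles each task reads and writes, then schedule ready tasks in parallel.
    """
    tasks = []
    for k in range(nt):
        # Factor the diagonal tile A[k][k].
        tasks.append(("POTRF", k, k))
        # Triangular solves for the tiles below the diagonal in column k.
        for i in range(k + 1, nt):
            tasks.append(("TRSM", i, k))
        # Update the trailing submatrix with the freshly solved column.
        for i in range(k + 1, nt):
            tasks.append(("SYRK", i, i, k))        # diagonal tile update
            for j in range(k + 1, i):
                tasks.append(("GEMM", i, j, k))    # off-diagonal tile update
    return tasks


if __name__ == "__main__":
    for task in tile_cholesky_tasks(3):
        print(task)
```

Each tuple names a kernel and the tile indices it touches; tasks that touch disjoint tiles within the same step k are the ones a dynamic scheduler could run concurrently on different CPUs or GPUs.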
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Fengguang; Dongarra, Jack
Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasks without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.
GPU-accelerated computing for Lagrangian coherent structures of multi-body gravitational regimes
NASA Astrophysics Data System (ADS)
Lin, Mingpei; Xu, Ming; Fu, Xiaoyu
2017-04-01
Based on a well-established theoretical foundation, Lagrangian Coherent Structures (LCSs) have elicited widespread research on the intrinsic structures of dynamical systems in many fields, including the field of astrodynamics. Although the application of LCSs in dynamical problems seems straightforward theoretically, its associated computational cost is prohibitive. We propose a block decomposition algorithm developed on Compute Unified Device Architecture (CUDA) platform for the computation of the LCSs of multi-body gravitational regimes. In order to take advantage of GPU's outstanding computing properties, such as Shared Memory, Constant Memory, and Zero-Copy, the algorithm utilizes a block decomposition strategy to facilitate computation of finite-time Lyapunov exponent (FTLE) fields of arbitrary size and timespan. Simulation results demonstrate that this GPU-based algorithm can satisfy double-precision accuracy requirements and greatly decrease the time needed to calculate final results, increasing speed by approximately 13 times. Additionally, this algorithm can be generalized to various large-scale computing problems, such as particle filters, constellation design, and Monte-Carlo simulation.
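For reference, the finite-time Lyapunov exponent mentioned above is commonly defined as follows. The notation (flow map \phi, initial time t_0, integration time T) is the standard textbook one and is an assumption here, not taken from the abstract.

```latex
% Flow map: \phi_{t_0}^{t_0+T}(x) carries an initial state x at time t_0
% to its position at time t_0 + T.
% Right Cauchy-Green deformation tensor:
\Delta(x) = \left( \nabla \phi_{t_0}^{t_0+T}(x) \right)^{\top}
            \nabla \phi_{t_0}^{t_0+T}(x)
% FTLE field, with \lambda_{\max} the largest eigenvalue of \Delta:
\sigma_{t_0}^{T}(x) = \frac{1}{|T|} \ln \sqrt{ \lambda_{\max}\!\left( \Delta(x) \right) }
```

Because the value at each grid point depends only on trajectories started near that point, the field can be evaluated independently over sub-blocks of the grid, which is what makes a block decomposition on a GPU natural.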
Equation solvers for distributed-memory computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
A large number of scientific and engineering problems require the rapid solution of large systems of simultaneous equations. The performance of parallel computers in this area now dwarfs traditional vector computers by nearly an order of magnitude. This talk describes the major issues involved in parallel equation solvers with particular emphasis on the Intel Paragon, IBM SP-1 and SP-2 processors.
Administering an epoch initiated for remote memory access
Blocksome, Michael A; Miller, Douglas R
2014-03-18
Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
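The epoch lifecycle described above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the patented implementation: the class name Epoch, its method names, and the use of integer transfer identifiers are all hypothetical.

```python
class Epoch:
    """Hypothetical model of a remote-memory-access epoch on the origin node."""

    def __init__(self):
        self.state = "open"      # "open" -> "closing" -> "closed"
        self.pending = set()     # identifiers of transfers not yet completed
        self._next_id = 0

    def initiate_transfer(self):
        """Start a data transfer; rejected once the closing stage has begun."""
        if self.state != "open":
            raise RuntimeError("epoch is closing; new data transfers are rejected")
        transfer_id = self._next_id
        self._next_id += 1
        self.pending.add(transfer_id)
        return transfer_id

    def begin_close(self):
        """Enter the closing stage; outstanding transfers may still complete."""
        if self.state == "open":
            self.state = "closing"

    def complete_transfer(self, transfer_id):
        """Record that a previously initiated transfer has finished."""
        self.pending.discard(transfer_id)

    def try_close(self):
        """Close the epoch only if every initiated transfer has completed."""
        if self.state == "closing" and not self.pending:
            self.state = "closed"
        return self.state == "closed"
```

In this sketch, calling try_close while transfers are still pending simply returns False, so the caller can poll or retry after further completions arrive.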
Administering an epoch initiated for remote memory access
Blocksome, Michael A; Miller, Douglas R
2012-10-23
Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
Administering an epoch initiated for remote memory access
Blocksome, Michael A.; Miller, Douglas R.
2013-01-01
Methods, systems, and products are disclosed for administering an epoch initiated for remote memory access that include: initiating, by an origin application messaging module on an origin compute node, one or more data transfers to a target compute node for the epoch; initiating, by the origin application messaging module after initiating the data transfers, a closing stage for the epoch, including rejecting any new data transfers after initiating the closing stage for the epoch; determining, by the origin application messaging module, whether the data transfers have completed; and closing, by the origin application messaging module, the epoch if the data transfers have completed.
The Interaction between Semantic Representation and Episodic Memory.
Fang, Jing; Rüther, Naima; Bellebaum, Christian; Wiskott, Laurenz; Cheng, Sen
2018-02-01
The experimental evidence on the interrelation between episodic memory and semantic memory is inconclusive. Are they independent systems, different aspects of a single system, or separate but strongly interacting systems? Here, we propose a computational role for the interaction between the semantic and episodic systems that might help resolve this debate. We hypothesize that episodic memories are represented as sequences of activation patterns. These patterns are the output of a semantic representational network that compresses the high-dimensional sensory input. We show quantitatively that the accuracy of episodic memory crucially depends on the quality of the semantic representation. We compare two types of semantic representations: appropriate representations, which means that the representation is used to store input sequences that are of the same type as those that it was trained on, and inappropriate representations, which means that stored inputs differ from the training data. Retrieval accuracy is higher for appropriate representations because the encoded sequences are less divergent than those encoded with inappropriate representations. Consistent with our model prediction, we found that human subjects remember some aspects of episodes significantly more accurately if they had previously been familiarized with the objects occurring in the episode, as compared to episodes involving unfamiliar objects. We thus conclude that the interaction with the semantic system plays an important role for episodic memory.
Evaluating Non-In-Place Update Techniques for Flash-Based Transaction Processing Systems
NASA Astrophysics Data System (ADS)
Wang, Yongkun; Goda, Kazuo; Kitsuregawa, Masaru
Flash memory has recently emerged as a mainstream storage device. With prices falling fast, its cost per capacity is approaching that of SATA disk drives. So far, flash memory has been widely deployed in consumer electronics and, to some extent, in mobile computing environments. For enterprise systems, its deployment has been studied by many researchers and developers. In terms of access performance characteristics, flash memory is quite different from disk drives. Without mechanical components, flash memory has very high random read performance, whereas it has limited random write performance because of the erase-before-write design. The random write performance of flash memory is comparable with or even worse than that of disk drives. Due to this performance asymmetry, naive deployment in enterprise systems may not exploit the full performance potential of flash memory. This paper studies the effectiveness of using non-in-place-update (NIPU) techniques through the IO path of flash-based transaction processing systems. Our experiments using both open-source and commercial DBMSs validated the potential benefits: a 3.0x to 6.6x performance improvement was confirmed by incorporating non-in-place-update techniques into the file system without any modification of applications or storage devices.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...
2016-06-30
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, which computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful in their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuller, Ivan K.; Stevens, Rick; Pino, Robinson
2015-10-29
Computation in its many forms is the engine that fuels our modern civilization. Modern computation, based on the von Neumann architecture, has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably, within the next 10 years, reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS-based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; and comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.
ERIC Educational Resources Information Center
Steinke, Elisabeth
An approach to using the computer to assemble German tests is described. The purposes of the system would be: (1) an expansion of the bilingual lexical memory bank to list and store idioms of all degrees of difficulty, with frequency data and with complete and sophisticated retrieval possibility for assembly; (2) the creation of an…
Hardware packet pacing using a DMA in a parallel computer
Chen, Dong; Heidelberger, Phillip; Vranas, Pavlos
2013-08-13
Method and system for hardware packet pacing using a direct memory access controller in a parallel computer which, in one aspect, keeps track of a total number of bytes put on the network as a result of a remote get operation, using a hardware token counter.
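One way to picture pacing with a token counter is a byte budget that is debited as packets are injected and credited back later. The sketch below is a software analogy under stated assumptions, not the hardware design: the abstract only says that a hardware token counter tracks the total bytes put on the network by a remote get, so the budget size, the replenish-on-delivery policy, and all names (TokenCounter, pace_remote_get) are hypothetical.

```python
class TokenCounter:
    """Hypothetical byte budget limiting how much data a remote get may inject."""

    def __init__(self, max_bytes_in_flight):
        self.tokens = max_bytes_in_flight

    def try_inject(self, nbytes):
        """Debit the budget if enough tokens remain; otherwise refuse."""
        if nbytes > self.tokens:
            return False
        self.tokens -= nbytes
        return True

    def on_delivered(self, nbytes):
        """Credit tokens back once the bytes have left the network (assumed policy)."""
        self.tokens += nbytes


def pace_remote_get(total_bytes, packet_bytes, counter):
    """Inject packets until the data or the token budget runs out.

    Returns the number of bytes actually injected in this pass.
    """
    sent = 0
    while sent < total_bytes:
        size = min(packet_bytes, total_bytes - sent)
        if not counter.try_inject(size):
            break  # out of tokens: stop injecting until some are returned
        sent += size
    return sent
```

A caller would invoke pace_remote_get again for the remaining bytes after on_delivered has returned tokens, which is what keeps a large remote get from flooding the network at once.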
Visual Memories Bypass Normalization.
Bloem, Ilona M; Watanabe, Yurika L; Kibbe, Melissa M; Ling, Sam
2018-05-01
How distinct are visual memory representations from visual perception? Although evidence suggests that briefly remembered stimuli are represented within early visual cortices, the degree to which these memory traces resemble true visual representations remains something of a mystery. Here, we tested whether both visual memory and perception succumb to a seemingly ubiquitous neural computation: normalization. Observers were asked to remember the contrast of visual stimuli, which were pitted against each other to promote normalization either in perception or in visual memory. Our results revealed robust normalization between visual representations in perception, yet no signature of normalization occurring between working memory stores, neither between representations in memory nor between memory representations and visual inputs. These results provide unique insight into the nature of visual memory representations, illustrating that visual memory representations follow a different set of computational rules, bypassing normalization, a canonical visual computation.
Visual Memories Bypass Normalization
Bloem, Ilona M.; Watanabe, Yurika L.; Kibbe, Melissa M.; Ling, Sam
2018-01-01
How distinct are visual memory representations from visual perception? Although evidence suggests that briefly remembered stimuli are represented within early visual cortices, the degree to which these memory traces resemble true visual representations remains something of a mystery. Here, we tested whether both visual memory and perception succumb to a seemingly ubiquitous neural computation: normalization. Observers were asked to remember the contrast of visual stimuli, which were pitted against each other to promote normalization either in perception or in visual memory. Our results revealed robust normalization between visual representations in perception, yet no signature of normalization occurring between working memory stores—neither between representations in memory nor between memory representations and visual inputs. These results provide unique insight into the nature of visual memory representations, illustrating that visual memory representations follow a different set of computational rules, bypassing normalization, a canonical visual computation. PMID:29596038
Reinforcement learning and episodic memory in humans and animals: an integrative framework
Gershman, Samuel J.; Daw, Nathaniel D.
2018-01-01
We review the psychology and neuroscience of reinforcement learning (RL), which has witnessed significant progress in the last two decades, enabled by the comprehensive experimental study of simple learning and decision-making tasks. However, the simplicity of these tasks misses important aspects of reinforcement learning in the real world: (i) State spaces are high-dimensional, continuous, and partially observable; this implies that (ii) data are relatively sparse: indeed precisely the same situation may never be encountered twice; and also that (iii) rewards depend on long-term consequences of actions in ways that violate the classical assumptions that make RL tractable. A seemingly distinct challenge is that, cognitively, these theories have largely connected with procedural and semantic memory: how knowledge about action values or world models extracted gradually from many experiences can drive choice. This misses many aspects of memory related to traces of individual events, such as episodic memory. We suggest that these two gaps are related. In particular, the computational challenges can be dealt with, in part, by endowing RL systems with episodic memory, allowing them to (i) efficiently approximate value functions over complex state spaces, (ii) learn with very little data, and (iii) bridge long-term dependencies between actions and rewards. We review the computational theory underlying this proposal and the empirical evidence to support it. Our proposal suggests that the ubiquitous and diverse roles of memory in RL may function as part of an integrated learning system. PMID:27618944
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.
1997-04-01
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expandability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doors for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE's own production and disassembly activities.
Perspective: Memcomputing: Leveraging memory and physics to compute efficiently
NASA Astrophysics Data System (ADS)
Di Ventra, Massimiliano; Traversa, Fabio L.
2018-05-01
It is well known that physical phenomena may be of great help in computing some difficult problems efficiently. A typical example is prime factorization that may be solved in polynomial time by exploiting quantum entanglement on a quantum computer. There are, however, other types of (non-quantum) physical properties that one may leverage to compute efficiently a wide range of hard problems. In this perspective, we discuss how to employ one such property, memory (time non-locality), in a novel physics-based approach to computation: Memcomputing. In particular, we focus on digital memcomputing machines (DMMs) that are scalable. DMMs can be realized with non-linear dynamical systems with memory. The latter property allows the realization of a new type of Boolean logic, one that is self-organizing. Self-organizing logic gates are "terminal-agnostic," namely, they do not distinguish between the input and output terminals. When appropriately assembled to represent a given combinatorial/optimization problem, the corresponding self-organizing circuit converges to the equilibrium points that express the solutions of the problem at hand. In doing so, DMMs take advantage of the long-range order that develops during the transient dynamics. This collective dynamical behavior, reminiscent of a phase transition, or even the "edge of chaos," is mediated by families of classical trajectories (instantons) that connect critical points of increasing stability in the system's phase space. The topological character of the solution search renders DMMs robust against noise and structural disorder. Since DMMs are non-quantum systems described by ordinary differential equations, not only can they be built in hardware with the available technology, they can also be simulated efficiently on modern classical computers. 
As an example, we will show the polynomial-time solution of the subset-sum problem for the worst cases, and point to other types of hard problems where simulations of DMMs' equations of motion on classical computers have already demonstrated substantial advantages over traditional approaches. We conclude this article by outlining further directions of study.
NASA Astrophysics Data System (ADS)
Loring, B.; Karimabadi, H.; Rortershteyn, V.
2015-10-01
The surface line integral convolution (LIC) visualization technique produces dense visualizations of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
Power and Performance Trade-offs for Space Time Adaptive Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino
Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP’s computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large size data sets without increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to an improved performance in a typical STAP application.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim
2014-07-01
The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
Parallel Calculations in LS-DYNA
NASA Astrophysics Data System (ADS)
Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey
2017-11-01
Structural mechanics is trending towards numerical solutions of increasingly large and detailed problems, which requires more capable computing systems. This capacity can be increased in several ways. If the computing system is a workstation, its components (CPU, memory, etc.) can be replaced or extended. In practice, such modification eventually entails replacing the entire workstation, because replacing some components necessitates replacing others (faster CPUs and memory devices require buses with higher throughput, etc.). Special consideration must be given to the capabilities of modern video cards, which are powerful computing systems capable of processing data in parallel. Tools originally designed for rendering high-performance graphics can be applied to problems not directly related to graphics (CUDA, OpenCL, shaders, etc.). However, not all software suites use the capacity of video cards. Another way to increase the capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive growth: a quite powerful system can be obtained by combining nodes that are not particularly powerful. Moreover, the nodes may have different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with LS-DYNA software. A 2-node cluster proved sufficient to establish a range of dependencies.
Representation-Independent Iteration of Sparse Data Arrays
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
This work defines a method of iterating over very large arrays containing sparse data that is independent of how the contents of the sparse arrays are laid out in memory. What is unique and important here is the decoupling of the iteration over the sparse set of array elements from how they are internally represented in memory. This makes the approach backward compatible with existing schemes for representing sparse arrays, as well as with new ones. The novelty is an efficient way of iterating over sparse arrays that does not depend on the underlying memory layout of the array. A functional interface is defined for implementing sparse arrays in any modern programming language, with a particular focus on the Chapel programming language. Examples are provided that show the translation of a loop that computes a matrix vector product into this representation for both the distributed and non-distributed cases. This work is directly applicable to NASA and its High Productivity Computing Systems (HPCS) program that JPL and our current program are engaged in. The goal of this program is to create powerful, scalable, and economically viable high-powered computer systems suitable for use in national security and industry by 2010. This is important to NASA for its computationally intensive requirements for analyzing and understanding the volumes of science data from our returned missions.
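The decoupling described in this abstract can be illustrated with a minimal Python sketch. It is not the functional interface defined in the report (which targets Chapel); the class names `CooMatrix` and `CsrMatrix`, the method name `nonzeros`, and the two storage layouts are assumptions chosen for illustration. The point is that the matrix vector product loop only sees `(row, column, value)` triples and never touches the storage layout.

```python
from typing import Iterator, List, Tuple

Entry = Tuple[int, int, float]  # (row, column, value)


class CooMatrix:
    """Sparse matrix stored as a coordinate list (hypothetical layout)."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = list(entries)

    def nonzeros(self) -> Iterator[Entry]:
        # Yield stored entries in the order they were given.
        yield from self._entries


class CsrMatrix:
    """Sparse matrix in compressed sparse row form (hypothetical layout)."""

    def __init__(self, row_ptr: List[int], col_idx: List[int],
                 values: List[float]) -> None:
        self._row_ptr = row_ptr
        self._col_idx = col_idx
        self._values = values

    def nonzeros(self) -> Iterator[Entry]:
        # row_ptr[r]..row_ptr[r + 1] delimits the stored entries of row r.
        for row in range(len(self._row_ptr) - 1):
            for k in range(self._row_ptr[row], self._row_ptr[row + 1]):
                yield row, self._col_idx[k], self._values[k]


def matvec(matrix, x: List[float], n_rows: int) -> List[float]:
    """Compute y = A x using only the nonzeros() iteration interface."""
    y = [0.0] * n_rows
    for row, col, value in matrix.nonzeros():
        y[row] += value * x[col]
    return y


# The same 2x3 matrix [[1, 0, 2], [0, 3, 0]] in both layouts.
coo = CooMatrix([(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)])
csr = CsrMatrix(row_ptr=[0, 2, 3], col_idx=[0, 2, 1], values=[1.0, 2.0, 3.0])
x = [1.0, 1.0, 1.0]
```

Calling `matvec(coo, x, 2)` and `matvec(csr, x, 2)` runs the identical loop over two different internal representations, which is the property the abstract emphasizes.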
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel; ...
2017-03-08
Coupled-cluster methods provide highly accurate models of molecular structure through explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix–matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, and IBM Blue Gene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance, tasking and bulk synchronous models. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Incorporating CLIPS into a personal-computer-based Intelligent Tutoring System
NASA Technical Reports Server (NTRS)
Mueller, Stephen J.
1990-01-01
A large number of Intelligent Tutoring Systems (ITS's) have been built since they were first proposed in the early 1970's. Research conducted on the use of the best of these systems has demonstrated their effectiveness in tutoring in selected domains. Computer Sciences Corporation, Applied Technology Division, Houston Operations has been tasked by the Spacecraft Software Division at NASA/Johnson Space Center (NASA/JSC) to develop a number of ITS's in a variety of domains and on many different platforms. This paper will address issues facing the development of an ITS on a personal computer using the CLIPS (C Language Integrated Production System) language. For an ITS to be widely accepted, not only must it be effective, flexible, and very responsive, it must also be capable of functioning on readily available computers. There are many issues to consider when using CLIPS to develop an ITS on a personal computer. Some of these issues are the following: when to use CLIPS and when to use a procedural language such as C, how to maximize speed and minimize memory usage, and how to decrease the time required to load your rule base once you are ready to deliver the system. Based on experiences in developing the CLIPS Intelligent Tutoring System (CLIPSITS) on an IBM PC clone and an intelligent Physics Tutor on a Macintosh 2, this paper reports results on how to address some of these issues. It also suggests approaches for maintaining a powerful learning environment while delivering robust performance within the speed and memory constraints of the personal computer.
The biological microprocessor, or how to build a computer with biological parts
Moe-Behrens, Gerd HG
2013-01-01
Systemics, a revolutionary paradigm shift in scientific thinking with applications in systems biology and synthetic biology, has led to the idea of using silicon computers and their engineering principles as a blueprint for the engineering of a similar machine made from biological parts. Here we describe these building blocks and how they can be assembled into a general purpose computer system, a biological microprocessor. Such a system consists of biological parts building an input / output device, an arithmetic logic unit, a control unit, memory, and wires (busses) to interconnect these components. A biocomputer can be used to monitor and control a biological system. PMID:24688733
A scalable parallel black oil simulator on distributed memory parallel computers
NASA Astrophysics Data System (ADS)
Wang, Kun; Liu, Hui; Chen, Zhangxin
2015-11-01
This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
ERIC Educational Resources Information Center
Deryakulu, Deniz; Olkun, Sinan
2009-01-01
This study examined Turkish computer teachers' professional memories telling of their experiences with school administrators and supervisors. Seventy-four computer teachers participated in the study. Content analysis of the memories revealed that the most frequently mentioned themes concerning school administrators were "unsupportive…
Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Yier
As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified data, are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect their computing infrastructure. The research outcomes from this project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.
A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vineyard, Craig Michael; Verzi, Stephen Joseph
As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
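The peak figure and the memory-to-compute balance mentioned in this abstract can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above; the assumption that each processor completes one operation per clock cycle is ours (it is the reading under which eight processors at 200 MHz give 1.6 BOPS), and the function names are hypothetical.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 200_000_000      # 200 MHz processor clock
BUS_PERIOD_NS = 10          # one vector-memory transfer every 10 nsec
WORDS_PER_TRANSFER = 3      # two input words plus one output word

# Assumption (not stated in the abstract): one operation per clock cycle.
OPS_PER_CYCLE = 1


def peak_ops_per_second(processors: int) -> int:
    """Peak operation rate of the execution unit."""
    return processors * CLOCK_HZ * OPS_PER_CYCLE


def memory_words_per_second() -> int:
    """Words moved per second across the three vector-memory buses."""
    return WORDS_PER_TRANSFER * 1_000_000_000 // BUS_PERIOD_NS


def words_per_operation(processors: int) -> float:
    """Memory words available per operation at peak rate."""
    return memory_words_per_second() / peak_ops_per_second(processors)
```

Under these assumptions, eight processors reach the quoted 1.6 billion operations per second while the buses deliver 300 million words per second, so fewer memory words are available per operation as processors are added. This is one way to read the abstract's remark that memory bandwidth must be balanced against the computation rate.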
Programming for energy monitoring/display system in multicolor lidar system research
NASA Technical Reports Server (NTRS)
Alvarado, R. C., Jr.; Allen, R. J.
1982-01-01
The Z80 microprocessor based computer program that directs and controls the operation of the six channel energy monitoring/display system that is a part of the NASA Multipurpose Airborne Differential Absorption Lidar (DIAL) system is described. The program is written in the Z80 assembly language and is located on EPROM memories. All source and assembled listings of the main program, five subroutines, and two service routines along with flow charts and memory maps are included. A combinational block diagram shows the interfacing (including port addresses) between the six power sensors, displays, front panel controls, the main general purpose minicomputer, and this dedicated microcomputer system.
A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.
Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary
2017-12-01
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for such systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
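The idea of expressing a computation as tasks with explicit dependencies can be sketched in Python with the standard-library `graphlib` module (Python 3.9 or later). This is not the HTGS API; the task names, the `run_graph` helper, and the tiny 2 × 2 matrix multiplication are assumptions made for illustration, and the sketch runs tasks sequentially rather than overlapping them across CPUs and GPUs.

```python
from graphlib import TopologicalSorter


def matmul(a, b):
    """Plain triple-loop matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Each task is (function, list of tasks whose results it consumes).
# Task names are hypothetical.
tasks = {
    "load_a": (lambda: [[1, 2], [3, 4]], []),
    "load_b": (lambda: [[5, 6], [7, 8]], []),
    "multiply": (matmul, ["load_a", "load_b"]),
}


def run_graph(task_table):
    """Run every task after its dependencies and return all results."""
    graph = {name: deps for name, (_, deps) in task_table.items()}
    results = {}
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        func, deps = task_table[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_graph(tasks)
```

Because the data flowing between tasks is explicit in the table, a scheduler has the information it needs to decide where each task runs and when data must move, which is the property the abstract attributes to task graphs.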
FAST: framework for heterogeneous medical image computing and visualization.
Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank
2015-11-01
Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology
NASA Astrophysics Data System (ADS)
Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.
2009-04-01
Currently, the basic problem of tsunami modeling is the low speed of calculations, which is unacceptable for operational warning services. Existing algorithms for numerical modeling of the hydrodynamics of tsunami waves were developed without taking advantage of modern computer facilities. Parallel algorithms offer an opportunity to accelerate the calculations considerably. We discuss a new approach to parallelizing tsunami modeling code using OpenMP technology (for multiprocessor systems with shared memory). Multiprocessor systems are now easily accessible to everyone, and the cost of using them is much lower than the cost of clusters. This also allows programmers to apply multithreaded algorithms on researchers' desktop computers. Another important advantage of this approach is the shared memory mechanism: there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing processes, which results in almost linear scalability of the program. In the new version of NAMI DANCE, OpenMP and a multithreaded algorithm provide an 80% gain in speed over the single-thread version on a dual-processor unit, and a 320% gain was attained on a four-core PC processor. Thus, it was possible to considerably reduce calculation time on scientific workstations (desktops) without a complete change of the program and user interfaces. Further modernization of the algorithms for preparing initial data and processing results using OpenMP looks reasonable. The final version of NAMI DANCE, with its increased computational speed, can be used not only for research purposes but also in real-time tsunami warning systems.
Viejo, Guillaume; Khamassi, Mehdi; Brovelli, Andrea; Girard, Benoît
2015-01-01
Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behaviors. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies. To better elucidate the computations underlying flexible behavior, we develop a dual-system computational model that can predict both performance (i.e., participants' choices) and modulations in reaction times during learning of a stimulus–response association task. The habitual system is modeled with a simple Q-Learning algorithm (QL). For the goal-directed system, we propose a new Bayesian Working Memory (BWM) model that searches for information in the history of previous trials in order to minimize Shannon entropy. We propose a model for QL and BWM coordination such that the expensive memory manipulation is under control of, among others, the level of convergence of the habitual learning. We test the ability of QL or BWM alone to explain human behavior, and compare them with the performance of model combinations, to highlight the need for such combinations to explain behavior. Two of the tested combination models are derived from the literature, and the last is our new proposal. In conclusion, all subjects were better explained by model combinations, and the majority of them are explained by our new coordination proposal. PMID:26379518
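The two quantities named in this abstract, a Q-Learning value update and the Shannon entropy of a probability distribution over actions, can be written down in a few lines. The Python sketch below shows the textbook forms only; it is not the authors' QL or BWM model, and the function names, the dictionary-based value table, and the parameter values in the example are assumptions.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

QTable = Dict[Tuple[str, str], float]


def q_learning_update(q: QTable, state: str, action: str, reward: float,
                      next_state: str, actions: Sequence[str],
                      alpha: float, gamma: float) -> None:
    """Textbook update: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def shannon_entropy(probabilities: Iterable[float]) -> float:
    """Entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0.0)


# One rewarded trial in a hypothetical stimulus-response task.
q: QTable = {}
q_learning_update(q, state="stimulus_1", action="press_left", reward=1.0,
                  next_state="end", actions=["press_left", "press_right"],
                  alpha=0.5, gamma=0.9)
```

A uniform distribution over two actions has an entropy of one bit, and a distribution that puts all its mass on one action has zero entropy, so lower entropy corresponds to more certainty about which action to choose.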
NASA Astrophysics Data System (ADS)
Megherbi, Dalila B.; Yan, Yin; Tanmay, Parikh; Khoury, Jed; Woods, C. L.
2004-11-01
Recently, surveillance and Automatic Target Recognition (ATR) applications have been increasing as the cost of the computing power needed to process the massive amount of information continues to fall. This computing power has been made possible partly by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-art electro-optical imaging systems to provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g. telescope, precise optics, cameras, image/computer vision algorithms, which can be geographically distributed or sharing distributed resources) into a programmable system and DSP systems. Additionally, pattern recognition techniques and fast information retrieval are often important components of intelligent systems. The aim of this work is to use an embedded FPGA as a fast, configurable and synthesizable search engine for fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and show a low cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture solution with real time recognition capabilities and computing for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers a performance advantage of an order of magnitude over a RAM-based (Random Access Memory) search architecture for implementing high speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM-based embedded architecture are described here. Other SOPC solutions/design issues are covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low cost reconfigurable fast image pattern recognition/retrieval at the hardware/software co-design level.
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Ham, Timothy S; Lee, Sung K; Keasling, Jay D; Arkin, Adam P
2008-07-30
Inversion recombination elements present unique opportunities for computing and information encoding in biological systems. They provide distinct binary states that are encoded into the DNA sequence itself, allowing us to overcome limitations posed by other biological memory or logic gate systems. Further, it is in theory possible to create complex sequential logics by careful positioning of recombinase recognition sites in the sequence. In this work, we describe the design and synthesis of an inversion switch using the fim and hin inversion recombination systems to create a heritable sequential memory switch. We have integrated the two inversion systems in an overlapping manner, creating a switch that can have multiple states. The switch is capable of transitioning from state to state in a manner analogous to a finite state machine, while encoding the state information into DNA. This switch does not require protein expression to maintain its state, and "remembers" its state even upon cell death. We were able to demonstrate transition into three out of the five possible states showing the feasibility of such a switch. We demonstrate that a heritable memory system that encodes its state into DNA is possible, and that inversion recombination system could be a starting point for more complex memory circuits. Although the circuit did not fully behave as expected, we showed that a multi-state, temporal memory is achievable.
Ham, Timothy S.; Lee, Sung K.; Keasling, Jay D.; Arkin, Adam P.
2008-01-01
Background Inversion recombination elements present unique opportunities for computing and information encoding in biological systems. They provide distinct binary states that are encoded into the DNA sequence itself, allowing us to overcome limitations posed by other biological memory or logic gate systems. Further, it is in theory possible to create complex sequential logics by careful positioning of recombinase recognition sites in the sequence. Methodology/Principal Findings In this work, we describe the design and synthesis of an inversion switch using the fim and hin inversion recombination systems to create a heritable sequential memory switch. We have integrated the two inversion systems in an overlapping manner, creating a switch that can have multiple states. The switch is capable of transitioning from state to state in a manner analogous to a finite state machine, while encoding the state information into DNA. This switch does not require protein expression to maintain its state, and “remembers” its state even upon cell death. We were able to demonstrate transition into three out of the five possible states showing the feasibility of such a switch. Conclusions/Significance We demonstrate that a heritable memory system that encodes its state into DNA is possible, and that inversion recombination system could be a starting point for more complex memory circuits. Although the circuit did not fully behave as expected, we showed that a multi-state, temporal memory is achievable. PMID:18665232
The declarative/procedural model of lexicon and grammar.
Ullman, M T
2001-01-01
Our use of language depends upon two capacities: a mental lexicon of memorized words and a mental grammar of rules that underlie the sequential and hierarchical composition of lexical forms into predictably structured larger words, phrases, and sentences. The declarative/procedural model posits that the lexicon/grammar distinction in language is tied to the distinction between two well-studied brain memory systems. On this view, the memorization and use of at least simple words (those with noncompositional, that is, arbitrary form-meaning pairings) depends upon an associative memory of distributed representations that is subserved by temporal-lobe circuits previously implicated in the learning and use of fact and event knowledge. This "declarative memory" system appears to be specialized for learning arbitrarily related information (i.e., for associative binding). In contrast, the acquisition and use of grammatical rules that underlie symbol manipulation is subserved by frontal/basal-ganglia circuits previously implicated in the implicit (nonconscious) learning and expression of motor and cognitive "skills" and "habits" (e.g., from simple motor acts to skilled game playing). This "procedural" system may be specialized for computing sequences. This novel view of lexicon and grammar offers an alternative to the two main competing theoretical frameworks. It shares the perspective of traditional dual-mechanism theories in positing that the mental lexicon and a symbol-manipulating mental grammar are subserved by distinct computational components that may be linked to distinct brain structures. However, it diverges from these theories where they assume components dedicated to each of the two language capacities (that is, domain-specific) and in their common assumption that lexical memory is a rote list of items. 
Conversely, while it shares with single-mechanism theories the perspective that the two capacities are subserved by domain-independent computational mechanisms, it diverges from them where they link both capacities to a single associative memory system with broad anatomic distribution. The declarative/procedural model, but neither traditional dual- nor single-mechanism models, predicts double dissociations between lexicon and grammar, with associations among associative memory properties, memorized words and facts, and temporal-lobe structures, and among symbol-manipulation properties, grammatical rule products, motor skills, and frontal/basal-ganglia structures. In order to contrast lexicon and grammar while holding other factors constant, we have focused our investigations of the declarative/procedural model on morphologically complex word forms. Morphological transformations that are (largely) unproductive (e.g., in go-went, solemn-solemnity) are hypothesized to depend upon declarative memory. These have been contrasted with morphological transformations that are fully productive (e.g., in walk-walked, happy-happiness), whose computation is posited to be solely dependent upon grammatical rules subserved by the procedural system. Here evidence is presented from studies that use a range of psycholinguistic and neurolinguistic approaches with children and adults. It is argued that converging evidence from these studies supports the declarative/procedural model of lexicon and grammar.
File System Virtual Appliances: Portable File System Implementations
2009-05-01
Mobile Computing Systems and Applications, Santa Cruz, CA, 1994. IEEE. [10] Michael Eisler, Peter Corbett, Michael Kazar, Daniel S. Nydick, and...Gingell, Joseph P. Moran, and William A. Shannon. Virtual Memory Architecture in SunOS. In USENIX Summer Conference, pages 81–94, Berkeley, CA, 1987
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S.; Yoginath, Srikanth B.
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) a memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.
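The two modes described above can be sketched for a single Level-1 operation, the AXPY update y <- a*x + y. This is a minimal illustration under stated assumptions: the function names are hypothetical and are not the RBLAS interface, and integer data is used so that the inverse computation is exact (with floating point, rounding can prevent a bit-exact reversal).

```python
def axpy(a, x, y):
    """Forward BLAS Level-1 update, in place: y <- a*x + y."""
    for i in range(len(y)):
        y[i] += a * x[i]

# Memory mode: checkpoint in the forward pass, restore in reverse.
def axpy_forward_memory_mode(a, x, y, log):
    log.append(list(y))          # save a copy of y before overwriting it
    axpy(a, x, y)

def axpy_reverse_memory_mode(y, log):
    y[:] = log.pop()             # restore the saved copy

# Computational mode: save nothing, undo by inverse computation.
def axpy_reverse_compute_mode(a, x, y):
    axpy(-a, x, y)               # y <- y - a*x

x = [1, 2, 3]
y_mem = [10, 20, 30]
y_cmp = [10, 20, 30]
log = []

axpy_forward_memory_mode(2, x, y_mem, log)
axpy(2, x, y_cmp)
after_forward = list(y_cmp)

axpy_reverse_memory_mode(y_mem, log)
axpy_reverse_compute_mode(2, x, y_cmp)
```

The memory mode costs one extra copy of `y` per forward call, while the computational mode costs one extra AXPY in reverse and no storage.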
Fast Data Acquisition For Mass Spectrometer
NASA Technical Reports Server (NTRS)
Lincoln, K. A.; Bechtel, R. D.
1988-01-01
New equipment has speed and capacity to process time-of-flight data. System relies on fast, compact waveform digitizer with 32-k memory coupled to personal computer. With digitizer, system captures all mass peaks on each 25- to 35-microsecond cycle of spectrometer.
ERIC Educational Resources Information Center
Fassbender, Eric; Richards, Deborah; Bilgin, Ayse; Thompson, William Forde; Heiden, Wolfgang
2012-01-01
Game technology has been widely used for educational applications, however, despite the common use of background music in games, its effect on learning has been largely unexplored. This paper discusses how music played in the background of a computer-animated history lesson affected participants' memory for facts. A virtual history lesson was…
2017-02-01
enable high scalability and reconfigurability for inter-CPU/Memory communications with an increased number of communication channels in frequency ...interconnect technology (MRFI) to enable high scalability and re-configurability for inter-CPU/Memory communications with an increased number of communication ...testing in the University of California, Los Angeles (UCLA) Center for High Frequency Electronics, and Dr. Afshin Momtaz at Broadcom Corporation for
1993-11-01
way is to develop a crude but working model of an entire system. The other is by developing a realistic model of the user interface, leaving out most...devices or by incorporating software for a more user-friendly interface. Automation introduces the possibility of making data entry errors. Multimode...across various human-computer interfaces. 127 a Memory: Minimize the amount of information that the user must maintain in short-term memory
ERIC Educational Resources Information Center
van der Ven, Sanne H. G.; Klaiber, Jonathan D.; van der Maas, Han L. J.
2017-01-01
Writing down spoken number words (transcoding) is an ability that is predictive of math performance and related to working memory ability. We analysed these relationships in a large sample of over 25,000 children, from kindergarten to the end of primary school, who solved transcoding items with a computer adaptive system. Furthermore, we…
Computing Equilibrium Chemical Compositions
NASA Technical Reports Server (NTRS)
Mcbride, Bonnie J.; Gordon, Sanford
1995-01-01
Chemical Equilibrium With Transport Properties, 1993 (CET93) computer program provides data on chemical-equilibrium compositions. Aids calculation of thermodynamic properties of chemical systems. Information essential in design and analysis of such equipment as compressors, turbines, nozzles, engines, shock tubes, heat exchangers, and chemical-processing equipment. CET93/PC is version of CET93 specifically designed to run within 640K memory limit of MS-DOS operating system. CET93/PC written in FORTRAN.
Working memory contributions to reinforcement learning impairments in schizophrenia.
Collins, Anne G E; Brown, Jaime K; Gold, James M; Waltz, James A; Frank, Michael J
2014-10-08
Previous research has shown that patients with schizophrenia are impaired in reinforcement learning tasks. However, behavioral learning curves in such tasks originate from the interaction of multiple neural processes, including the basal ganglia- and dopamine-dependent reinforcement learning (RL) system, but also prefrontal cortex-dependent cognitive strategies involving working memory (WM). Thus, it is unclear which specific system induces impairments in schizophrenia. We recently developed a task and computational model allowing us to separately assess the roles of RL (slow, cumulative learning) mechanisms versus WM (fast but capacity-limited) mechanisms in healthy adult human subjects. Here, we used this task to assess patients' specific sources of impairments in learning. In 15 separate blocks, subjects learned to pick one of three actions for stimuli. The number of stimuli to learn in each block varied from two to six, allowing us to separate influences of capacity-limited WM from the incremental RL system. As expected, both patients (n = 49) and healthy controls (n = 36) showed effects of set size and delay between stimulus repetitions, confirming the presence of working memory effects. Patients performed significantly worse than controls overall, but computational model fits and behavioral analyses indicate that these deficits could be entirely accounted for by changes in WM parameters (capacity and reliability), whereas RL processes were spared. These results suggest that the working memory system contributes strongly to learning impairments in schizophrenia.
On the Efficacy of Source Code Optimizations for Cache-Based Systems
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Saphir, William C.
1998-01-01
Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
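The rule of thumb about minimizing memory access stride can be made concrete by listing the addresses a loop nest touches. The sketch below only shows the index pattern for a row-major array; the helper names are hypothetical, and Python timings would not reflect cache behaviour. As the abstract stresses, a smaller stride does not guarantee better performance on every processor.

```python
def flat_indices(nrows, ncols, inner="col"):
    """Flat offsets touched when looping over a row-major nrows x ncols array.

    inner="col": the inner loop runs over columns (consecutive addresses).
    inner="row": the inner loop runs over rows (jumps of ncols elements).
    """
    if inner == "col":
        return [r * ncols + c for r in range(nrows) for c in range(ncols)]
    return [r * ncols + c for c in range(ncols) for r in range(nrows)]

def inner_stride(indices):
    """Distance in elements between the first two accesses."""
    return indices[1] - indices[0]

unit_stride = inner_stride(flat_indices(4, 8, inner="col"))
long_stride = inner_stride(flat_indices(4, 8, inner="row"))
```

Both loop orders visit the same set of elements; interchanging the loops changes only the order, and therefore the stride, of the accesses.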
On the Efficacy of Source Code Optimizations for Cache-Based Systems
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Saphir, William C.; Saini, Subhash (Technical Monitor)
1998-01-01
Obtaining high performance without machine-specific tuning is an important goal of scientific application programmers. Since most scientific processing is done on commodity microprocessors with hierarchical memory systems, this goal of "portable performance" can be achieved if a common set of optimization principles is effective for all such systems. It is widely believed, or at least hoped, that portable performance can be realized. The rule of thumb for optimization on hierarchical memory systems is to maximize temporal and spatial locality of memory references by reusing data and minimizing memory access stride. We investigate the effects of a number of optimizations on the performance of three related kernels taken from a computational fluid dynamics application. Timing the kernels on a range of processors, we observe an inconsistent and often counterintuitive impact of the optimizations on performance. In particular, code variations that have a positive impact on one architecture can have a negative impact on another, and variations expected to be unimportant can produce large effects. Moreover, we find that cache miss rates - as reported by a cache simulation tool, and confirmed by hardware counters - only partially explain the results. By contrast, the compiler-generated assembly code provides more insight by revealing the importance of processor-specific instructions and of compiler maturity, both of which strongly, and sometimes unexpectedly, influence performance. We conclude that it is difficult to obtain performance portability on modern cache-based computers, and comment on the implications of this result.
Systemic Lisbon Battery: Normative Data for Memory and Attention Assessments.
Gamito, Pedro; Morais, Diogo; Oliveira, Jorge; Ferreira Lopes, Paulo; Picareli, Luís Felipe; Matias, Marcelo; Correia, Sara; Brito, Rodrigo
2016-05-04
Memory and attention are two cognitive domains pivotal for the performance of instrumental activities of daily living (IADLs). The assessment of these functions is still widely carried out with pencil-and-paper tests, which lack ecological validity. The evaluation of cognitive and memory functions while the patients are performing IADLs should contribute to the ecological validity of the evaluation process. The objective of this study is to establish normative data from virtual reality (VR) IADLs designed to activate memory and attention functions. A total of 243 non-clinical participants carried out a paper-and-pencil Mini-Mental State Examination (MMSE) and performed 3 VR activities: art gallery visual matching task, supermarket shopping task, and memory fruit matching game. The data (execution time and errors, and money spent in the case of the supermarket activity) was automatically generated from the app. Outcomes were computed using non-parametric statistics, due to non-normality of distributions. Age, academic qualifications, and computer experience all had significant effects on most measures. Normative values for different levels of these measures were defined. Age, academic qualifications, and computer experience should be taken into account while using our VR-based platform for cognitive assessment purposes. ©Pedro Gamito, Diogo Morais, Jorge Oliveira, Paulo Ferreira Lopes, Luís Felipe Picareli, Marcelo Matias, Sara Correia, Rodrigo Brito. Originally published in JMIR Rehabilitation and Assistive Technology (http://rehab.jmir.org), 04.05.2016.
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 RISC for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada-implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based networks, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
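The kind of data distribution that HPF asks the user to specify can be sketched as a mapping from array index to owning processor. The helpers below are hypothetical and use 0-based indices; they mirror the usual BLOCK and CYCLIC distribution formats and are an illustration, not HPF syntax or compiler output.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)           # ceiling of n / p
    return i // block

def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin."""
    return i % p

n, p = 10, 4
block_map = [block_owner(i, n, p) for i in range(n)]
cyclic_map = [cyclic_owner(i, p) for i in range(n)]
```

A block layout keeps neighbouring elements on the same processor, which tends to suit stencil-like access; a cyclic layout spreads work more evenly when the cost per element varies along the array.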
A high performance parallel algorithm for 1-D FFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
1994-12-31
In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
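A multi-dimensional formulation of a 1-D transform can be sketched with the textbook Cooley-Tukey index splitting, in which a length n1*n2 sequence is treated as an n1 x n2 array. This is an assumption about the general technique, not the authors' algorithm: the function names are hypothetical, the short transforms use a naive DFT for clarity, and everything runs serially.

```python
import cmath

def dft(x, sign=-1):
    """Naive O(n^2) DFT; sign=-1 is forward, sign=+1 is the unscaled inverse."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
                for m in range(n))
            for k in range(n)]

def fft_2d_formulation(x, n1, n2, sign=-1):
    """Length n1*n2 transform computed as column DFTs, twiddles, row DFTs."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: length-n1 transforms down each column of the row-major view.
    cols = [dft([x[n2 * r + c] for r in range(n1)], sign) for c in range(n2)]
    # Step 2: multiply by the twiddle factors w_n^(c * k1).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(sign * 2j * cmath.pi * c * k1 / n)
    # Step 3: length-n2 transforms along each row, stored transposed.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)], sign)
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out

def filter_kernel(x, coef, n1, n2):
    """Forward transform, pointwise multiply by coef, inverse transform."""
    spectrum = fft_2d_formulation(x, n1, n2, sign=-1)
    product = [s * c for s, c in zip(spectrum, coef)]
    inverse = fft_2d_formulation(product, n1, n2, sign=+1)
    return [v / len(x) for v in inverse]
```

In this formulation each short transform touches only one column or one row, so on a distributed memory machine the short transforms can typically run locally and communication is concentrated in the data rearrangement between steps.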
The magic words: Using computers to uncover mental associations for use in magic trick design.
Williams, Howard; McOwan, Peter W
2017-01-01
The use of computational systems to aid in the design of magic tricks has been previously explored. Here further steps are taken in this direction, introducing the use of computer technology as a natural language data sourcing and processing tool for magic trick design purposes. Crowd sourcing of psychological concepts is investigated; further, the role of human associative memory and its exploitation in magical effects is explored. A new trick is developed and evaluated: a physical card trick partially designed by a computational system configured to search for and explore conceptual spaces readily understood by spectators.
Static analysis of the hull plate using the finite element method
NASA Astrophysics Data System (ADS)
Ion, A.
2015-11-01
This paper aims at presenting the static analysis for two levels of a container ship's construction as follows: the first level is at the girder / hull plate and the second level is conducted at the entire strength hull of the vessel. This article will describe the work for the static analysis of a hull plate. We shall use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPU processors at 3.33 GHz, 32 GB memory installed. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on a SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.
Evolution of cellular automata with memory: The Density Classification Task.
Stone, Christopher; Bull, Larry
2009-08-01
The Density Classification Task is a well known test problem for two-state discrete dynamical systems. For many years researchers have used a variety of evolutionary computation approaches to evolve solutions to this problem. In this paper, we investigate the evolvability of solutions when the underlying Cellular Automaton is augmented with a type of memory based on the Least Mean Square algorithm. To obtain high performance solutions using a simple non-hybrid genetic algorithm, we design a novel representation based on the ternary representation used for Learning Classifier Systems. The new representation is found able to produce superior performance to the bit string traditionally used for representing Cellular automata. Moreover, memory is shown to improve evolvability of solutions and appropriate memory settings are able to be evolved as a component part of these solutions.
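The task itself can be stated in a few lines: a one-dimensional binary cellular automaton with periodic boundaries should settle to all 1s if the initial configuration holds more 1s than 0s, and to all 0s otherwise. The sketch below is a minimal illustration with hypothetical function names; it uses a plain local majority rule as the example rule and does not model the memory mechanism or the evolved representations studied in the paper.

```python
def target_state(cells):
    """Value every cell should reach: 1 if 1s are the majority, else 0."""
    return 1 if 2 * sum(cells) > len(cells) else 0

def step(cells, rule, radius=1):
    """Apply rule to every neighbourhood, with periodic boundaries."""
    n = len(cells)
    return [rule(tuple(cells[(i + d) % n] for d in range(-radius, radius + 1)))
            for i in range(n)]

def local_majority(neighbourhood):
    """Example rule: each cell adopts the majority value of its neighbourhood."""
    return 1 if 2 * sum(neighbourhood) > len(neighbourhood) else 0

def run(cells, rule, radius=1, max_steps=100):
    """Iterate until a fixed point or max_steps, and return the configuration."""
    for _ in range(max_steps):
        nxt = step(cells, rule, radius)
        if nxt == cells:
            break
        cells = nxt
    return cells

initial = [1, 1, 0, 1, 0, 0, 0]
final = run(initial, local_majority)
solved = final == [target_state(initial)] * len(initial)
```

On this input the local majority rule freezes into a block of 1s next to a block of 0s, so it does not solve the task, which illustrates why the problem requires information to travel across the whole lattice.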
FFT transformed quantitative EEG analysis of short term memory load.
Singh, Yogesh; Singh, Jayvardhan; Sharma, Ratna; Talwar, Anjana
2015-07-01
The EEG is considered a building block of functional signaling in the brain, and the role of EEG oscillations in human information processing has been intensively investigated. The aim of this study was to identify the quantitative EEG correlates of short term memory load as assessed through the Sternberg memory test. The study was conducted on 34 healthy male student volunteers. The intervention consisted of the Sternberg memory test, run on a computer using software implementing a version of the Sternberg memory scanning paradigm. Electroencephalography (EEG) was recorded from 19 scalp locations according to the international 10-20 system of electrode placement. EEG signals were analyzed offline. To overcome the problems of a fixed band system, an individual alpha frequency (IAF) based frequency band selection method was adopted. The outcome measures were FFT transformed absolute powers in the six bands at 19 electrode positions. The Sternberg memory test served as a model of short term memory load. Correlation analysis of EEG during the memory task showed decreased absolute power in the Upper alpha band at nearly all electrode positions, and increased power in the Theta band at the Fronto-Temporal region and in the Lower 1 alpha band at the Fronto-Central region. Lower 2 alpha, Beta and Gamma band power remained unchanged. Short term memory load has distinct electroencephalographic correlates resembling the mentally stressed state. This is evident from decreased power in the Upper alpha band (corresponding to the Alpha band of the traditional EEG system), which is the representative band of the relaxed mental state. Fronto-temporal Theta power changes may reflect the encoding and execution of the memory task.
Recent Trends in Spintronics-Based Nanomagnetic Logic
NASA Astrophysics Data System (ADS)
Das, Jayita; Alam, Syed M.; Bhanja, Sanjukta
2014-09-01
With the growing concerns of standby power in sub-100-nm CMOS technologies, alternative computing techniques and memory technologies are explored. Spin transfer torque magnetoresistive RAM (STT-MRAM) is one such nonvolatile memory relying on magnetic tunnel junctions (MTJs) to store information. It uses spin transfer torque to write information and magnetoresistance to read information. In 2012, Everspin Technologies, Inc. commercialized the first 64Mbit Spin Torque MRAM. On the computing end, nanomagnetic logic (NML) is a promising technique with zero leakage and high data retention. In 2000, Cowburn and Welland first demonstrated its potential in logic and information propagation through magnetostatic interaction in a chain of single domain circular nanomagnetic dots of Supermalloy (Ni80Fe14Mo5X1, X is other metals). In 2006, Imre et al. demonstrated wires and majority gates, followed by a demonstration of coplanar cross wire systems in 2010 by Pulecio et al. Since 2004 researchers have also investigated the potential of MTJs in logic. More recently, with dipolar coupling between MTJs demonstrated in 2012, logic-in-memory architectures with STT-MRAM have been investigated. The architecture borrows the computing concept from NML and the read and write style from MRAM. The architecture can switch its operation between logic and memory modes with the clock as classifier. Further, through logic partitioning between the MTJ and CMOS planes, a significant performance boost has been observed in basic computing blocks within the architecture. In this work, we have explored the developments in NML, in MTJs and more recent developments in hybrid MTJ/CMOS logic-in-memory architecture and its unique logic partitioning capability.
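The majority gate mentioned above is a useful logic primitive because fixing one of its three inputs turns it into a two-input AND or OR. The sketch below shows only that Boolean identity; the function names are hypothetical and nothing about the magnetic implementation is modelled.

```python
def majority(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return 1 if a + b + c >= 2 else 0

def and_gate(a, b):
    return majority(a, b, 0)     # third input pinned to 0

def or_gate(a, b):
    return majority(a, b, 1)     # third input pinned to 1

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_table = [and_gate(a, b) for a, b in inputs]
or_table = [or_gate(a, b) for a, b in inputs]
```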
Design of a modular digital computer system, CDRL no. D001, final design plan
NASA Technical Reports Server (NTRS)
Easton, R. A.
1975-01-01
The engineering breadboard implementation for the CDRL no. D001 modular digital computer system developed during design of the logic system was documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed 32 bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.
Experimental realization of entanglement in multiple degrees of freedom between two quantum memories
Zhang, Wei; Ding, Dong-Sheng; Dong, Ming-Xin; Shi, Shuai; Wang, Kai; Liu, Shi-Long; Li, Yan; Zhou, Zhi-Yuan; Shi, Bao-Sen; Guo, Guang-Can
2016-01-01
Entanglement in multiple degrees of freedom has many benefits over entanglement in a single one. The former enables quantum communication with higher channel capacity and more efficient quantum information processing and is compatible with diverse quantum networks. Establishing multi-degree-of-freedom entangled memories is not only vital for high-capacity quantum communication and computing, but also promising for enhanced violations of nonlocality in quantum systems. However, there have as yet been no reports of the experimental realization of multi-degree-of-freedom entangled memories. Here we experimentally established hyper- and hybrid entanglement in multiple degrees of freedom, including path (K-vector) and orbital angular momentum, between two separated atomic ensembles by using quantum storage. The results are promising for achieving quantum communication and computing with many degrees of freedom. PMID:27841274
Managing internode data communications for an uninitialized process in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R
2014-05-20
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
Managing internode data communications for an uninitialized process in a parallel computer
Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E
2014-05-20
A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
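The sequence of steps in this abstract (receive, detect a full buffer before initialization, establish a temporary buffer, move the messages) can be sketched as follows. The class and method names are hypothetical, the buffers are modelled as in-process queues, and nothing here reflects the actual hardware or system software interfaces.

```python
from collections import deque

class MessagingUnitBuffer:
    """Fixed-capacity MU message buffer for one not-yet-initialized process."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def is_full(self):
        return len(self.slots) >= self.capacity

    def receive(self, message):
        if self.is_full():
            raise OverflowError("MU message buffer is full")
        self.slots.append(message)

class ApplicationAgent:
    """Drains a full MU buffer into a temporary buffer in main memory."""

    def __init__(self, mu_buffer):
        self.mu_buffer = mu_buffer
        self.temporary = None    # established only when first needed

    def service(self, process_initialized):
        if process_initialized or not self.mu_buffer.is_full():
            return
        if self.temporary is None:
            self.temporary = deque()
        # Move messages in arrival order so ordering is preserved.
        while self.mu_buffer.slots:
            self.temporary.append(self.mu_buffer.slots.popleft())

mu = MessagingUnitBuffer(capacity=2)
agent = ApplicationAgent(mu)
mu.receive("m1")
mu.receive("m2")
agent.service(process_initialized=False)
mu.receive("m3")                 # space is available again
```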
NASA Astrophysics Data System (ADS)
Kelley, Troy D.; McGhee, S.
2013-05-01
This paper describes the ongoing development of a robotic control architecture that is inspired by computational cognitive architectures from the discipline of cognitive psychology. The Symbolic and Sub-Symbolic Robotics Intelligence Control System (SS-RICS) combines symbolic and sub-symbolic representations of knowledge into a unified control architecture. The new architecture leverages previous work in cognitive architectures, specifically the development of the Adaptive Character of Thought-Rational (ACT-R) and Soar. This paper details current work on learning from episodes or events. The use of episodic memory as a learning mechanism has, until recently, been largely ignored by computational cognitive architectures. This paper details work on metric level episodic memory streams and methods for translating episodes into abstract schemas. The presentation will include research on learning through novelty and self generated feedback mechanisms for autonomous systems.
Interactive water monitoring system accessible by cordless telephone
NASA Astrophysics Data System (ADS)
Volpicelli, Richard; Andeweg, Pierre; Hagar, William G.
1985-12-01
A battery-operated, microcomputer-controlled monitoring device linked with a cordless telephone has been developed for remote measurements. This environmental sensor is self-contained and collects and processes data according to the information sent to its on-board computer system. An RCA model 1805 microprocessor forms the basic controller with a program encoded in memory for data acquisition and analysis. Signals from analog sensing devices used to monitor the environment are converted into digital signals and stored in random access memory of the microcomputer. This remote sensing system is linked to the laboratory by means of a cordless telephone whose base unit is connected to regular telephone lines. This offshore sensing system is simply accessed by a phone call originating from a computer terminal in the laboratory. Data acquisition is initiated upon request: Information continues to be processed and stored until the computer is reprogrammed by another phone call request. Information obtained may be recalled by a phone call after the desired environmental measurements are finished or while they are in progress. Data sampling parameters may be reset at any time, including in the middle of a measurement cycle. The range of the system is limited only by existing telephone grid systems and by the transmission characteristics of the cordless phone used as a communications link. This use of a cordless telephone, coupled with the on-board computer system, may be applied to other field studies requiring data transfer between an on-site analytical system and the laboratory.
Distributed Name Servers: Naming and Caching in Large Distributed Computing Environments
1985-12-01
transmission rate of the communication medium1, transmission over a 56K bps line costs approximately 54r, and similarly, communication over a 9.6K...memories for modern computer systems attempt to maximize the hit ratio for a fixed-size cache by utilizing intelligent cache replacement algorithms
Buying Your Next (or First) PC: What Matters Now?
ERIC Educational Resources Information Center
Crawford, Walt
1993-01-01
Discussion of factors to consider in purchasing a personal computer covers present and future needs, computing environments, memory, processing performance, disk size, and display quality. Issues such as bundled systems, where and when to purchase, and vendor support are addressed; and an annotated bibliography of 28 recent articles is included.…
A Low Cost Microcomputer Laboratory for Investigating Computer Architecture.
ERIC Educational Resources Information Center
Mitchell, Eugene E., Ed.
1980-01-01
Described is a microcomputer laboratory at the United States Military Academy at West Point, New York, which provides easy access to non-volatile memory and a single input/output file system for 16 microcomputer laboratory positions. A microcomputer network that has a centralized data base is implemented using the concepts of computer network…
Graphics Processing Units for HEP trigger systems
NASA Astrophysics Data System (ADS)
Ammendola, R.; Bauce, M.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Fantechi, R.; Fiorini, M.; Giagu, S.; Gianoli, A.; Lamanna, G.; Lonardo, A.; Messina, A.; Neri, I.; Paolucci, P. S.; Piandani, R.; Pontisso, L.; Rescigno, M.; Simula, F.; Sozzi, M.; Vicini, P.
2016-07-01
General-purpose computing on GPUs (Graphics Processing Units) is emerging as a new paradigm in several fields of science, although so far applications have been tailored to the specific strengths of such devices as accelerators in offline computation. With the steady reduction of GPU latencies, and the increase in link and memory throughput, the use of such devices for real-time applications in high-energy physics data acquisition and trigger systems is becoming feasible. We will discuss the use of online parallel computing on GPUs for a synchronous low-level trigger, focusing on the CERN NA62 experiment trigger system. The use of GPUs in higher-level trigger systems is also briefly considered.
CPMIP: measurements of real computational performance of Earth system models in CMIP6
NASA Astrophysics Data System (ADS)
Balaji, Venkatramani; Maisonnave, Eric; Zadeh, Niki; Lawrence, Bryan N.; Biercamp, Joachim; Fladrich, Uwe; Aloisio, Giovanni; Benson, Rusty; Caubel, Arnaud; Durachta, Jeffrey; Foujols, Marie-Alice; Lister, Grenville; Mocavero, Silvia; Underwood, Seth; Wright, Garrett
2017-01-01
A climate model represents a multitude of processes on a variety of timescales and space scales: a canonical example of multi-physics multi-scale modeling. The underlying climate system is physically characterized by sensitive dependence on initial conditions, and natural stochastic variability, so very long integrations are needed to extract signals of climate change. Algorithms generally possess weak scaling and can be I/O and/or memory-bound. Such weak-scaling, I/O, and memory-bound multi-physics codes present particular challenges to computational performance. Traditional metrics of computational efficiency such as performance counters and scaling curves do not tell us enough about real sustained performance from climate models on different machines. They also do not provide a satisfactory basis for comparative information across models. We introduce a set of metrics that can be used for the study of computational performance of climate (and Earth system) models. These measures do not require specialized software or specific hardware counters, and should be accessible to anyone. They are independent of platform and underlying parallel programming models. We show how these metrics can be used to measure actually attained performance of Earth system models on different machines, and identify the most fruitful areas of research and development for performance engineering. We present results for these measures for a diverse suite of models from several modeling centers, and propose to use these measures as a basis for a CPMIP, a computational performance model intercomparison project (MIP).
Shehzad, Danish; Bozkuş, Zeki
2016-01-01
Increasing complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets across multiple processors to achieve better hardware performance. On parallel machines, interprocessor spike exchange consumes a large share of overall simulation time. NEURON uses the Message Passing Interface (MPI) for communication between processors, and the MPI_Allgather collective is used for spike exchange after each interval across distributed-memory systems. Increasing the number of processors yields more concurrency and better performance, but it adversely affects MPI_Allgather, increasing communication time between processors. This necessitates improving the communication methodology to decrease spike-exchange time over distributed-memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA) by moving from two-sided to one-sided communication, and the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhanced communication concurrency and improved overall runtime, making NEURON more efficient for simulation of large neuronal network models. PMID:27413363
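The recursive doubling exchange pattern mentioned in this abstract can be illustrated with a small single-process simulation. This is only a sketch of the general pattern, not the authors' RMA-based implementation: it makes no MPI calls, the function name `recursive_doubling_allgather` is hypothetical, and it assumes a power-of-two number of ranks.

```python
def recursive_doubling_allgather(local_items):
    """Simulate an allgather over p = 2**k ranks.

    local_items[r] is the block contributed by rank r (assumed layout).
    Returns (gathered, steps): gathered[r] is the full list seen by rank r.
    """
    p = len(local_items)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    # Each rank starts knowing only its own block, keyed by the owning rank.
    known = [{r: local_items[r]} for r in range(p)]
    steps = 0
    distance = 1
    while distance < p:
        # Freeze the state so all exchanges in a step happen "simultaneously".
        snapshot = [dict(k) for k in known]
        for r in range(p):
            partner = r ^ distance  # partner differs from r in exactly one bit
            known[r].update(snapshot[partner])
        distance *= 2
        steps += 1
    gathered = [[known[r][src] for src in range(p)] for r in range(p)]
    return gathered, steps
```

In this model the amount of data each rank holds doubles at every step, so p ranks finish in log2(p) exchange steps; the cost of each step on real hardware is outside what the sketch shows.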
NASA Astrophysics Data System (ADS)
Furuichi, Mikito; Nishiura, Daisuke
2017-10-01
We developed dynamic load-balancing algorithms for Particle Simulation Methods (PSM) involving short-range interactions, such as Smoothed Particle Hydrodynamics (SPH), Moving Particle Semi-implicit method (MPS), and Discrete Element method (DEM). These are needed to handle billions of particles modeled in large distributed-memory computer systems. Our method utilizes flexible orthogonal domain decomposition, allowing the sub-domain boundaries in the column to be different for each row. The imbalances in the execution time between parallel logical processes are treated as a nonlinear residual. Load-balancing is achieved by minimizing the residual within the framework of an iterative nonlinear solver, combined with a multigrid technique in the local smoother. Our iterative method is suitable for adjusting the sub-domain frequently by monitoring the performance of each computational process because it is computationally cheaper in terms of communication and memory costs than non-iterative methods. Numerical tests demonstrated the ability of our approach to handle workload imbalances arising from a non-uniform particle distribution, differences in particle types, or heterogeneous computer architecture which was difficult with previously proposed methods. We analyzed the parallel efficiency and scalability of our method using Earth simulator and K-computer supercomputer systems.
NASA Technical Reports Server (NTRS)
Park, Nohpill; Reagan, Shawn; Franks, Greg; Jones, William G.
1999-01-01
This paper discusses analytical approaches to evaluating the performance of spacecraft on-board computing systems, with the ultimate aim of achieving reliable spacecraft data communications systems. The sensitivity analysis approach for the memory system on ProSEDS (Propulsive Small Expendable Deployer System), as a part of its data communication system, will be investigated. Also, general issues and possible approaches to a reliable spacecraft on-board interconnection network and processor array will be shown. The performance issues of spacecraft on-board computing systems, such as sensitivity, throughput, delay, and reliability, will be introduced and discussed.
Probabilistic resource allocation system with self-adaptive capability
NASA Technical Reports Server (NTRS)
Yufik, Yan M. (Inventor)
1996-01-01
A probabilistic resource allocation system is disclosed containing a low capacity computational module (Short Term Memory or STM) and a self-organizing associative network (Long Term Memory or LTM) where nodes represent elementary resources, terminal end nodes represent goals, and directed links represent the order of resource association in different allocation episodes. Goals and their priorities are indicated by the user, and allocation decisions are made in the STM, while candidate associations of resources are supplied by the LTM based on the association strength (reliability). Reliability values are automatically assigned to the network links based on the frequency and relative success of exercising those links in the previous allocation decisions. Accumulation of allocation history in the form of an associative network in the LTM reduces computational demands on subsequent allocations. For this purpose, the network automatically partitions itself into strongly associated high reliability packets, allowing fast approximate computation and display of allocation solutions satisfying the overall reliability and other user-imposed constraints. System performance improves in time due to modification of network parameters and partitioning criteria based on the performance feedback.
Probabilistic resource allocation system with self-adaptive capability
NASA Technical Reports Server (NTRS)
Yufik, Yan M. (Inventor)
1998-01-01
A probabilistic resource allocation system is disclosed containing a low capacity computational module (Short Term Memory or STM) and a self-organizing associative network (Long Term Memory or LTM) where nodes represent elementary resources, terminal end nodes represent goals, and weighted links represent the order of resource association in different allocation episodes. Goals and their priorities are indicated by the user, and allocation decisions are made in the STM, while candidate associations of resources are supplied by the LTM based on the association strength (reliability). Weights are automatically assigned to the network links based on the frequency and relative success of exercising those links in the previous allocation decisions. Accumulation of allocation history in the form of an associative network in the LTM reduces computational demands on subsequent allocations. For this purpose, the network automatically partitions itself into strongly associated high reliability packets, allowing fast approximate computation and display of allocation solutions satisfying the overall reliability and other user-imposed constraints. System performance improves in time due to modification of network parameters and partitioning criteria based on the performance feedback.
2010-01-01
service) High assurance software Distributed network-based battle management High performance computing supporting uniform and nonuniform memory...VNIR, MWIR, and LWIR high-resolution systems Wideband SAR systems RF and laser data links High-speed, high-power photodetector characterization...Antimonide (InSb) imaging system Long-wave infrared (LWIR) quantum well IR photodetector (QWIP) imaging system Research and Development Services
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Epifanovsky, Evgeny; Williams, Samuel W.
Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240 speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30&XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMM's to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
Programmable partitioning for high-performance coherence domains in a multiprocessor system
Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY
2011-01-25
A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.
Programmable stream prefetch with resource optimization
Boyle, Peter; Christ, Norman; Gara, Alan; Mawhinney, Robert; Ohmacht, Martin; Sugavanam, Krishnan
2013-01-08
A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. If the first memory address is present and valid in the table, the engine checks whether there exists valid data corresponding to the first memory address in an array. If there is not yet valid data corresponding to the first memory address in the array, the engine increments a prefetching depth of a first stream that the first memory address belongs to and fetches a cache line associated with the first memory address from the at least one cache memory device. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth. The engine prefetches the additional data if the prefetching is needed.
Livermore Big Artificial Neural Network Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Essen, Brian Van; Jacobs, Sam; Kim, Hyojin
2016-07-01
LBANN is a toolkit that is designed to train artificial neural networks efficiently on high performance computing architectures. It is optimized to take advantage of key High Performance Computing features to accelerate neural network training. Specifically it is optimized for low-latency, high bandwidth interconnects, node-local NVRAM, node-local GPU accelerators, and high bandwidth parallel file systems. It is built on top of the open source Elemental distributed-memory dense and sparse-direct linear algebra and optimization library that is released under the BSD license. The algorithms contained within LBANN are drawn from the academic literature and implemented to work within a distributed-memory framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas
2016-01-06
In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-based architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.
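The two crossbar kernels named in this abstract, the parallel read (vector-matrix multiplication) and the parallel write (rank-1 update), can be written out digitally to make the arithmetic concrete. This is a minimal sketch under assumptions: the function names `crossbar_read` and `crossbar_write`, the `rate` parameter, and the row/column orientation are illustrative choices rather than anything stated in the source, and the code shows only the mathematics, not the analog energy behaviour.

```python
def crossbar_read(G, v):
    """Parallel read: vector-matrix product, out[j] = sum_i v[i] * G[i][j].

    G is a list of rows (the stored weights); v drives the rows (assumed).
    """
    n_rows, n_cols = len(G), len(G[0])
    return [sum(v[i] * G[i][j] for i in range(n_rows)) for j in range(n_cols)]


def crossbar_write(G, x, y, rate=1.0):
    """Parallel write: rank-1 update G[i][j] += rate * x[i] * y[j], in place."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            G[i][j] += rate * xi * yj
    return G
```

A digital loop like this touches all N × N entries one at a time, whereas the abstract describes the crossbar performing each kernel as a single parallel operation.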
Hadwiger, M; Beyer, J; Jeong, Won-Ki; Pfister, H
2012-12-01
This paper presents the first volume visualization system that scales to petascale volumes imaged as a continuous stream of high-resolution electron microscopy images. Our architecture scales to dense, anisotropic petascale volumes because it: (1) decouples construction of the 3D multi-resolution representation required for visualization from data acquisition, and (2) decouples sample access time during ray-casting from the size of the multi-resolution hierarchy. Our system is designed around a scalable multi-resolution virtual memory architecture that handles missing data naturally, does not pre-compute any 3D multi-resolution representation such as an octree, and can accept a constant stream of 2D image tiles from the microscopes. A novelty of our system design is that it is visualization-driven: we restrict most computations to the visible volume data. Leveraging the virtual memory architecture, missing data are detected during volume ray-casting as cache misses, which are propagated backwards for on-demand out-of-core processing. 3D blocks of volume data are only constructed from 2D microscope image tiles when they have actually been accessed during ray-casting. We extensively evaluate our system design choices with respect to scalability and performance, compare to previous best-of-breed systems, and illustrate the effectiveness of our system for real microscopy data from neuroscience.
Operating System For Numerically Controlled Milling Machine
NASA Technical Reports Server (NTRS)
Ray, R. B.
1992-01-01
OPMILL program is operating system for Kearney and Trecker milling machine providing fast easy way to program manufacture of machine parts with IBM-compatible personal computer. Gives machinist "equation plotter" feature, which plots equations that define movements and converts equations to milling-machine-controlling program moving cutter along defined path. System includes tool-manager software handling up to 25 tools and automatically adjusts to account for each tool. Developed on IBM PS/2 computer running DOS 3.3 with 1 MB of random-access memory.
The computational nature of memory modification.
Gershman, Samuel J; Monfils, Marie-H; Norman, Kenneth A; Niv, Yael
2017-03-15
Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature.
Real-time depth processing for embedded platforms
NASA Astrophysics Data System (ADS)
Rahnama, Oscar; Makarov, Aleksej; Torr, Philip
2017-05-01
Obtaining depth information of a scene is an important requirement in many computer-vision and robotics applications. For embedded platforms, passive stereo systems have many advantages over their active counterparts (i.e. LiDAR, Infrared). They are power efficient, cheap, robust to lighting conditions and inherently synchronized to the RGB images of the scene. However, stereo depth estimation is a computationally expensive task that operates over large amounts of data. For embedded applications which are often constrained by power consumption, obtaining accurate results in real-time is a challenge. We demonstrate a computationally and memory efficient implementation of a stereo block-matching algorithm in FPGA. The computational core achieves a throughput of 577 fps at standard VGA resolution whilst consuming less than 3 Watts of power. The data is processed using an in-stream approach that minimizes memory-access bottlenecks and best matches the raster scan readout of modern digital image sensors.
Performance Models for Split-execution Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S; McCaskey, Alex; Schrock, Jonathan
Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track resource usage. We focus on asymmetric processing models built using conventional CPUs and a family of special-purpose QPUs that employ quantum computing principles. Our performance models account for the translation of a classical optimization problem into the physical representation required by the quantum processor while also accounting for hardware limitations and conventional processor speed and memory. We conclude that the bottleneck in this split-execution computing system lies at the quantum-classical interface and that the primary time cost is independent of quantum processor behavior.
The Remote Analysis Station (RAS) as an instructional system
NASA Technical Reports Server (NTRS)
Rogers, R. H.; Wilson, C. L.; Dye, R. H.; Jaworski, E.
1981-01-01
"Hands-on" training in LANDSAT data analysis techniques can be obtained using a desk-top, interactive remote analysis station (RAS) which consists of a color CRT imagery display, with alphanumeric overwrite and keyboard, as well as a cursor controller and modem. This portable station can communicate via modem and dial-up telephone with a host computer at 1200 baud or it can be hardwired to a host computer at 9600 baud. A Z80 microcomputer controls the display refresh memory and remote station processing. LANDSAT data is displayed as three-band false-color imagery, one-band color-sliced imagery, or color-coded processed imagery. Although the display memory routinely operates at 256 x 256 picture elements, a display resolution of 128 x 128 can be selected to fill the display faster. In the false color mode the computer packs the data into one 8-bit character. When the host is not sending pictorial information the characters sent are in ordinary ASCII code. System capabilities are described.
A Linked List-Based Algorithm for Blob Detection on Embedded Vision-Based Sensors.
Acevedo-Avila, Ricardo; Gonzalez-Mendoza, Miguel; Garcia-Garcia, Andres
2016-05-28
Blob detection is a common task in vision-based applications. Most existing algorithms are aimed at execution on general purpose computers; while very few can be adapted to the computing restrictions present in embedded platforms. This paper focuses on the design of an algorithm capable of real-time blob detection that minimizes system memory consumption. The proposed algorithm detects objects in one image scan; it is based on a linked-list data structure tree used to label blobs depending on their shape and node information. An example application showing the results of a blob detection co-processor has been built on a low-powered field programmable gate array hardware as a step towards developing a smart video surveillance system. The detection method is intended for general purpose application. As such, several test cases focused on character recognition are also examined. The results obtained present a fair trade-off between accuracy and memory requirements; and prove the validity of the proposed approach for real-time implementation on resource-constrained computing platforms.
Benchmarking Memory Performance with the Data Cube Operator
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabanov, Leonid V.
2004-01-01
Data movement across a computer memory hierarchy and across computational grids is known to be a limiting factor for applications processing large data sets. We use the Data Cube Operator on an Arithmetic Data Set, called ADC, to benchmark capabilities of computers and of computational grids to handle large distributed data sets. We present a prototype implementation of a parallel algorithm for computation of the operator. The algorithm follows a known approach for computing views from the smallest parent. The ADC stresses all levels of grid memory and storage by producing some of the 2^d views of an Arithmetic Data Set of d-tuples described by a small number of integers. We control data intensity of the ADC by selecting the tuple parameters, the sizes of the views, and the number of realized views. Benchmarking results of memory performance of a number of computer architectures and of a small computational grid are presented.
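The idea of computing data cube views from the smallest parent can be sketched in a few lines. This is a serial illustration under assumptions, not the benchmark's parallel implementation: the function name `data_cube`, the use of summation as the aggregate, and the input layout (a list of `(attributes, measure)` pairs) are hypothetical choices made for the example.

```python
from itertools import combinations


def data_cube(tuples, d):
    """Compute all 2**d group-by views of d-attribute tuples.

    tuples: list of (attrs, measure) pairs with len(attrs) == d (assumed layout).
    Returns {view: {key: summed measure}}, where view is a sorted tuple of
    attribute indices and key holds the attribute values kept by that view.
    """
    full = tuple(range(d))
    base = {}
    for attrs, measure in tuples:
        key = tuple(attrs)
        base[key] = base.get(key, 0) + measure
    views = {full: base}
    # Work from larger views down to the empty view (the grand total).
    for size in range(d - 1, -1, -1):
        for view in combinations(range(d), size):
            # Candidate parents: computed views with exactly one extra attribute.
            parents = [p for p in views
                       if len(p) == size + 1 and set(view) <= set(p)]
            # "Smallest parent": the candidate with the fewest rows to scan.
            parent = min(parents, key=lambda p: len(views[p]))
            positions = [parent.index(a) for a in view]
            child = {}
            for key, total in views[parent].items():
                sub = tuple(key[i] for i in positions)
                child[sub] = child.get(sub, 0) + total
            views[view] = child
    return views
```

Choosing the parent with the fewest rows keeps each aggregation pass as short as possible, which is the motivation usually given for this strategy.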
Naval Research Laboratory Fact Book 2012
2012-11-01
Distributed network-based battle management High performance computing supporting uniform and nonuniform memory access with single and multithreaded...hyperspectral systems VNIR, MWIR, and LWIR high-resolution systems Wideband SAR systems RF and laser data links High-speed, high-power...hyperspectral imaging system Long-wave infrared (LWIR) quantum well IR photodetector (QWIP) imaging system Research and Development Services Division
Towards reversible basic linear algebra subprograms: A performance study
Perumalla, Kalyan S.; Yoginath, Srikanth B.
2014-12-06
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications. Making applications reversible by relying on computation rather than on memory is ideal for large scale parallel computing, especially for the next generation of supercomputers in which memory is expensive in terms of latency, energy, and price. In this direction, a case study is presented here in reversing a computational core, namely, Basic Linear Algebra Subprograms, which is widely used in scientific applications. A new Reversible BLAS (RBLAS) library interface has been designed, and a prototype has been implemented with two modes: (1) a memory-mode in which reversibility is obtained by checkpointing to memory in forward and restoring from memory in reverse, and (2) a computational-mode in which nothing is saved in the forward, but restoration is done entirely via inverse computation in reverse. The article is focused on detailed performance benchmarking to evaluate the runtime dynamics and performance effects, comparing reversible computation with checkpointing on both traditional CPU platforms and recent GPU accelerator platforms. For BLAS Level-1 subprograms, data indicates over an order of magnitude better speed of reversible computation compared to checkpointing. For BLAS Level-2 and Level-3, a more complex tradeoff is observed between reversible computation and checkpointing, depending on computational and memory complexities of the subprograms.
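The contrast between the two modes described in this abstract can be illustrated on a Level-1 operation of the AXPY form (y ← y + a·x). The sketch below is not the RBLAS interface: all function names are hypothetical, and it uses integer data so that the inverse computation is exact. With floating-point data the inverse would generally recover the original values only up to rounding.

```python
def axpy_forward(a, x, y):
    """Forward step: y <- y + a * x, updating y in place."""
    for i in range(len(y)):
        y[i] += a * x[i]


def axpy_reverse(a, x, y):
    """Computational mode: undo the forward step by inverse computation.

    Nothing was saved in the forward pass; y is recovered as y - a * x.
    """
    for i in range(len(y)):
        y[i] -= a * x[i]


def axpy_forward_checkpointed(a, x, y):
    """Memory mode: save a copy of y before the forward step and return it."""
    saved = list(y)
    axpy_forward(a, x, y)
    return saved


def restore_from_checkpoint(y, saved):
    """Memory mode: undo the forward step by copying the checkpoint back."""
    y[:] = saved
```

The memory mode costs an extra copy of `y` per forward step, while the computational mode trades that storage for a second pass of arithmetic in reverse.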
Wang, Degeng
2008-01-01
Discrepancy between the abundance of cognate protein and RNA molecules is frequently observed. A theoretical understanding of this discrepancy remains elusive, and it is frequently described as surprises and/or technical difficulties in the literature. Protein and RNA represent different steps of the multi-stepped cellular genetic information flow process, in which they are dynamically produced and degraded. This paper explores a comparison with a similar process in computers - multi-step information flow from storage level to the execution level. Functional similarities can be found in almost every facet of the retrieval process. Firstly, common architecture is shared, as the ribonome (RNA space) and the proteome (protein space) are functionally similar to the computer primary memory and the computer cache memory respectively. Secondly, the retrieval process functions, in both systems, to support the operation of dynamic networks – biochemical regulatory networks in cells and, in computers, the virtual networks (of CPU instructions) that the CPU travels through while executing computer programs. Moreover, many regulatory techniques are implemented in computers at each step of the information retrieval process, with a goal of optimizing system performance. Cellular counterparts can be easily identified for these regulatory techniques. In other words, this comparative study attempted to utilize theoretical insight from computer system design principles as catalysis to sketch an integrative view of the gene expression process, that is, how it functions to ensure efficient operation of the overall cellular regulatory network. In context of this bird’s-eye view, discrepancy between protein and RNA abundance became a logical observation one would expect. It was suggested that this discrepancy, when interpreted in the context of system operation, serves as a potential source of information to decipher regulatory logics underneath biochemical network operation. 
PMID:18757239
Wang, Degeng
2008-12-01
Discrepancy between the abundance of cognate protein and RNA molecules is frequently observed. A theoretical understanding of this discrepancy remains elusive, and it is frequently described in the literature as a surprise or a technical difficulty. Protein and RNA represent different steps of the multi-step cellular genetic information flow process, in which they are dynamically produced and degraded. This paper explores a comparison with a similar process in computers: multi-step information flow from the storage level to the execution level. Functional similarities can be found in almost every facet of the retrieval process. Firstly, common architecture is shared, as the ribonome (RNA space) and the proteome (protein space) are functionally similar to the computer primary memory and the computer cache memory, respectively. Secondly, the retrieval process functions, in both systems, to support the operation of dynamic networks: biochemical regulatory networks in cells and, in computers, the virtual networks (of CPU instructions) that the CPU travels through while executing computer programs. Moreover, many regulatory techniques are implemented in computers at each step of the information retrieval process, with the goal of optimizing system performance. Cellular counterparts can be easily identified for these regulatory techniques. In other words, this comparative study attempted to utilize theoretical insight from computer system design principles as a catalyst to sketch an integrative view of the gene expression process, that is, how it functions to ensure efficient operation of the overall cellular regulatory network. In the context of this bird's-eye view, discrepancy between protein and RNA abundance became an observation one would logically expect. It was suggested that this discrepancy, when interpreted in the context of system operation, serves as a potential source of information to decipher the regulatory logic underlying biochemical network operation.
Computationally Efficient Modeling and Simulation of Large Scale Systems
NASA Technical Reports Server (NTRS)
Jain, Jitesh (Inventor); Koh, Cheng-Kok (Inventor); Balakrishnan, Vankataramanan (Inventor); Cauley, Stephen F (Inventor); Li, Hong (Inventor)
2014-01-01
A system for simulating operation of a VLSI interconnect structure having capacitive and inductive coupling between nodes thereof, including a processor, and a memory, the processor configured to perform obtaining a matrix X and a matrix Y containing different combinations of passive circuit element values for the interconnect structure, the element values for each matrix including inductance L and inverse capacitance P, obtaining an adjacency matrix A associated with the interconnect structure, storing the matrices X, Y, and A in the memory, and performing numerical integration to solve first and second equations.
Department of Defense In-House RDT and E Activities: Management Analysis Report for Fiscal Year 1993
1994-11-01
A worldwide unique lab because it houses a high-speed modeling and simulation system, a prototype...E Division, San Diego, CA: High Performance Computing Laboratory providing a wide range of advanced computer systems for the scientific investigation...Machines CM-200 and a 256-node Thinking Machines CM-5. The CM-5 is in a very large memory, (high performance 32 Gbytes, >4 0 OFlop) configuration,
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muller, U.A.; Baumle, B.; Kohler, P.
1992-10-01
Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories
1989-02-01
frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects described with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips: 208 bits x 128 pixels - Renderer Board
CMOS Camera Array With Onboard Memory
NASA Technical Reports Server (NTRS)
Gat, Nahum
2009-01-01
A compact CMOS (complementary metal oxide semiconductor) camera system has been developed with high resolution (1.3 Megapixels), a USB (universal serial bus) 2.0 interface, and an onboard memory. Exposure times, and other operating parameters, are sent from a control PC via the USB port. Data from the camera can be received via the USB port and the interface allows for simple control and data capture through a laptop computer.
Investigation of single crystal ferrite thin films
NASA Technical Reports Server (NTRS)
Mee, J. E.; Besser, P. J.; Elkins, P. E.; Glass, H. L.; Whitcomb, E. C.
1972-01-01
Materials suitable for use in magnetic bubble domain memories were developed for aerospace applications. Practical techniques for the preparation of such materials in forms required for fabrication of computer memory devices were considered. The materials studied were epitaxial films of various compositions of the gallium-substituted yttrium gadolinium iron garnet system. The major emphasis was to determine their bubble properties and the conditions necessary for growing uncracked, high quality films.
Supplement request for Support of MRS Symposium (PECASE: Active Microstructured Polymer Systems)
2015-07-06
materials (e.g., gels, polymers, liquids, liquid crystals and photosensitive materials) that can change shape in a controlled response to stimuli. These...Rogers1. 1, , University of Illinois, Urbana, Illinois, USA. Show Abstract 8:45 AM - *XX1.02 New Wonders of Nafion: Shape Memory, Temperature Memory... Liquid Crystal Institute, Kent State University, Kent, Ohio, USA; 5, Department of Electrical and Computer Engineering, University of Idaho, Moscow
A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-volatile On-chip Caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh; Vetter, Jeffrey S; Li, Dong
Recent trends of CMOS scaling and increasing number of on-chip cores have led to a large increase in the size of on-chip caches. Since SRAM has low density and consumes a large amount of leakage power, its use in designing on-chip caches has become more challenging. To address this issue, researchers are exploring the use of several emerging memory technologies, such as embedded DRAM, spin transfer torque RAM, resistive RAM, phase change RAM and domain wall memory. In this paper, we survey the architectural approaches proposed for designing memory systems and, specifically, caches with these emerging memory technologies. To highlight their similarities and differences, we present a classification of these technologies and architectural approaches based on their key characteristics. We also briefly summarize the challenges in using these technologies for architecting caches. We believe that this survey will help the readers gain insights into the emerging memory device technologies, and their potential use in designing future computing systems.
Super-Memorizers Are Not Super-Recognizers
Ramon, Meike; Miellet, Sebastien; Dzieciol, Anna M.; Konrad, Boris Nikolai
2016-01-01
Humans have a natural expertise in recognizing faces. However, the nature of the interaction between this critical visual biological skill and memory is yet unclear. Here, we had the unique opportunity to test two individuals who have had exceptional success in the World Memory Championships, including several world records in face-name association memory. We designed a range of face processing tasks to determine whether superior/expert face memory skills are associated with distinctive perceptual strategies for processing faces. Superior memorizers excelled at tasks involving associative face-name learning. Nevertheless, they were as impaired as controls in tasks probing the efficiency of the face system: face inversion and the other-race effect. Super memorizers did not show increased hippocampal volumes, and exhibited optimal generic eye movement strategies when they performed complex multi-item face-name associations. Our data show that the visual computations of the face system are not malleable and are robust to acquired expertise involving extensive training of associative memory. PMID:27008627
Super-Memorizers Are Not Super-Recognizers.
Ramon, Meike; Miellet, Sebastien; Dzieciol, Anna M; Konrad, Boris Nikolai; Dresler, Martin; Caldara, Roberto
2016-01-01
Humans have a natural expertise in recognizing faces. However, the nature of the interaction between this critical visual biological skill and memory is yet unclear. Here, we had the unique opportunity to test two individuals who have had exceptional success in the World Memory Championships, including several world records in face-name association memory. We designed a range of face processing tasks to determine whether superior/expert face memory skills are associated with distinctive perceptual strategies for processing faces. Superior memorizers excelled at tasks involving associative face-name learning. Nevertheless, they were as impaired as controls in tasks probing the efficiency of the face system: face inversion and the other-race effect. Super memorizers did not show increased hippocampal volumes, and exhibited optimal generic eye movement strategies when they performed complex multi-item face-name associations. Our data show that the visual computations of the face system are not malleable and are robust to acquired expertise involving extensive training of associative memory.
Bandlimited computerized improvements in characterization of nonlinear systems with memory
NASA Astrophysics Data System (ADS)
Nuttall, Albert H.; Katz, Richard A.; Hughes, Derke R.; Koch, Robert M.
2016-05-01
The present article discusses some inroads in nonlinear signal processing made by the prime algorithm developer, Dr. Albert H. Nuttall and co-authors, a consortium of research scientists from the Naval Undersea Warfare Center Division, Newport, RI. The algorithm, called the Nuttall-Wiener-Volterra (NWV) algorithm, is named for its principal contributors [1], [2], [3] over many years of developmental research. The NWV algorithm significantly reduces the computational workload for characterizing nonlinear systems with memory. Following this formulation, two measurement waveforms on the system are required in order to characterize a specified nonlinear system under consideration: (1) an excitation input waveform, x(t) (the transmitted signal); and (2) a response output waveform, z(t) (the received signal). Given these two measurement waveforms for a given propagation channel, a 'kernel' or 'channel response', h = [h0, h1, h2, h3], between the two measurement points is computed via a least squares approach that optimizes modeled kernel values by performing a best fit between measured response z(t) and a modeled response y(t). New techniques significantly diminish the exponential growth of the number of computed kernel coefficients at second and third order in order to combat and reasonably alleviate the curse of dimensionality.
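For orientation, the kernel set h = [h0, h1, h2, h3] and the least-squares fit described above can be written in the generic discrete-time Volterra form below. This is the textbook expansion, assumed here purely for illustration; the sample index n and the memory length M are notation introduced for this sketch, and the bandlimited NWV formulation itself may differ.

```latex
y(n) = h_0
     + \sum_{i=0}^{M-1} h_1(i)\, x(n-i)
     + \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} h_2(i,j)\, x(n-i)\, x(n-j)
     + \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} \sum_{k=0}^{M-1} h_3(i,j,k)\, x(n-i)\, x(n-j)\, x(n-k),
\qquad
\hat{h} = \arg\min_{h} \sum_{n} \bigl( z(n) - y(n) \bigr)^2 .
```

In this generic form the order-k kernel has M^k coefficients before any symmetry is exploited, so the unknown count is 1 + M + M^2 + M^3; this is the growth that the abstract's "curse of dimensionality" remark refers to.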
Out-of-Core Streamline Visualization on Large Unstructured Meshes
NASA Technical Reports Server (NTRS)
Ueng, Shyh-Kuang; Sikorski, K.; Ma, Kwan-Liu
1997-01-01
It's advantageous for computational scientists to have the capability to perform interactive visualization on their desktop workstations. For data on large unstructured meshes, this capability is not generally available. In particular, particle tracing on unstructured grids can result in a high percentage of non-contiguous memory accesses and therefore may perform very poorly with virtual memory paging schemes. The alternative of visualizing a lower resolution of the data degrades the original high-resolution calculations. This paper presents an out-of-core approach for interactive streamline construction on large unstructured tetrahedral meshes containing millions of elements. The out-of-core algorithm uses an octree to partition and restructure the raw data into subsets stored into disk files for fast data retrieval. A memory management policy tailored to the streamline calculations is used such that during the streamline construction only a very small amount of data are brought into the main memory on demand. By carefully scheduling computation and data fetching, the overhead of reading data from the disk is significantly reduced and good memory performance results. This out-of-core algorithm makes possible interactive streamline visualization of large unstructured-grid data sets on a single mid-range workstation with relatively low main-memory capacity: 5-20 megabytes. Our test results also show that this approach is much more efficient than relying on virtual memory and operating system's paging algorithms.
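The idea of bringing only a small amount of data into main memory on demand can be sketched as a bounded block cache. The abstract does not specify its memory management policy, so the least-recently-used eviction below is an assumption, and the names `BlockCache`, `load_block`, and `fake_load` are hypothetical stand-ins for reading one octree subset from its disk file.

```python
from collections import OrderedDict


class BlockCache:
    """Keep at most `capacity` data blocks in memory (LRU eviction is an assumption)."""

    def __init__(self, load_block, capacity):
        self._load = load_block        # hypothetical callable: block_id -> block data
        self._capacity = capacity
        self._blocks = OrderedDict()   # block_id -> data, oldest first
        self.loads = 0                 # number of simulated disk reads

    def get(self, block_id):
        if block_id in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(block_id)
            return self._blocks[block_id]
        # Cache miss: fetch from "disk" and evict the oldest block if over capacity.
        data = self._load(block_id)
        self.loads += 1
        self._blocks[block_id] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)
        return data


def demo():
    def fake_load(block_id):
        # Stand-in for reading one octree leaf file from disk.
        return f"cells of block {block_id}"

    cache = BlockCache(fake_load, capacity=2)
    # A streamline that wanders through blocks 0, 0, 1, 0, 2, 1.
    for block_id in [0, 0, 1, 0, 2, 1]:
        cache.get(block_id)
    return cache.loads
```

A streamline integrator would call `get` with the block containing the current particle position, so repeated steps inside one block cost no further disk reads.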
Towards Scalable Graph Computation on Mobile Devices.
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2014-10-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach.
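The memory-mapping approach mentioned above can be illustrated with a minimal sketch. The binary edge-list layout (pairs of little-endian 32-bit node ids) and the helper names are assumptions made for this example, not the format used by the authors; the point is only that the operating system pages edge data in and out while the code treats the file as if it were in memory.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical on-disk format: each edge is (source, target) as two little-endian uint32.
EDGE = struct.Struct("<II")


def write_edges(path, edges):
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(EDGE.pack(src, dst))


def out_degrees(path, num_nodes):
    """Scan a memory-mapped edge file and count outgoing edges per node."""
    degrees = [0] * num_nodes
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages touched by the scan need to be resident at any time.
            for offset in range(0, len(mm), EDGE.size):
                src, _dst = EDGE.unpack_from(mm, offset)
                degrees[src] += 1
    return degrees


def demo():
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
    fd, path = tempfile.mkstemp(suffix=".edges")
    os.close(fd)
    try:
        write_edges(path, edges)
        return out_degrees(path, num_nodes=4)
    finally:
        os.remove(path)
```

The same pattern extends to iterative algorithms that make repeated sequential passes over the edge file.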
Towards Scalable Graph Computation on Mobile Devices
Chen, Yiqi; Lin, Zhiyuan; Pienta, Robert; Kahng, Minsuk; Chau, Duen Horng
2015-01-01
Mobile devices have become increasingly central to our everyday activities, due to their portability, multi-touch capabilities, and ever-improving computational power. Such attractive features have spurred research interest in leveraging mobile devices for computation. We explore a novel approach that aims to use a single mobile device to perform scalable graph computation on large graphs that do not fit in the device's limited main memory, opening up the possibility of performing on-device analysis of large datasets, without relying on the cloud. Based on the familiar memory mapping capability provided by today's mobile operating systems, our approach to scale up computation is powerful and intentionally kept simple to maximize its applicability across the iOS and Android platforms. Our experiments demonstrate that an iPad mini can perform fast computation on large real graphs with as many as 272 million edges (Google+ social graph), at a speed that is only a few times slower than a 13″ Macbook Pro. Through creating a real world iOS app with this technique, we demonstrate the strong potential application for scalable graph computation on a single mobile device using our approach. PMID:25859564
A Neural Network Architecture For Rapid Model Indexing In Computer Vision Systems
NASA Astrophysics Data System (ADS)
Pawlicki, Ted
1988-03-01
Models of objects stored in memory have been shown to be useful for guiding the processing of computer vision systems. A major consideration in such systems, however, is how stored models are initially accessed and indexed by the system. As the number of stored models increases, the time required to search memory for the correct model becomes high. Parallel distributed, connectionist, neural networks have been shown to have appealing content addressable memory properties. This paper discusses an architecture for efficient storage and reference of model memories stored as stable patterns of activity in a parallel, distributed, connectionist, neural network. The emergent properties of content addressability and resistance to noise are exploited to perform indexing of the appropriate object centered model from image centered primitives. The system consists of three network modules each of which represent information relative to a different frame of reference. The model memory network is a large state space vector where fields in the vector correspond to ordered component objects and relative, object based spatial relationships between the component objects. The component assertion network represents evidence about the existence of object primitives in the input image. It establishes local frames of reference for object primitives relative to the image based frame of reference. The spatial relationship constraint network is an intermediate representation which enables the association between the object based and the image based frames of reference. This intermediate level represents information about possible object orderings and establishes relative spatial relationships from the image based information in the component assertion network below. It is also constrained by the lawful object orderings in the model memory network above. The system design is consistent with current psychological theories of recognition by component. It also seems to support Marr's notions of hierarchical indexing (i.e., the specificity, adjunct, and parent indices). It supports the notion that multiple canonical views of an object may have to be stored in memory to enable its efficient identification. The use of variable fields in the state space vectors appears to keep the number of required nodes in the network down to a tractable number while imposing a semantic value on different areas of the state space. This semantic imposition supports an interface between the analogical aspects of neural networks and the propositional paradigms of symbolic processing.
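The content-addressable, noise-resistant behavior described above can be illustrated with a small Hopfield-style network. This is a generic textbook construction used as a stand-in; it is not the three-module architecture of the paper, and the two 8-element "models" are made-up patterns chosen to be orthogonal.

```python
def train(patterns):
    """Hebbian outer-product weights with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, cue, max_steps=10):
    """Synchronously update the state until it stops changing."""
    state = list(cue)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(weights):
            field = sum(w * s for w, s in zip(row, state))
            # Keep the old value when the local field is exactly zero.
            new_state.append(1 if field > 0 else -1 if field < 0 else state[i])
        if new_state == state:
            break
        state = new_state
    return state


def demo():
    # Two hypothetical stored models encoded as +1/-1 vectors.
    model_a = [1, 1, 1, 1, -1, -1, -1, -1]
    model_b = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = train([model_a, model_b])
    noisy = list(model_a)
    noisy[0] = -noisy[0]          # corrupt one element of the cue
    return recall(weights, noisy) == model_a
```

Here the corrupted cue settles back onto the stored pattern, which is the sense in which a stable activity pattern acts as an index into model memory.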
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.
40 CFR 1033.112 - Emission diagnostics for SCR systems.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 32 2010-07-01 2010-07-01 false Emission diagnostics for SCR systems. 1033.112 Section 1033.112 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) AIR... computer memory all incidents of engine operation with inadequate reductant injection or reductant quality...
High-Assurance System Support through 3-D Integration
2007-11-09
algorithms), tagging, and in selected systems, offensive mechanisms. For example, we can exploit the control plane to tag all traffic traveling...October 2005. [35] D. Page. Theoretical use of cache memory as a cryptanalytic side-channel. Technical Report CSTR-02-003, Department of Computer
Parallel structures in human and computer memory
NASA Astrophysics Data System (ADS)
Kanerva, Pentti
1986-08-01
If we think of our experiences as being recorded continuously on film, then human memory can be compared to a film library that is indexed by the contents of the film strips stored in it. Moreover, approximate retrieval cues suffice to retrieve information stored in this library: We recognize a familiar person in a fuzzy photograph or a familiar tune played on a strange instrument. This paper is about how to construct a computer memory that would allow a computer to recognize patterns and to recall sequences the way humans do. Such a memory is remarkably similar in structure to a conventional computer memory and also to the neural circuits in the cortex of the cerebellum of the human brain. The paper concludes that the frame problem of artificial intelligence could be solved by the use of such a memory if we were able to encode information about the world properly.
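A memory that retrieves stored items from approximate cues can be sketched along the lines of a sparse distributed memory. The sizes, the activation radius, the random seeds, and the autoassociative use (each pattern stored at its own address) are all illustrative assumptions and are not taken from the abstract.

```python
import random


class SparseDistributedMemory:
    """Toy sparse distributed memory over n-bit words stored as Python ints."""

    def __init__(self, n_bits=256, n_locations=2000, radius=112, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Fixed random "hard location" addresses, each with one counter per bit.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        # Locations whose address lies within the Hamming radius of the cue.
        return [i for i, a in enumerate(self.addresses)
                if bin(a ^ address).count("1") <= self.radius]

    def write(self, address, word):
        for i in self._active(address):
            row = self.counters[i]
            for k in range(self.n_bits):
                row[k] += 1 if (word >> k) & 1 else -1

    def read(self, address):
        sums = [0] * self.n_bits
        for i in self._active(address):
            row = self.counters[i]
            for k in range(self.n_bits):
                sums[k] += row[k]
        # Majority vote per bit over all activated locations.
        word = 0
        for k, total in enumerate(sums):
            if total > 0:
                word |= 1 << k
        return word


def demo(seed=1):
    rng = random.Random(seed)
    memory = SparseDistributedMemory()
    patterns = [rng.getrandbits(256) for _ in range(3)]
    for p in patterns:
        memory.write(p, p)                    # autoassociative storage
    cue = patterns[0]
    for k in rng.sample(range(256), 10):      # flip 10 of the 256 bits
        cue ^= 1 << k
    recalled = memory.read(cue)
    return recalled == patterns[0], bin(cue ^ patterns[0]).count("1")
```

Because each word is spread over many locations, a cue that differs from the stored address in a few bits still activates mostly the same locations, which is what allows recall from a degraded cue.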
Parallel structures in human and computer memory
NASA Technical Reports Server (NTRS)
Kanerva, P.
1986-01-01
If one thinks of our experiences as being recorded continuously on film, then human memory can be compared to a film library that is indexed by the contents of the film strips stored in it. Moreover, approximate retrieval cues suffice to retrieve information stored in this library. One recognizes a familiar person in a fuzzy photograph or a familiar tune played on a strange instrument. A computer memory that would allow a computer to recognize patterns and to recall sequences the way humans do is constructed. Such a memory is remarkably similar in structure to a conventional computer memory and also to the neural circuits in the cortex of the cerebellum of the human brain. It is concluded that the frame problem of artificial intelligence could be solved by the use of such a memory if one were able to encode information about the world properly.
Xu, Dongrong; Hao, Xuejun; Wang, Zhishun; Duan, Yunsuo; Liu, Feng; Marsh, Rachel; Yu, Shan; Peterson, Bradley S.
2015-01-01
An increasing number of functional brain imaging studies are employing computer-based virtual reality (VR) to study changes in brain activity during the performance of high-level psychological and cognitive tasks. We report the development of a VR radial arm maze that adapts, for human use in a scanning environment, the same general experimental design of behavioral tasks that has been used with remarkable effectiveness for the study of multiple memory systems in rodents. The software platform is independent of specific computer hardware and operating systems, as we aim to provide shared access to this technology by the research community. We hope that doing so will provide greater standardization of software platform and study paradigm that will reduce variability and improve the comparability of findings across studies. We report the details of the design and implementation of this platform and provide information for downloading of the system for demonstration and research applications. PMID:26366052
Imaging System Model Crammed Into A 32K Microcomputer
NASA Astrophysics Data System (ADS)
Tyson, Robert K.
1986-12-01
An imaging system model, based upon linear systems theory, has been developed for a microcomputer with less than 32K of free random access memory (RAM). The model includes diffraction effects of the optics, aberrations in the optics, and atmospheric propagation transfer functions. Variables include pupil geometry, magnitude and character of the aberrations, and strength of atmospheric turbulence ("seeing"). Both coherent and incoherent image formation can be evaluated. The techniques employed for crowding the model into a very small computer will be discussed in detail. Simplifying assumptions for the diffraction and aberration phenomena will be shown along with practical considerations in modeling the optical system. Particular emphasis is placed on avoiding inaccuracies in modeling the pupil and the associated optical transfer function knowing limits on spatial frequency content and resolution. Memory and runtime constraints are analyzed stressing the efficient use of assembly language Fourier transform routines, disk input/output, and graphic displays. The compromises between computer time, limited RAM, and scientific accuracy will be given with techniques for balancing these parameters for individual needs.
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
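Page sharing by content can be sketched as follows. This is a generic content-hash deduplication pass, not the SMD algorithm itself (which the abstract says performs detection offline and restricts comparisons to the code segment); the 4096-byte page size and the function names are assumptions for the example.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def deduplicate(pages):
    """Return (frames, mapping): unique page contents and each page's frame index."""
    frames = []            # physical frames actually kept
    index_by_digest = {}   # digest -> list of frame indices with that digest
    mapping = []           # one frame index per input page
    for page in pages:
        digest = hashlib.sha256(page).digest()
        candidates = index_by_digest.setdefault(digest, [])
        for idx in candidates:
            # Confirm byte-for-byte equality before sharing a frame.
            if frames[idx] == page:
                mapping.append(idx)
                break
        else:
            frames.append(page)
            candidates.append(len(frames) - 1)
            mapping.append(len(frames) - 1)
    return frames, mapping


def demo():
    code = b"\x90" * PAGE_SIZE     # identical code page seen in several VMs
    data_a = b"\x01" * PAGE_SIZE
    data_b = b"\x02" * PAGE_SIZE
    pages = [code, data_a, code, data_b, code]
    frames, mapping = deduplicate(pages)
    return len(frames), mapping
```

In the demo, five logical pages are backed by three frames; running such a pass ahead of time rather than during execution is the kind of cost shift the abstract attributes to offline detection.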
Static Memory Deduplication for Performance Optimization in Cloud Computing
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-01-01
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
NASA Astrophysics Data System (ADS)
Ogiwara, Akifumi; Maekawa, Hikaru; Watanabe, Minoru; Moriwaki, Retsu
2014-02-01
A holographic polymer-dispersed liquid crystal (HPDLC) memory to record multi-context information for an optically reconfigurable gate array is formed by angle-multiplexing recording using successive laser exposures in liquid crystal (LC) composites. The laser illumination system is constructed using a half mirror and a photomask, written with the different configuration contexts, placed on motorized stages under the control of a personal computer. The fabricated holographic memory implements a precise reconstruction of configuration contexts corresponding to various logic circuits, such as OR and NOR circuits, by laser illumination at different incident angles in the HPDLC memory.
Klooster, Nathaniel B.; Cook, Susan W.; Uc, Ergun Y.; Duff, Melissa C.
2015-01-01
Hand gesture, a ubiquitous feature of human interaction, facilitates communication. Gesture also facilitates new learning, benefiting speakers and listeners alike. Thus, gestures must impact cognition beyond simply supporting the expression of already-formed ideas. However, the cognitive and neural mechanisms supporting the effects of gesture on learning and memory are largely unknown. We hypothesized that gesture's ability to drive new learning is supported by procedural memory and that procedural memory deficits will disrupt gesture production and comprehension. We tested this proposal in patients with intact declarative memory, but impaired procedural memory as a consequence of Parkinson's disease (PD), and healthy comparison participants with intact declarative and procedural memory. In separate experiments, we manipulated the gestures participants saw and produced in a Tower of Hanoi (TOH) paradigm. In the first experiment, participants solved the task either on a physical board, requiring high arching movements to manipulate the discs from peg to peg, or on a computer, requiring only flat, sideways movements of the mouse. When explaining the task, healthy participants with intact procedural memory displayed evidence of their previous experience in their gestures, producing higher, more arching hand gestures after solving on a physical board, and smaller, flatter gestures after solving on a computer. In the second experiment, healthy participants who saw high arching hand gestures in an explanation prior to solving the task subsequently moved the mouse with significantly higher curvature than those who saw smaller, flatter gestures prior to solving the task. These patterns were absent in both gesture production and comprehension experiments in patients with procedural memory impairment. These findings suggest that the procedural memory system supports the ability of gesture to drive new learning. PMID:25628556
FRIT characterized hierarchical kernel memory arrangement for multiband palmprint recognition
NASA Astrophysics Data System (ADS)
Kisku, Dakshina R.; Gupta, Phalguni; Sing, Jamuna K.
2015-10-01
In this paper, we present a hierarchical kernel associative memory (H-KAM) based computational model with Finite Ridgelet Transform (FRIT) representation for multispectral palmprint recognition. To characterize a multispectral palmprint image, the Finite Ridgelet Transform is used to achieve a very compact and distinctive representation of linear singularities while it also captures the singularities along lines and edges. The proposed system uses the Finite Ridgelet Transform to represent a multispectral palmprint image, which is then modeled by kernel associative memories. A Bayesian classifier is used for recognition. Finally, the recognition scheme is thoroughly tested with the CASIA benchmark multispectral palmprint database. The experimental results exhibit robustness of the proposed system under different wavelengths of palm image.
Memory-based frame synchronizer. [for digital communication systems
NASA Technical Reports Server (NTRS)
Stattel, R. J.; Niswander, J. K. (Inventor)
1981-01-01
A frame synchronizer for use in digital communications systems wherein data formats can be easily and dynamically changed is described. The use of memory array elements provides increased flexibility in format selection and sync word selection in addition to real time reconfiguration ability. The frame synchronizer comprises a serial-to-parallel converter which converts a serial input data stream to a constantly changing parallel data output. This parallel data output is supplied to programmable sync word recognizers each consisting of a multiplexer and a random access memory (RAM). The multiplexer is connected to both the parallel data output and an address bus which may be connected to a microprocessor or computer for purposes of programming the sync word recognizer. The RAM is used as an associative memory or decoder and is programmed to identify a specific sync word. Additional programmable RAMs are used as counter decoders to define word bit length, frame word length, and paragraph frame length.
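The RAM-as-decoder idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit: the function names `program_recognizer` and `find_sync`, the 4-bit sync word, and the one-bit-per-address RAM model are all assumptions made for illustration. The shift register plays the role of the serial-to-parallel converter, and its current contents are used as the address into the programmed RAM.

```python
def program_recognizer(sync_word: str) -> list[bool]:
    """Model a RAM with one bit per address; only the sync word's address holds True.

    Hypothetical helper: reprogramming the recognizer means rewriting this table.
    """
    ram = [False] * (1 << len(sync_word))
    ram[int(sync_word, 2)] = True
    return ram


def find_sync(bits: str, ram: list[bool], width: int) -> list[int]:
    """Return the bit positions at which the most recent `width` bits address a True cell."""
    mask = (1 << width) - 1
    window = 0
    hits = []
    for position, bit in enumerate(bits):
        # Serial-to-parallel conversion: shift the new bit in, drop the oldest one.
        window = ((window << 1) | int(bit)) & mask
        # Only look up the RAM once the shift register has been filled.
        if position >= width - 1 and ram[window]:
            hits.append(position)
    return hits
```

Changing the sync word only requires calling `program_recognizer` again with a different bit string, which mirrors the reconfigurability the abstract attributes to the memory-based design.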
Flash drive memory apparatus and method
NASA Technical Reports Server (NTRS)
Hinchey, Michael G. (Inventor)
2010-01-01
A memory apparatus includes a non-volatile computer memory, a USB mass storage controller connected to the non-volatile computer memory, the USB mass storage controller including a daisy chain component, a male USB interface connected to the USB mass storage controller, and at least one other interface for a memory device, other than a USB interface, the at least one other interface being connected to the USB mass storage controller.
The computational nature of memory modification
Gershman, Samuel J; Monfils, Marie-H; Norman, Kenneth A; Niv, Yael
2017-01-01
Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature. DOI: http://dx.doi.org/10.7554/eLife.23763.001 PMID:28294944
Data systems and computer science programs: Overview
NASA Technical Reports Server (NTRS)
Smith, Paul H.; Hunter, Paul
1991-01-01
An external review of the Integrated Technology Plan for the Civil Space Program is presented. The topics are presented in viewgraph form and include the following: onboard memory and storage technology; advanced flight computers; special purpose flight processors; onboard networking and testbeds; information archive, access, and retrieval; visualization; neural networks; software engineering; and flight control and operations.
Photonic Diagnostic Technique For Thin Photoactive Films
NASA Technical Reports Server (NTRS)
Thakoor, Sarita
1996-01-01
Photonic diagnostic technique developed for use in noninvasive, rapid evaluation of thin paraelectric/ferroelectric films. Method proves useful in basic research, on-line monitoring for quality control at any stage of fabrication, and development of novel optoelectronic systems. Used to predict imprint-prone memory cells, and to study time evolution of defects in ferroelectric memories during processing. Plays vital role in enabling high-density ferroelectric memory manufacturing. One potential application lies in use of photoresponse for nondestructive readout of polarization memory states in high-density, high-speed memory devices. In another application, extension of basic concept of method makes possible to develop specially tailored ferrocapacitor to act as programmable detector, wherein remanent polarization used to modulate photoresponse. Large arrays of such detectors useful in optoelectronic processing, computing, and communication.
Memory as Perception of the Past: Compressed Time in Mind and Brain.
Howard, Marc W
2018-02-01
In the visual system retinal space is compressed such that acuity decreases further from the fovea. Different forms of memory may rely on a compressed representation of time, manifested as decreased accuracy for events that happened further in the past. Neurophysiologically, "time cells" show receptive fields in time. Analogous to the compression of visual space, time cells show less acuity for events further in the past. Behavioral evidence suggests memory can be accessed by scanning a compressed temporal representation, analogous to visual search. This suggests a common computational language for visual attention and memory retrieval. In this view, time functions like a scaffolding that organizes memories in much the same way that retinal space functions like a scaffolding for visual perception. Copyright © 2017 Elsevier Ltd. All rights reserved.
Fast associative memory + slow neural circuitry = the computational model of the brain.
NASA Astrophysics Data System (ADS)
Berkovich, Simon; Berkovich, Efraim; Lapir, Gennady
1997-08-01
We propose a computational model of the brain based on a fast associative memory and relatively slow neural processors. In this model, processing time is expensive but memory access is not, and therefore most algorithmic tasks would be accomplished by using large look-up tables as opposed to calculating. The essential feature of an associative memory in this context (characteristic for a holographic type memory) is that it works without an explicit mechanism for resolution of multiple responses. As a result, the slow neuronal processing elements, overwhelmed by the flow of information, operate as a set of templates for ranking of the retrieved information. This structure addresses the primary controversy in the brain architecture: distributed organization of memory vs. localization of processing centers. This computational model offers an intriguing explanation of many of the paradoxical features in the brain architecture, such as integration of sensors (through DMA mechanism), subliminal perception, universality of software, interrupts, fault-tolerance, certain bizarre possibilities for rapid arithmetics etc. In conventional computer science the presented type of a computational model did not attract attention as it goes against the technological grain by using a working memory faster than processing elements.
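The trade-off described here, cheap memory access replacing expensive calculation, can be sketched with a table-driven multiplier. This is only an illustration and not the authors' model: it uses the textbook quarter-square identity, floor((a+b)^2/4) - floor((a-b)^2/4) = ab, and the names `MAX_OPERAND`, `QUARTER_SQUARES`, and `multiply_by_lookup` are hypothetical.

```python
MAX_OPERAND = 255  # assumed operand range for this sketch

# Precomputed once: floor(n^2 / 4) for every n that a + b can reach.
QUARTER_SQUARES = [n * n // 4 for n in range(2 * MAX_OPERAND + 1)]


def multiply_by_lookup(a: int, b: int) -> int:
    """Multiply two small non-negative integers using two table reads and one subtraction."""
    if not (0 <= a <= MAX_OPERAND and 0 <= b <= MAX_OPERAND):
        raise ValueError("operands must lie in 0..MAX_OPERAND")
    # a + b and |a - b| have the same parity, so the two floors cancel exactly.
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

The table costs memory proportional to the operand range, which is the kind of exchange the abstract argues is favorable when memory is fast and processing is slow.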
Hopfield, J J
2008-05-01
The algorithms that simple feedback neural circuits representing a brain area can rapidly carry out are often adequate to solve easy problems but for more difficult problems can return incorrect answers. A new excitatory-inhibitory circuit model of associative memory displays the common human problem of failing to rapidly find a memory when only a small clue is present. The memory model and a related computational network for solving Sudoku puzzles produce answers that contain implicit check bits in the representation of information across neurons, allowing a rapid evaluation of whether the putative answer is correct or incorrect through a computation related to visual pop-out. This fact may account for our strong psychological feeling of right or wrong when we retrieve a nominal memory from a minimal clue. This information allows more difficult computations or memory retrievals to be done in a serial fashion by using the fast but limited capabilities of a computational module multiple times. The mathematics of the excitatory-inhibitory circuits for associative memory and for Sudoku, both of which are understood in terms of energy or Lyapunov functions, is described in detail.
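For readers unfamiliar with energy-function descriptions of associative memory, the following is a minimal sketch of the classic binary Hopfield network with Hebbian weights. It is a deliberately swapped-in textbook model, not the excitatory-inhibitory circuit of this paper, and the function names `train`, `energy`, and `recall` are assumptions. It shows a stored pattern being recovered from a corrupted clue while the energy does not increase.

```python
def train(patterns: list[list[int]]) -> list[list[float]]:
    """Hebbian weights for patterns of +1/-1 values; no self-connections."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def energy(weights: list[list[float]], state: list[int]) -> float:
    """Lyapunov (energy) function E = -1/2 * sum_ij w_ij s_i s_j."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def recall(weights: list[list[float]], clue: list[int], sweeps: int = 5) -> list[int]:
    """Asynchronous updates: each unit takes the sign of its summed input."""
    state = list(clue)
    for _ in range(sweeps):
        for i in range(len(state)):
            field = sum(weights[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if field >= 0 else -1
    return state
```

With symmetric weights and asynchronous updates, each update can only lower the energy or leave it unchanged, which is why retrieval settles into a stored pattern.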
Pfeiffer, P.; Egusquiza, I. L.; Di Ventra, M.; ...
2016-07-06
Technology based on memristors, resistors with memory whose resistance depends on the history of the crossing charges, has lately enhanced the classical paradigm of computation with neuromorphic architectures. However, in contrast to the known quantized models of passive circuit elements, such as inductors, capacitors or resistors, the design and realization of a quantum memristor is still missing. Here, we introduce the concept of a quantum memristor as a quantum dissipative device, whose decoherence mechanism is controlled by a continuous-measurement feedback scheme, which accounts for the memory. Indeed, we provide numerical simulations showing that memory effects actually persist in the quantum regime. Our quantization method, specifically designed for superconducting circuits, may be extended to other quantum platforms, allowing for memristor-type constructions in different quantum technologies. As a result, the proposed quantum memristor is then a building block for neuromorphic quantum computation and quantum simulations of non-Markovian systems.
Josephson 4 K-bit cache memory design for a prototype signal processor. I - General overview
NASA Astrophysics Data System (ADS)
Henkels, W. H.; Geppert, L. M.; Kadlec, J.; Epperlein, P. W.; Beha, H.
1985-09-01
In the early stages of the Josephson computer project conducted at an American computer company, it was recognized that a very fast cache memory was needed to complement Josephson logic. A subnanosecond access time memory was implemented experimentally on the basis of a 2.5-micron Pb-alloy technology. It was then decided to switch over to a Nb-base-electrode technology with the objective to alleviate problems with the long-term reliability and aging of Pb-based junctions. The present paper provides a general overview of the status of a 4 x 1 K-bit Josephson cache design employing a 2.5-micron Nb-edge-junction technology. Attention is given to the fabrication process and its implications, aspects of circuit design methodology, an overview of system environment and chip components, design changes and status, and various difficulties and uncertainties.
General-purpose interface bus for multiuser, multitasking computer system
NASA Technical Reports Server (NTRS)
Generazio, Edward R.; Roth, Don J.; Stang, David B.
1990-01-01
The architecture of a multiuser, multitasking, virtual-memory computer system intended for use by a medium-size research group is described. There are three central processing units (CPUs) in the configuration, each with 16 MB memory, and two 474 MB hard disks attached. CPU 1 is designed for data analysis and contains an array processor for fast-Fourier transformations. In addition, CPU 1 shares display images viewed with the image processor. CPU 2 is designed for image analysis and display. CPU 3 is designed for data acquisition and contains 8 GPIB channels and an analog-to-digital conversion input/output interface with 16 channels. Up to 9 users can access the third CPU simultaneously for data acquisition. Focus is placed on the optimization of hardware interfaces and software, facilitating instrument control, data acquisition, and processing.
NASA Technical Reports Server (NTRS)
1981-01-01
Communication is made possible for disabled individuals by means of an electronic system, developed at Stanford University's School of Medicine, which produces highly intelligible synthesized speech. Familiarly known as the "talking wheelchair" and formally as the Versatile Portable Speech Prosthesis (VPSP). Wheelchair mounted system consists of a word processor, a video screen, a voice synthesizer and a computer program which instructs the synthesizer how to produce intelligible sounds in response to user commands. Computer's memory contains 925 words plus a number of common phrases and questions. Memory can also store several thousand other words of the user's choice. Message units are selected by operating a simple switch, joystick or keyboard. Completed message appears on the video screen, then user activates speech synthesizer, which generates a voice with a somewhat mechanical tone. With the keyboard, an experienced user can construct messages as rapidly as 30 words per minute.
Memory conformity affects inaccurate memories more than accurate memories.
Wright, Daniel B; Villalba, Daniella K
2012-01-01
After controlling for initial confidence, inaccurate memories were shown to be more easily distorted than accurate memories. In two experiments groups of participants viewed 50 stimuli and were then presented with these stimuli plus 50 fillers. During this test phase participants reported their confidence that each stimulus was originally shown. This was followed by computer-generated responses from a bogus participant. After being exposed to this response participants again rated the confidence of their memory. The computer-generated responses systematically distorted participants' responses. Memory distortion depended on initial memory confidence, with uncertain memories being more malleable than confident memories. This effect was moderated by whether the participant's memory was initially accurate or inaccurate. Inaccurate memories were more malleable than accurate memories. The data were consistent with a model describing two types of memory (i.e., recollective and non-recollective memories), which differ in how susceptible these memories are to memory distortion.
Arranging computer architectures to create higher-performance controllers
NASA Technical Reports Server (NTRS)
Jacklin, Stephen A.
1988-01-01
Techniques for integrating microprocessors, array processors, and other intelligent devices in control systems are reviewed, with an emphasis on the (re)arrangement of components to form distributed or parallel processing systems. Consideration is given to the selection of the host microprocessor, increasing the power and/or memory capacity of the host, multitasking software for the host, array processors to reduce computation time, the allocation of real-time and non-real-time events to different computer subsystems, intelligent devices to share the computational burden for real-time events, and intelligent interfaces to increase communication speeds. The case of a helicopter vibration-suppression and stabilization controller is analyzed as an example, and significant improvements in computation and throughput rates are demonstrated.
Plastic modulation of episodic memory networks in the aging brain with cognitive decline.
Bai, Feng; Yuan, Yonggui; Yu, Hui; Zhang, Zhijun
2016-07-15
Social-cognitive processing has been posited to underlie general functions such as episodic memory. Episodic memory impairment is a recognized hallmark of amnestic mild cognitive impairment (aMCI), a condition that carries a high risk for dementia. Three canonical networks, self-referential processing, executive control processing and salience processing, have distinct roles in episodic memory retrieval processing. It remains unclear whether and how these sub-networks of the episodic memory retrieval system would be affected in aMCI. This task-state fMRI study constructed systems-level episodic memory retrieval sub-networks in 28 aMCI patients and 23 controls using two computational approaches: a multiple region-of-interest based approach and a voxel-level functional connectivity-based approach, respectively. These approaches produced remarkably similar findings: the self-referential processing network made critical contributions to episodic memory retrieval in aMCI. More conspicuous alterations in self-referential processing of the episodic memory retrieval network were identified in aMCI. In order to complete a given episodic memory retrieval task, increases in cooperation between the self-referential processing network and other sub-networks were mobilized in aMCI. Self-referential processing mediates the cooperation of the episodic memory retrieval sub-networks, as it may help to achieve neural plasticity and may contribute to the prevention and treatment of dementia. Copyright © 2016 Elsevier B.V. All rights reserved.
Ullman, Michael T; Pancheva, Roumyana; Love, Tracy; Yee, Eiling; Swinney, David; Hickok, Gregory
2005-05-01
Are the linguistic forms that are memorized in the mental lexicon and those that are specified by the rules of grammar subserved by distinct neurocognitive systems or by a single computational system with relatively broad anatomic distribution? On a dual-system view, the productive -ed-suffixation of English regular past tense forms (e.g., look-looked) depends upon the mental grammar, whereas irregular forms (e.g., dig-dug) are retrieved from lexical memory. On a single-mechanism view, the computation of both past tense types depends on associative memory. Neurological double dissociations between regulars and irregulars strengthen the dual-system view. The computation of real and novel, regular and irregular past tense forms was investigated in 20 aphasic subjects. Aphasics with non-fluent agrammatic speech and left frontal lesions were consistently more impaired at the production, reading, and judgment of regular than irregular past tenses. Aphasics with fluent speech and word-finding difficulties, and with left temporal/temporo-parietal lesions, showed the opposite pattern. These patterns held even when measures of frequency, phonological complexity, articulatory difficulty, and other factors were held constant. The data support the view that the memorized words of the mental lexicon are subserved by a brain system involving left temporal/temporo-parietal structures, whereas aspects of the mental grammar, in particular the computation of regular morphological forms, are subserved by a distinct system involving left frontal structures.
All-memristive neuromorphic computing with level-tuned neurons
NASA Astrophysics Data System (ADS)
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-01
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amounts of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture, along with the homogeneous neuro-synaptic dynamics implemented with nanoscale phase-change memristors, represents a significant step towards the development of ultrahigh-density neuromorphic co-processors.
All-memristive neuromorphic computing with level-tuned neurons.
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-02
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amounts of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture, along with the homogeneous neuro-synaptic dynamics implemented with nanoscale phase-change memristors, represents a significant step towards the development of ultrahigh-density neuromorphic co-processors.
LittleQuickWarp: an ultrafast image warping tool.
Qu, Lei; Peng, Hanchuan
2015-02-01
Warping images into a standard coordinate space is critical for many image computing related tasks. However, for multi-dimensional and high-resolution images, an accurate warping operation itself is often very expensive in terms of computer memory and computational time. For high-throughput image analysis studies such as brain mapping projects, it is desirable to have high performance image warping tools that are compatible with common image analysis pipelines. In this article, we present LittleQuickWarp, a swift and memory efficient tool that boosts 3D image warping performance dramatically and at the same time has high warping quality similar to the widely used thin plate spline (TPS) warping. Compared to the TPS, LittleQuickWarp can improve the warping speed 2-5 times and reduce the memory consumption 6-20 times. We have implemented LittleQuickWarp as an Open Source plug-in program on top of the Vaa3D system (http://vaa3d.org). The source code and a brief tutorial can be found in the Vaa3D plugin source code repository. Copyright © 2014 Elsevier Inc. All rights reserved.
Numerical methods on some structured matrix algebra problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1996-06-01
This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was to translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Vetter, Jeffrey S
Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVM's intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.
2008-01-01
Distributed network-based battle management High performance computing supporting uniform and nonuniform memory access with single and multithreaded...pallet Airborne EO/IR and radar sensors VNIR through SWIR hyperspectral systems VNIR, MWIR, and LWIR high-resolution systems Wideband SAR systems...meteorological sensors Hyperspectral sensor systems (PHILLS) Mid-wave infrared (MWIR) Indium Antimonide (InSb) imaging system Long-wave infrared (LWIR
The CD-ROM Services of SilverPlatter Information, Inc.
ERIC Educational Resources Information Center
Allen, Robert J.
1985-01-01
The SilverPlatter system is a complete, stand-alone system, consisting of an IBM (or compatible) personal computer, compact disc with read-only memory (CD-ROM) drive, software, and one or more databases. Large databases (e.g., ERIC, PsycLIT) will soon be available on the system for "local" installation in schools, libraries, and…
NASA Technical Reports Server (NTRS)
Wigton, Larry
1996-01-01
Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
The magic words: Using computers to uncover mental associations for use in magic trick design
2017-01-01
The use of computational systems to aid in the design of magic tricks has been previously explored. Here further steps are taken in this direction, introducing the use of computer technology as a natural language data sourcing and processing tool for magic trick design purposes. Crowd sourcing of psychological concepts is investigated; further, the role of human associative memory and its exploitation in magical effects is explored. A new trick is developed and evaluated: a physical card trick partially designed by a computational system configured to search for and explore conceptual spaces readily understood by spectators. PMID:28792941
State recovery and lockstep execution restart in a system with multiprocessor pairing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gara, Alan; Gschwind, Michael K; Salapura, Valentina
System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The paired microprocessor or processor cores that provide one highly reliable thread connect with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to the selective pairing facility via a switch or a bus. Each selectively paired processor core includes a transactional execution facility, wherein the system is configured to enable processor rollback to a previous state and reinitialize lockstep execution in order to recover from an incorrect execution when an incorrect execution has been detected by the selective pairing facility.
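The recover-by-rollback behavior described in this abstract can be sketched in software, with the caveat that the real mechanism is hardware. In the sketch below, two redundant "cores" are modeled as plain Python callables, a mismatch between their outputs stands in for the pairing facility detecting an incorrect execution, and the function name `run_lockstep`, the `max_retries` limit, and the integer state are all assumptions made for illustration.

```python
def run_lockstep(core_a, core_b, state, n_steps, max_retries=3):
    """Run two redundant cores step by step; on a mismatch, roll back and retry the step.

    Each core is a callable (state, step, attempt) -> new_state. The state is treated
    as immutable, so keeping the old value is enough to act as the checkpoint.
    Returns (final_state, number_of_rollbacks).
    """
    rollbacks = 0
    for step in range(n_steps):
        for attempt in range(max_retries + 1):
            checkpoint = state                      # previous known-good state
            out_a = core_a(checkpoint, step, attempt)
            out_b = core_b(checkpoint, step, attempt)
            if out_a == out_b:                      # cores agree: commit and move on
                state = out_a
                break
            rollbacks += 1                          # disagree: discard both, retry from checkpoint
        else:
            raise RuntimeError(f"cores still disagree at step {step} after retries")
    return state, rollbacks


def healthy_core(state, step, attempt):
    """Reference behavior for the demo: add the step index to the state."""
    return state + step


def flaky_core(state, step, attempt):
    """Same computation, but produces a wrong result once (step 2, first attempt)."""
    glitch = 1 if (step == 2 and attempt == 0) else 0
    return state + step + glitch
```

Calling `run_lockstep(healthy_core, flaky_core, 0, 5)` re-executes step 2 once and then completes with both cores back in agreement.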
Local rollback for fault-tolerance in parallel computing systems
Blumrich, Matthias A [Yorktown Heights, NY; Chen, Dong [Yorktown Heights, NY; Gara, Alan [Yorktown Heights, NY; Giampapa, Mark E [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugavanam, Krishnan [Yorktown Heights, NY
2012-01-24
A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
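The decision rule in the last sentence (restart the interval if an error occurs and no unrecoverable condition occurred) can be sketched as follows. This is a software analogy under stated assumptions, not the patented control logic: instructions are modeled as functions that mutate a dictionary, and `run_local_rollback_interval`, `error_detected`, `unrecoverable`, and `max_restarts` are hypothetical names.

```python
def run_local_rollback_interval(instructions, committed, error_detected,
                                unrecoverable, max_restarts=8):
    """Execute one rollback interval and return (new_state, restarts).

    instructions   : list of callables that mutate a working copy of the state
    committed      : dict holding the state at the start of the interval
    error_detected : callable(restart_count) -> bool, stands in for hardware error checks
    unrecoverable  : callable(working_state) -> bool, flags conditions that forbid rollback
    """
    for restarts in range(max_restarts + 1):
        working = dict(committed)                   # speculative copy; committed stays intact
        saw_unrecoverable = False
        for instruction in instructions:
            instruction(working)
            saw_unrecoverable = saw_unrecoverable or unrecoverable(working)
        if not error_detected(restarts):
            return working, restarts                # no error: the interval's results stand
        if saw_unrecoverable:
            raise RuntimeError("error in an interval that cannot be rolled back locally")
        # Error with no unrecoverable condition: loop around and restart the interval.
    raise RuntimeError("interval kept failing after the allowed number of restarts")
```

Keeping the committed state untouched until the interval finishes cleanly is what makes the restart purely local in this sketch.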