A general purpose subroutine for fast fourier transform on a distributed memory parallel machine
NASA Technical Reports Server (NTRS)
Dubey, A.; Zubair, M.; Grosch, C. E.
1992-01-01
One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
NASA Technical Reports Server (NTRS)
Rogers, David
1988-01-01
The advent of the Connection Machine profoundly changes the world of supercomputers. The highly nontraditional architecture makes possible the exploration of algorithms that were impractical for standard Von Neumann architectures. Sparse distributed memory (SDM) is an example of such an algorithm. Sparse distributed memory is a particularly simple and elegant formulation for an associative memory. The foundations for sparse distributed memory are described, and some simple examples of using the memory are presented. The relationship of sparse distributed memory to three important computational systems is shown: random-access memory, neural networks, and the cerebellum of the brain. Finally, the implementation of the algorithm for sparse distributed memory on the Connection Machine is discussed.
Low Latency Messages on Distributed Memory Multiprocessors
Rosing, Matt; Saltz, Joel
1995-01-01
This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Execution time supports for adaptive scientific algorithms on distributed memory machines
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey
1990-01-01
Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Execution time support for scientific programs on distributed memory machines
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel; Scroggs, Jeffrey
1990-01-01
Optimizations are considered that are required for efficient execution of code segments that consists of loops over distributed data structures. The PARTI (Parallel Automated Runtime Toolkit at ICASE) execution time primitives are designed to carry out these optimizations and can be used to implement a wide range of scientific algorithms on distributed memory machines. These primitives allow the user to control array mappings in a way that gives an appearance of shared memory. Computations can be based on a global index set. Primitives are used to carry out gather and scatter operations on distributed arrays. Communications patterns are derived at runtime, and the appropriate send and receive messages are automatically generated.
Low latency messages on distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Rosing, Matthew; Saltz, Joel
1993-01-01
Many of the issues in developing an efficient interface for communication on distributed memory machines are described and a portable interface is proposed. Although the hardware component of message latency is less than one microsecond on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 microseconds. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. Based on several tests that were run on the iPSC/860, an interface that will better match current distributed memory machines is proposed. The model used in the proposed interface consists of a computation processor and a communication processor on each node. Communication between these processors and other nodes in the system is done through a buffered network. Information that is transmitted is either data or procedures to be executed on the remote processor. The dual processor system is better suited for efficiently handling asynchronous communications compared to a single processor system. The ability to send data or procedure is very flexible for minimizing message latency, based on the type of communication being performed. The test performed and the proposed interface are described.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Shared versus distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.
1994-01-01
The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine,k for a problem size of 1024 K equations (256 K grid points).
Execution models for mapping programs onto distributed memory parallel computers
NASA Technical Reports Server (NTRS)
Sussman, Alan
1992-01-01
The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
A manual for PARTI runtime primitives
NASA Technical Reports Server (NTRS)
Berryman, Harry; Saltz, Joel
1990-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Supporting shared data structures on distributed memory architectures
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes are described.
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1991-01-01
Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
Limits in decision making arise from limits in memory retrieval.
Giguère, Gyslain; Love, Bradley C
2013-05-07
Some decisions, such as predicting the winner of a baseball game, are challenging in part because outcomes are probabilistic. When making such decisions, one view is that humans stochastically and selectively retrieve a small set of relevant memories that provides evidence for competing options. We show that optimal performance at test is impossible when retrieving information in this fashion, no matter how extensive training is, because limited retrieval introduces noise into the decision process that cannot be overcome. One implication is that people should be more accurate in predicting future events when trained on idealized rather than on the actual distributions of items. In other words, we predict the best way to convey information to people is to present it in a distorted, idealized form. Idealization of training distributions is predicted to reduce the harmful noise induced by immutable bottlenecks in people's memory retrieval processes. In contrast, machine learning systems that selectively weight (i.e., retrieve) all training examples at test should not benefit from idealization. These conjectures are strongly supported by several studies and supporting analyses. Unlike machine systems, people's test performance on a target distribution is higher when they are trained on an idealized version of the distribution rather than on the actual target distribution. Optimal machine classifiers modified to selectively and stochastically sample from memory match the pattern of human performance. These results suggest firm limits on human rationality and have broad implications for how to train humans tasked with important classification decisions, such as radiologists, baggage screeners, intelligence analysts, and gamblers.
Limits in decision making arise from limits in memory retrieval
Giguère, Gyslain; Love, Bradley C.
2013-01-01
Some decisions, such as predicting the winner of a baseball game, are challenging in part because outcomes are probabilistic. When making such decisions, one view is that humans stochastically and selectively retrieve a small set of relevant memories that provides evidence for competing options. We show that optimal performance at test is impossible when retrieving information in this fashion, no matter how extensive training is, because limited retrieval introduces noise into the decision process that cannot be overcome. One implication is that people should be more accurate in predicting future events when trained on idealized rather than on the actual distributions of items. In other words, we predict the best way to convey information to people is to present it in a distorted, idealized form. Idealization of training distributions is predicted to reduce the harmful noise induced by immutable bottlenecks in people’s memory retrieval processes. In contrast, machine learning systems that selectively weight (i.e., retrieve) all training examples at test should not benefit from idealization. These conjectures are strongly supported by several studies and supporting analyses. Unlike machine systems, people’s test performance on a target distribution is higher when they are trained on an idealized version of the distribution rather than on the actual target distribution. Optimal machine classifiers modified to selectively and stochastically sample from memory match the pattern of human performance. These results suggest firm limits on human rationality and have broad implications for how to train humans tasked with important classification decisions, such as radiologists, baggage screeners, intelligence analysts, and gamblers. PMID:23610402
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
A self-defining hierarchical data system
NASA Technical Reports Server (NTRS)
Bailey, J.
1992-01-01
The Self-Defining Data System (SDS) is a system which allows the creation of self-defining hierarchical data structures in a form which allows the data to be moved between different machine architectures. Because the structures are self-defining they can be used for communication between independent modules in a distributed system. Unlike disk-based hierarchical data systems such as Starlink's HDS, SDS works entirely in memory and is very fast. Data structures are created and manipulated as internal dynamic structures in memory managed by SDS itself. A structure may then be exported into a caller supplied memory buffer in a defined external format. This structure can be written as a file or sent as a message to another machine. It remains static in structure until it is reimported into SDS. SDS is written in portable C and has been run on a number of different machine architectures. Structures are portable between machines with SDS looking after conversion of byte order, floating point format, and alignment. A Fortran callable version is also available for some machines.
Programming distributed memory architectures using Kali
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1990-01-01
Programming nonshared memory systems is more difficult than programming shared memory systems, in part because of the relatively low level of current programming environments for such machines. A new programming environment is presented, Kali, which provides a global name space and allows direct access to remote data values. In order to retain efficiency, Kali provides a system on annotations, allowing the user to control those aspects of the program critical to performance, such as data distribution and load balancing. The primitives and constructs provided by the language is described, and some of the issues raised in translating a Kali program for execution on distributed memory systems are also discussed.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluatemore » their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.« less
A manual for PARTI runtime primitives, revision 1
NASA Technical Reports Server (NTRS)
Das, Raja; Saltz, Joel; Berryman, Harry
1991-01-01
Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1989-01-01
Sparse distributed memory was proposed be Pentti Kanerva as a realizable architecture that could store large patterns and retrieve them based on partial matches with patterns representing current sensory inputs. This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines - e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, continuation of a sequence of events when given a cue from the middle, knowing that one doesn't know, or getting stuck with an answer on the tip of one's tongue. These behaviors are now within reach of machines that can be incorporated into the computing systems of robots capable of seeing, talking, and manipulating. Kanerva's theory is a break with the Western rationalistic tradition, allowing a new interpretation of learning and cognition that respects biology and the mysteries of individual human beings.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less
Task Assignment Heuristics for Distributed CFD Applications
NASA Technical Reports Server (NTRS)
Lopez-Benitez, N.; Djomehri, M. J.; Biswas, R.; Biegel, Bryan (Technical Monitor)
2001-01-01
CFD applications require high-performance computational platforms: 1. Complex physics and domain configuration demand strongly coupled solutions; 2. Applications are CPU and memory intensive; and 3. Huge resource requirements can only be satisfied by teraflop-scale machines or distributed computing.
BIRD: A general interface for sparse distributed memory simulators
NASA Technical Reports Server (NTRS)
Rogers, David
1990-01-01
Kanerva's sparse distributed memory (SDM) has now been implemented for at least six different computers, including SUN3 workstations, the Apple Macintosh, and the Connection Machine. A common interface for input of commands would both aid testing of programs on a broad range of computer architectures and assist users in transferring results from research environments to applications. A common interface also allows secondary programs to generate command sequences for a sparse distributed memory, which may then be executed on the appropriate hardware. The BIRD program is an attempt to create such an interface. Simplifying access to different simulators should assist developers in finding appropriate uses for SDM.
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Zima, Hans
1992-01-01
Exploiting the full performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. In contrast to current programming practice, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the data distribution. In this paper, we present the language features of Vienna Fortran for FORTRAN 77, together with examples illustrating the use of these features.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Simon, Horst D.
1996-01-01
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
A performance study of sparse Cholesky factorization on INTEL iPSC/860
NASA Technical Reports Server (NTRS)
Zubair, M.; Ghose, M.
1992-01-01
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HALF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR) Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/ programming model (HPF and MPI (message passing interface)) combinations will be compared, based on latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF based NPB results will be compared with MPI based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition we would also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz) NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
NASA Technical Reports Server (NTRS)
Agrawal, Gagan; Sussman, Alan; Saltz, Joel
1993-01-01
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPK-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
A new scheduling algorithm for parallel sparse LU factorization with static pivoting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigori, Laura; Li, Xiaoye S.
2002-08-20
In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU{_}DIST are reported after applying this algorithm on real world application matrices on an IBM SP RS/6000 distributed memory machine.
Hardware/software codesign for embedded RISC core
NASA Astrophysics Data System (ADS)
Liu, Peng
2001-12-01
This paper describes hardware/software codesign method of the extendible embedded RISC core VIRGO, which based on MIPS-I instruction set architecture. VIRGO is described by Verilog hardware description language that has five-stage pipeline with shared 32-bit cache/memory interface, and it is controlled by distributed control scheme. Every pipeline stage has one small controller, which controls the pipeline stage status and cooperation among the pipeline phase. Since description use high level language and structure is distributed, VIRGO core has highly extension that can meet the requirements of application. We take look at the high-definition television MPEG2 MPHL decoder chip, constructed the hardware/software codesign virtual prototyping machine that can research on VIRGO core instruction set architecture, and system on chip memory size requirements, and system on chip software, etc. We also can evaluate the system on chip design and RISC instruction set based on the virtual prototyping machine platform.
Implementation of a parallel unstructured Euler solver on the CM-5
NASA Technical Reports Server (NTRS)
Morano, Eric; Mavriplis, D. J.
1995-01-01
An efficient unstructured 3D Euler solver is parallelized on a Thinking Machine Corporation Connection Machine 5, distributed memory computer with vectoring capability. In this paper, the single instruction multiple data (SIMD) strategy is employed through the use of the CM Fortran language and the CMSSL scientific library. The performance of the CMSSL mesh partitioner is evaluated and the overall efficiency of the parallel flow solver is discussed.
El-Zawawy, Mohamed A.
2014-01-01
This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gather so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The hierarchical style of concurrent parallel computers is similar to the memory model used in this paper. In our approach, each analysis result is assigned a type derivation (serves as a correctness proof). PMID:24892098
Confessions of a robot lobotomist
NASA Technical Reports Server (NTRS)
Gottshall, R. Marc
1994-01-01
Since its inception, numerically controlled (NC) machining methods have been used throughout the aerospace industry to mill, drill, and turn complex shapes by sequentially stepping through motion programs. However, the recent demand for more precision, faster feeds, exotic sensors, and branching execution have existing computer numerical control (CNC) and distributed numerical control (DNC) systems running at maximum controller capacity. Typical disadvantages of current CNC's include fixed memory capacities, limited communication ports, and the use of multiple control languages. The need to tailor CNC's to meet specific applications, whether it be expanded memory, additional communications, or integrated vision, often requires replacing the original controller supplied with the commercial machine tool with a more powerful and capable system. This paper briefly describes the process and equipment requirements for new controllers and their evolutionary implementation in an aerospace environment. The process of controller retrofit with currently available machines is examined, along with several case studies and their computational and architectural implications.
Distributed-Memory Computing With the Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA)
NASA Technical Reports Server (NTRS)
Riley, Christopher J.; Cheatwood, F. McNeil
1997-01-01
The Langley Aerothermodynamic Upwind Relaxation Algorithm (LAURA), a Navier-Stokes solver, has been modified for use in a parallel, distributed-memory environment using the Message-Passing Interface (MPI) standard. A standard domain decomposition strategy is used in which the computational domain is divided into subdomains with each subdomain assigned to a processor. Performance is examined on dedicated parallel machines and a network of desktop workstations. The effect of domain decomposition and frequency of boundary updates on performance and convergence is also examined for several realistic configurations and conditions typical of large-scale computational fluid dynamic analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, J.P.; Bangs, A.L.; Butler, P.L.
Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a compiler'' which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. Themore » design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,00 lines of code. 25 refs., 6 figs.« less
Run-time scheduling and execution of loops on message passing machines
NASA Technical Reports Server (NTRS)
Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry
1989-01-01
Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Run-time scheduling and execution of loops on message passing machines
NASA Technical Reports Server (NTRS)
Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry
1990-01-01
Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
Shehzad, Danish; Bozkuş, Zeki
2016-01-01
Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models.
Bozkuş, Zeki
2016-01-01
Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collective is exercised for spikes exchange after each interval across distributed memory systems. The increase in number of processors though results in achieving concurrency and better performance but it inversely affects MPI_Allgather which increases communication time between processors. This necessitates improving communication methodology to decrease the spikes exchange time over distributed memory systems. This work has improved MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication, and use of recursive doubling mechanism facilitates achieving efficient communication between the processors in precise steps. This approach enhanced communication concurrency and has improved overall runtime making NEURON more efficient for simulation of large neuronal network models. PMID:27413363
NASA Astrophysics Data System (ADS)
Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft
2018-01-01
We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and that consciousness is related with symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated, during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be more well suited to model the complex networks which constitute brain and mental structure.
Design and Implementation of Distributed Crawler System Based on Scrapy
NASA Astrophysics Data System (ADS)
Fan, Yuhao
2018-01-01
At present, some large-scale search engines at home and abroad only provide users with non-custom search services, and a single-machine web crawler cannot sovle the difficult task. In this paper, Through the study and research of the original Scrapy framework, the original Scrapy framework is improved by combining Scrapy and Redis, a distributed crawler system based on Web information Scrapy framework is designed and implemented, and Bloom Filter algorithm is applied to dupefilter modul to reduce memory consumption. The movie information captured from douban is stored in MongoDB, so that the data can be processed and analyzed. The results show that distributed crawler system based on Scrapy framework is more efficient and stable than the single-machine web crawler system.
Automatic selection of dynamic data partitioning schemes for distributed memory multicomputers
NASA Technical Reports Server (NTRS)
Palermo, Daniel J.; Banerjee, Prithviraj
1995-01-01
For distributed memory multicomputers such as the Intel Paragon, the IBM SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Several researchers have proposed systems that are able to produce data distributions that remain in effect for the entire execution of an application. For complex programs, however, such static data distributions may be insufficient to obtain acceptable performance. The selection of distributions that dynamically change over the course of a program's execution adds another dimension to the data partitioning problem. In this paper, we present a technique that can be used to automatically determine which partitionings are most beneficial over specific sections of a program while taking into account the added overhead of performing redistribution. This system is being built as part of the PARADIGM (PARAllelizing compiler for DIstributed memory General-purpose Multicomputers) project at the University of Illinois. The complete system will provide a fully automated means to parallelize programs written in a serial programming model obtaining high performance on a wide range of distributed-memory multicomputers.
Paging memory from random access memory to backing storage in a parallel computer
Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E
2013-05-21
Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
A high performance parallel algorithm for 1-D FFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
1994-12-31
In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implementedmore » this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.« less
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
PANDA: A distributed multiprocessor operating system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chubb, P.
1989-01-01
PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Dynamic data distributions in Vienna Fortran
NASA Technical Reports Server (NTRS)
Chapman, Barbara; Mehrotra, Piyush; Moritsch, Hans; Zima, Hans
1993-01-01
Vienna Fortran is a machine-independent language extension of Fortran, which is based upon the Single-Program-Multiple-Data (SPMD) paradigm and allows the user to write programs for distributed-memory systems using global addresses. The language features focus mainly on the issue of distributing data across virtual processor structures. Those features of Vienna Fortran that allow the data distributions of arrays to change dynamically, depending on runtime conditions are discussed. The relevant language features are discussed, their implementation is outlined, and how they may be used in applications is described.
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
Concurrent Image Processing Executive (CIPE)
NASA Technical Reports Server (NTRS)
Lee, Meemong; Cooper, Gregory T.; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.
1988-01-01
The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules; (1) user interface, (2) host-resident executive, (3) hypercube-resident executive, and (4) application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube a data management method which distributes, redistributes, and tracks data set information was implemented.
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directlymore » affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.« less
NASA Technical Reports Server (NTRS)
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory,more » 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.« less
Sequential behavior and its inherent tolerance to memory faults.
NASA Technical Reports Server (NTRS)
Meyer, J. F.
1972-01-01
Representation of a memory fault of a sequential machine M by a function mu on the states of M and the result of the fault by an appropriately determined machine M(mu). Given some sequential behavior B, its inherent tolerance to memory faults can then be measured in terms of the minimum memory redundancy required to realize B with a state-assigned machine having fault tolerance type tau and fault tolerance level t. A behavior having maximum inherent tolerance is exhibited, and it is shown that behaviors of the same size can have different inherent tolerance.
NASA Astrophysics Data System (ADS)
Choi, S. G.; Kim, S. H.; Choi, W. K.; Moon, G. C.; Lee, E. S.
2017-06-01
Shape memory alloy (SMA) is important material used for the medicine and aerospace industry due to its characteristics called the shape memory effect, which involves the recovery of deformed alloy to its original state through the application of temperature or stress. Consumers in modern society demand stability in parts. Electrochemical machining is one of the methods for obtained these stabilities in parts requirements. These parts of shape memory alloy require fine patterns in some applications. In order to machine a fine pattern, the electrochemical machining method is suitable. For precision electrochemical machining using different shape electrodes, the current density should be controlled precisely. And electrode shape is required for precise electrochemical machining. It is possible to obtain precise square holes on the SMA if the insulation layer controlled the unnecessary current between electrode and workpiece. If it is adjusting the unnecessary current to obtain the desired shape, it will be a great contribution to the medical industry and the aerospace industry. It is possible to process a desired shape to the shape memory alloy by micro controlling the unnecessary current. In case of the square electrode without insulation layer, it derives inexact square holes due to the unnecessary current. The results using the insulated electrode in only side show precise square holes. The removal rate improved in case of insulated electrode than others because insulation layer concentrate the applied current to the machining zone.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Using data tagging to improve the performance of Kanerva's sparse distributed memory
NASA Technical Reports Server (NTRS)
Rogers, David
1988-01-01
The standard formulation of Kanerva's sparse distributed memory (SDM) involves the selection of a large number of data storage locations, followed by averaging the data contained in those locations to reconstruct the stored data. A variant of this model is discussed, in which the predominant pattern is the focus of reconstruction. First, one architecture is proposed which returns the predominant pattern rather than the average pattern. However, this model will require too much storage for most uses. Next, a hybrid model is proposed, called tagged SDM, which approximates the results of the predominant pattern machine, but is nearly as efficient as Kanerva's original formulation. Finally, some experimental results are shown which confirm that significant improvements in the recall capability of SDM can be achieved using the tagged architecture.
Compiling global name-space programs for distributed execution
NASA Technical Reports Server (NTRS)
Koelbel, Charles; Mehrotra, Piyush
1990-01-01
Distributed memory machines do not provide hardware support for a global address space. Thus programmers are forced to partition the data across the memories of the architecture and use explicit message passing to communicate data between processors. The compiler support required to allow programmers to express their algorithms using a global name-space is examined. A general method is presented for analysis of a high level source program and along with its translation to a set of independently executing tasks communicating via messages. If the compiler has enough information, this translation can be carried out at compile-time. Otherwise run-time code is generated to implement the required data movement. The analysis required in both situations is described and the performance of the generated code on the Intel iPSC/2 is presented.
Distributed Fading Memory for Stimulus Properties in the Primary Visual Cortex
Singer, Wolf; Maass, Wolfgang
2009-01-01
It is currently not known how distributed neuronal responses in early visual areas carry stimulus-related information. We made multielectrode recordings from cat primary visual cortex and applied methods from machine learning in order to analyze the temporal evolution of stimulus-related information in the spiking activity of large ensembles of around 100 neurons. We used sequences of up to three different visual stimuli (letters of the alphabet) presented for 100 ms and with intervals of 100 ms or larger. Most of the information about visual stimuli extractable by sophisticated methods of machine learning, i.e., support vector machines with nonlinear kernel functions, was also extractable by simple linear classification such as can be achieved by individual neurons. New stimuli did not erase information about previous stimuli. The responses to the most recent stimulus contained about equal amounts of information about both this and the preceding stimulus. This information was encoded both in the discharge rates (response amplitudes) of the ensemble of neurons and, when using short time constants for integration (e.g., 20 ms), in the precise timing of individual spikes (≤∼20 ms), and persisted for several 100 ms beyond the offset of stimuli. The results indicate that the network from which we recorded is endowed with fading memory and is capable of performing online computations utilizing information about temporally sequential stimuli. This result challenges models assuming frame-by-frame analyses of sequential inputs. PMID:20027205
Automatic Adaptation of Tunable Distributed Applications
2001-01-01
size, weight, and battery life, with a single CPU, less memory, smaller hard disk, and lower bandwidth network connectivity. The power of PDAs is...wireless, and bluetooth [32] facilities; thus achieving different rates of data transmission. 1 With the trend of “write once, run everywhere...applications, a single component can execute on multiple processors (or machines) in parallel. These parallel applications, written in a specialized language
2015-09-28
the performance of log-and- replay can degrade significantly for VMs configured with multiple virtual CPUs, since the shared memory communication...whether based on checkpoint replication or log-and- replay , existing HA ap- proaches use in- memory backups. The backup VM sits in the memory of a...efficiently. 15. SUBJECT TERMS High-availability virtual machines, live migration, memory and traffic overheads, application suspension, Java
Parallelization of KENO-Va Monte Carlo code
NASA Astrophysics Data System (ADS)
Ramón, Javier; Peña, Jorge
1995-07-01
KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
Microstructure, crystallization and shape memory behavior of titania and yttria co-doped zirconia
Zeng, Xiao Mei; Du, Zehui; Schuh, Christopher A.; ...
2015-12-17
Small volume zirconia ceramics with few or no grain boundaries have been demonstrated recently to exhibit the shape memory effect. To explore the shape memory properties of yttria doped zirconia (YDZ), it is desirable to develop large, microscale grains, instead of submicron grains that result from typical processing of YDZ. In this paper, we have successfully produced single crystal micro-pillars from microscale grains encouraged by the addition of titania during processing. Titania has been doped into YDZ ceramics and its effect on the grain growth, crystallization and microscale elemental distribution of the ceramics have been systematically studied. With 5 mol%more » titania doping, the grain size can be increased up to ~4 μm, while retaining a large quantity of the desired tetragonal phase of zirconia. Finally, micro-pillars machined from tetragonal grains exhibit the expected shape memory effects where pillars made from titania-free YDZ would not.« less
On the Performance of an Algebraic MultigridSolver on Multicore Clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, A H; Schulz, M; Yang, U M
2010-04-29
Algebraic multigrid (AMG) solvers have proven to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore cluster architectures, we face new challenges that can significantly harm AMG's performance. We discuss our experiences on such an architecture and present a set of techniques that help users to overcome the associated problems, including thread and process pinning and correct memory associations. We have implemented most of the techniques in a MultiCore SUPport library (MCSup), which helps to map OpenMP applications to multicore machines. We present results using both an MPI-only and a hybrid MPI/OpenMP model.
Arithmetic Data Cube as a Data Intensive Benchmark
NASA Technical Reports Server (NTRS)
Frumkin, Michael A.; Shabano, Leonid
2003-01-01
Data movement across computational grids and across memory hierarchy of individual grid machines is known to be a limiting factor for application involving large data sets. In this paper we introduce the Data Cube Operator on an Arithmetic Data Set which we call Arithmetic Data Cube (ADC). We propose to use the ADC to benchmark grid capabilities to handle large distributed data sets. The ADC stresses all levels of grid memory by producing 2d views of an Arithmetic Data Set of d-tuples described by a small number of parameters. We control data intensity of the ADC by controlling the sizes of the views through choice of the tuple parameters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGoldrick, P.R.
1981-01-01
The Mirror Fusion Test Facility (MFTF) is a complex facility requiring a highly-computerized Supervisory Control and Diagnostics System (SCDS) to monitor and provide control over ten subsystems; three of which require true process control. SCDS will provide physicists with a method of studying machine and plasma behavior by acquiring and processing up to four megabytes of plasma diagnostic information every five minutes. A high degree of availability and throughput is provided by a distributed computer system (nine 32-bit minicomputers on shared memory). Data, distributed across SCDS, is managed by a high-bandwidth Distributed Database Management System. The MFTF operators' control roommore » consoles use color television monitors with touch sensitive screens; this is a totally new approach. The method of handling deviations to normal machine operation and how the operator should be notified and assisted in the resolution of problems has been studied and a system designed.« less
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor
1991-06-01
Symposium on Compiler Construction, June 1986. [14] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In...Directory Methods. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [31] G . M. Papadopoulos and D.E. Culler...Monsoon: An Explicit Token-Store Ar- chitecture. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [32] G . F
RIP-REMOTE INTERACTIVE PARTICLE-TRACER
NASA Technical Reports Server (NTRS)
Rogers, S. E.
1994-01-01
Remote Interactive Particle-tracing (RIP) is a distributed-graphics program which computes particle traces for computational fluid dynamics (CFD) solution data sets. A particle trace is a line which shows the path a massless particle in a fluid will take; it is a visual image of where the fluid is going. The program is able to compute and display particle traces at a speed of about one trace per second because it runs on two machines concurrently. The data used by the program is contained in two files. The solution file contains data on density, momentum and energy quantities of a flow field at discrete points in three-dimensional space, while the grid file contains the physical coordinates of each of the discrete points. RIP requires two computers. A local graphics workstation interfaces with the user for program control and graphics manipulation, and a remote machine interfaces with the solution data set and performs time-intensive computations. The program utilizes two machines in a distributed mode for two reasons. First, the data to be used by the program is usually generated on the supercomputer. RIP avoids having to convert and transfer the data, eliminating any memory limitations of the local machine. Second, as computing the particle traces can be computationally expensive, RIP utilizes the power of the supercomputer for this task. Although the remote site code was developed on a CRAY, it is possible to port this to any supercomputer class machine with a UNIX-like operating system. Integration of a velocity field from a starting physical location produces the particle trace. The remote machine computes the particle traces using the particle-tracing subroutines from PLOT3D/AMES, a CFD post-processing graphics program available from COSMIC (ARC-12779). These routines use a second-order predictor-corrector method to integrate the velocity field. Then the remote program sends graphics tokens to the local machine via a remote-graphics library. The local machine interprets the graphics tokens and draws the particle traces. The program is menu driven. RIP is implemented on the silicon graphics IRIS 3000 (local workstation) with an IRIX operating system and on the CRAY2 (remote station) with a UNICOS 1.0 or 2.0 operating system. The IRIS 4D can be used in place of the IRIS 3000. The program is written in C (67%) and FORTRAN 77 (43%) and has an IRIS memory requirement of 4 MB. The remote and local stations must use the same user ID. PLOT3D/AMES unformatted data sets are required for the remote machine. The program was developed in 1988.
Review and analysis of dense linear system solver package for distributed memory machines
NASA Technical Reports Server (NTRS)
Narang, H. N.
1993-01-01
A dense linear system solver package recently developed at the University of Texas at Austin for distributed memory machine (e.g. Intel Paragon) has been reviewed and analyzed. The package contains about 45 software routines, some written in FORTRAN, and some in C-language, and forms the basis for parallel/distributed solutions of systems of linear equations encountered in many problems of scientific and engineering nature. The package, being studied by the Computer Applications Branch of the Analysis and Computation Division, may provide a significant computational resource for NASA scientists and engineers in parallel/distributed computing. Since the package is new and not well tested or documented, many of its underlying concepts and implementations were unclear; our task was to review, analyze, and critique the package as a step in the process that will enable scientists and engineers to apply it to the solution of their problems. All routines in the package were reviewed and analyzed. Underlying theory or concepts which exist in the form of published papers or technical reports, or memos, were either obtained from the author, or from the scientific literature; and general algorithms, explanations, examples, and critiques have been provided to explain the workings of these programs. Wherever the things were still unclear, communications were made with the developer (author), either by telephone or by electronic mail, to understand the workings of the routines. Whenever possible, tests were made to verify the concepts and logic employed in their implementations. A detailed report is being separately documented to explain the workings of these routines.
Concurrent Image Processing Executive (CIPE). Volume 1: Design overview
NASA Technical Reports Server (NTRS)
Lee, Meemong; Groom, Steven L.; Mazer, Alan S.; Williams, Winifred I.
1990-01-01
The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are described. The target machine for this software is a JPL/Caltech Mark 3fp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules: user interface, host-resident executive, hypercube-resident executive, and application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube, a data management method which distributes, redistributes, and tracks data set information was implemented. The data management also allows data sharing among application programs. The CIPE software architecture provides a flexible environment for scientific analysis of complex remote sensing image data, such as planetary data and imaging spectrometry, utilizing state-of-the-art concurrent computation capabilities.
Tunvirachaisakul, Chavit; Supasitthumrong, Thitiporn; Tangwongchai, Sookjareon; Hemrunroj, Solaphat; Chuchuen, Phenphichcha; Tawankanjanachot, Itthipol; Likitchareon, Yuthachai; Phanthumchinda, Kamman; Sriswasdi, Sira; Maes, Michael
2018-04-04
The Consortium to Establish a Registry for Alzheimer's Disease (CERAD) developed a neuropsychological battery (CERAD-NP) to screen patients with Alzheimer's dementia. Mild cognitive impairment (MCI) has received attention as a pre-dementia stage. To delineate the CERAD-NP features of MCI and their clinical utility to externally validate MCI diagnosis. The study included 60 patients with MCI, diagnosed using the Clinical Dementia Rating, and 63 normal controls. Data were analysed employing receiver operating characteristic analysis, Linear Support Vector Machine, Random Forest, Adaptive Boosting, Neural Network models, and t-distributed stochastic neighbour embedding (t-SNE). MCI patients were best discriminated from normal controls using a combination of Wordlist Recall, Wordlist Memory, and Verbal Fluency Test. Machine learning showed that the CERAD features learned from MCI patients and controls were not strongly predictive of the diagnosis (maximal cross-validation 77.2%), whilst t-SNE showed that there is a considerable overlap between MCI and controls. The most important features of the CERAD-NP differentiating MCI from normal controls indicate impairments in episodic and semantic memory and recall. While these features significantly discriminate MCI patients from normal controls, the tests are not predictive of MCI. © 2018 S. Karger AG, Basel.
Dynamic Load Balancing for Adaptive Computations on Distributed-Memory Machines
NASA Technical Reports Server (NTRS)
1999-01-01
Dynamic load balancing is central to adaptive mesh-based computations on large-scale parallel computers. The principal investigator has investigated various issues on the dynamic load balancing problem under NASA JOVE and JAG rants. The major accomplishments of the project are two graph partitioning algorithms and a load balancing framework. The S-HARP dynamic graph partitioner is known to be the fastest among the known dynamic graph partitioners to date. It can partition a graph of over 100,000 vertices in 0.25 seconds on a 64- processor Cray T3E distributed-memory multiprocessor while maintaining the scalability of over 16-fold speedup. Other known and widely used dynamic graph partitioners take over a second or two while giving low scalability of a few fold speedup on 64 processors. These results have been published in journals and peer-reviewed flagship conferences.
Numerical methods on some structured matrix algebra problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jessup, E.R.
1996-06-01
This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was tomore » translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.« less
Genomically Encoded Analog Memory with Precise In vivo DNA Writing in Living Cell Populations
Farzadfard, Fahim; Lu, Timothy K.
2014-01-01
Cellular memory is crucial to many natural biological processes and for sophisticated synthetic-biology applications. Existing cellular memories rely on epigenetic switches or recombinases, which are limited in scalability and recording capacity. Here, we use the DNA of living cell populations as genomic ‘tape recorders’ for the analog and distributed recording of long-term event histories. We describe a platform for generating single-stranded DNA (ssDNA) in vivo in response to arbitrary transcriptional signals. When co-expressed with a recombinase, these intracellularly expressed ssDNAs target specific genomic DNA addresses, resulting in precise mutations that accumulate in cell populations as a function of the magnitude and duration of the inputs. This platform could enable long-term cellular recorders for environmental and biomedical applications, biological state machines, and enhanced genome engineering strategies. PMID:25395541
Vienna Fortran - A Language Specification. Version 1.1
1992-03-01
other computer archi- tectures is the fact that the memory is physically distributed among the processors; the time required to access a non-local...datum may be an order of magnitude higher than the time taken to access locally stored data. This has important consequences for program efficiency. In...machine in many aspects. It is tedious, time -consuming and error prone. It has led to particularly slow software development cycles and, in consequence
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Tyler Barratt; Urrea, Jorge Mario
2012-06-01
The aim of the Authenticating Cache architecture is to ensure that machine instructions in a Read Only Memory (ROM) are legitimate from the time the ROM image is signed (immediately after compilation) to the time they are placed in the cache for the processor to consume. The proposed architecture allows the detection of ROM image modifications during distribution or when it is loaded into memory. It also ensures that modified instructions will not execute in the processor-as the cache will not be loaded with a page that fails an integrity check. The authenticity of the instruction stream can also bemore » verified in this architecture. The combination of integrity and authenticity assurance greatly improves the security profile of a system.« less
Unconditional polarization qubit quantum memory at room temperature
NASA Astrophysics Data System (ADS)
Namazi, Mehdi; Kupchak, Connor; Jordaan, Bertus; Shahrokhshahi, Reihaneh; Figueroa, Eden
2016-05-01
The creation of global quantum key distribution and quantum communication networks requires multiple operational quantum memories. Achieving a considerable reduction in experimental and cost overhead in these implementations is thus a major challenge. Here we present a polarization qubit quantum memory fully-operational at 330K, an unheard frontier in the development of useful qubit quantum technology. This result is achieved through extensive study of how optical response of cold atomic medium is transformed by the motion of atoms at room temperature leading to an optimal characterization of room temperature quantum light-matter interfaces. Our quantum memory shows an average fidelity of 86.6 +/- 0.6% for optical pulses containing on average 1 photon per pulse, thereby defeating any classical strategy exploiting the non-unitary character of the memory efficiency. Our system significantly decreases the technological overhead required to achieve quantum memory operation and will serve as a building block for scalable and technologically simpler many-memory quantum machines. The work was supported by the US-Navy Office of Naval Research, Grant Number N00141410801 and the Simons Foundation, Grant Number SBF241180. B. J. acknowledges financial assistance of the National Research Foundation (NRF) of South Africa.
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Disruptions of network connectivity predict impairment in multiple behavioral domains after stroke
Ramsey, Lenny E.; Metcalf, Nicholas V.; Chacko, Ravi V.; Weinberger, Kilian; Baldassarre, Antonello; Hacker, Carl D.; Shulman, Gordon L.; Corbetta, Maurizio
2016-01-01
Deficits following stroke are classically attributed to focal damage, but recent evidence suggests a key role of distributed brain network disruption. We measured resting functional connectivity (FC), lesion topography, and behavior in multiple domains (attention, visual memory, verbal memory, language, motor, and visual) in a cohort of 132 stroke patients, and used machine-learning models to predict neurological impairment in individual subjects. We found that visual memory and verbal memory were better predicted by FC, whereas visual and motor impairments were better predicted by lesion topography. Attention and language deficits were well predicted by both. Next, we identified a general pattern of physiological network dysfunction consisting of decrease of interhemispheric integration and intrahemispheric segregation, which strongly related to behavioral impairment in multiple domains. Network-specific patterns of dysfunction predicted specific behavioral deficits, and loss of interhemispheric communication across a set of regions was associated with impairment across multiple behavioral domains. These results link key organizational features of brain networks to brain–behavior relationships in stroke. PMID:27402738
Distributed Patterns of Brain Activity that Lead to Forgetting
Öztekin, Ilke; Badre, David
2011-01-01
Proactive interference (PI), in which irrelevant information from prior learning disrupts memory performance, is widely viewed as a major cause of forgetting. However, the hypothesized spontaneous recovery (i.e., automatic retrieval) of interfering information presumed to be at the base of PI remains to be demonstrated directly. Moreover, it remains unclear at what point during learning and/or retrieval interference impacts memory performance. In order to resolve these open questions, we employed a machine-learning algorithm to identify distributed patterns of brain activity associated with retrieval of interfering information that engenders PI and causes forgetting. Participants were scanned using functional magnetic resonance imaging during an item recognition task. We induced PI by constructing sets of three consecutive study lists from the same semantic category. The classifier quantified the magnitude of category-related activity at encoding and retrieval. Category-specific activity during retrieval increased across lists, consistent with the category information becoming increasingly available and producing interference. Critically, this increase was correlated with individual differences in forgetting and the deployment of frontal lobe mechanisms that resolve interference. Collectively, these findings suggest that distributed patterns of brain activity pertaining to the interfering information during retrieval contribute to forgetting. The prefrontal cortex mediates the relationship between the spontaneous recovery of interfering information at retrieval and individual differences in memory performance. PMID:21897814
Automatic data partitioning on distributed memory multicomputers. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Gupta, Manish
1992-01-01
Distributed-memory parallel computers are increasingly being used to provide high levels of performance for scientific applications. Unfortunately, such machines are not very easy to program. A number of research efforts seek to alleviate this problem by developing compilers that take over the task of generating communication. The communication overheads and the extent of parallelism exploited in the resulting target program are determined largely by the manner in which data is partitioned across different processors of the machine. Most of the compilers provide no assistance to the programmer in the crucial task of determining a good data partitioning scheme. A novel approach is presented, the constraints-based approach, to the problem of automatic data partitioning for numeric programs. In this approach, the compiler identifies some desirable requirements on the distribution of various arrays being referenced in each statement, based on performance considerations. These desirable requirements are referred to as constraints. For each constraint, the compiler determines a quality measure that captures its importance with respect to the performance of the program. The quality measure is obtained through static performance estimation, without actually generating the target data-parallel program with explicit communication. Each data distribution decision is taken by combining all the relevant constraints. The compiler attempts to resolve any conflicts between constraints such that the overall execution time of the parallel program is minimized. This approach has been implemented as part of a compiler called Paradigm, that accepts Fortran 77 programs, and specifies the partitioning scheme to be used for each array in the program. We have obtained results on some programs taken from the Linpack and Eispack libraries, and the Perfect Benchmarks. These results are quite promising, and demonstrate the feasibility of automatic data partitioning for a significant class of scientific application programs with regular computations.
Design and implementation of a UNIX based distributed computing system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Love, J.S.; Michael, M.W.
1994-12-31
We have designed, implemented, and are running a corporate-wide distributed processing batch queue on a large number of networked workstations using the UNIX{reg_sign} operating system. Atlas Wireline researchers and scientists have used the system for over a year. The large increase in available computer power has greatly reduced the time required for nuclear and electromagnetic tool modeling. Use of remote distributed computing has simultaneously reduced computation costs and increased usable computer time. The system integrates equipment from different manufacturers, using various CPU architectures, distinct operating system revisions, and even multiple processors per machine. Various differences between the machines have tomore » be accounted for in the master scheduler. These differences include shells, command sets, swap spaces, memory sizes, CPU sizes, and OS revision levels. Remote processing across a network must be performed in a manner that is seamless from the users` perspective. The system currently uses IBM RISC System/6000{reg_sign}, SPARCstation{sup TM}, HP9000s700, HP9000s800, and DEC Alpha AXP{sup TM} machines. Each CPU in the network has its own speed rating, allowed working hours, and workload parameters. The system if designed so that all of the computers in the network can be optimally scheduled without adversely impacting the primary users of the machines. The increase in the total usable computational capacity by means of distributed batch computing can change corporate computing strategy. The integration of disparate computer platforms eliminates the need to buy one type of computer for computations, another for graphics, and yet another for day-to-day operations. It might be possible, for example, to meet all research and engineering computing needs with existing networked computers.« less
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Moitra, Stuti
1996-01-01
Various tridiagonal solvers have been proposed in recent years for different parallel platforms. In this paper, the performance of three tridiagonal solvers, namely, the parallel partition LU algorithm, the parallel diagonal dominant algorithm, and the reduced diagonal dominant algorithm, is studied. These algorithms are designed for distributed-memory machines and are tested on an Intel Paragon and an IBM SP2 machines. Measured results are reported in terms of execution time and speedup. Analytical study are conducted for different communication topologies and for different tridiagonal systems. The measured results match the analytical results closely. In addition to address implementation issues, performance considerations such as problem sizes and models of speedup are also discussed.
Farzadfard, Fahim; Lu, Timothy K
2014-11-14
Cellular memory is crucial to many natural biological processes and sophisticated synthetic biology applications. Existing cellular memories rely on epigenetic switches or recombinases, which are limited in scalability and recording capacity. In this work, we use the DNA of living cell populations as genomic "tape recorders" for the analog and distributed recording of long-term event histories. We describe a platform for generating single-stranded DNA (ssDNA) in vivo in response to arbitrary transcriptional signals. When coexpressed with a recombinase, these intracellularly expressed ssDNAs target specific genomic DNA addresses, resulting in precise mutations that accumulate in cell populations as a function of the magnitude and duration of the inputs. This platform could enable long-term cellular recorders for environmental and biomedical applications, biological state machines, and enhanced genome engineering strategies. Copyright © 2014, American Association for the Advancement of Science.
High-throughput state-machine replication using software transactional memory.
Zhao, Wenbing; Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin
2016-11-01
State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload.
High-throughput state-machine replication using software transactional memory
Yang, William; Zhang, Honglei; Yang, Jack; Luo, Xiong; Zhu, Yueqin; Yang, Mary; Luo, Chaomin
2017-01-01
State-machine replication is a common way of constructing general purpose fault tolerance systems. To ensure replica consistency, requests must be executed sequentially according to some total order at all non-faulty replicas. Unfortunately, this could severely limit the system throughput. This issue has been partially addressed by identifying non-conflicting requests based on application semantics and executing these requests concurrently. However, identifying and tracking non-conflicting requests require intimate knowledge of application design and implementation, and a custom fault tolerance solution developed for one application cannot be easily adopted by other applications. Software transactional memory offers a new way of constructing concurrent programs. In this article, we present the mechanisms needed to retrofit existing concurrency control algorithms designed for software transactional memory for state-machine replication. The main benefit for using software transactional memory in state-machine replication is that general purpose concurrency control mechanisms can be designed without deep knowledge of application semantics. As such, new fault tolerance systems based on state-machine replications with excellent throughput can be easily designed and maintained. In this article, we introduce three different concurrency control mechanisms for state-machine replication using software transactional memory, namely, ordered strong strict two-phase locking, conventional timestamp-based multiversion concurrency control, and speculative timestamp-based multiversion concurrency control. Our experiments show that speculative timestamp-based multiversion concurrency control mechanism has the best performance in all types of workload, the conventional timestamp-based multiversion concurrency control offers the worst performance due to high abort rate in the presence of even moderate contention between transactions. The ordered strong strict two-phase locking mechanism offers the simplest solution with excellent performance in low contention workload, and fairly good performance in high contention workload. PMID:29075049
NASA Astrophysics Data System (ADS)
Elliott, Thomas J.; Gu, Mile
2018-03-01
Continuous-time stochastic processes pervade everyday experience, and the simulation of models of these processes is of great utility. Classical models of systems operating in continuous-time must typically track an unbounded amount of information about past behaviour, even for relatively simple models, enforcing limits on precision due to the finite memory of the machine. However, quantum machines can require less information about the past than even their optimal classical counterparts to simulate the future of discrete-time processes, and we demonstrate that this advantage extends to the continuous-time regime. Moreover, we show that this reduction in the memory requirement can be unboundedly large, allowing for arbitrary precision even with a finite quantum memory. We provide a systematic method for finding superior quantum constructions, and a protocol for analogue simulation of continuous-time renewal processes with a quantum machine.
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J.T.
1993-10-01
This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less
Implementing Molecular Dynamics on Hybrid High Performance Computers - Three-Body Potentials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Yamada, Masako
The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power re- quirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to efficiently use accelerators to improve the performance of molecular dynamics (MD) employing pairwise potential energy models, little is reported in the literature for models that includemore » many-body effects. 3-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger-Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algo- rithms usable on hybrid high performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed memory MD codes and show results for the simulation of water with Stillinger-Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.« less
The art of war: beyond memory-one strategies in population games.
Lee, Christopher; Harper, Marc; Fryer, Dashiell
2015-01-01
We show that the history of play in a population game contains exploitable information that can be successfully used by sophisticated strategies to defeat memory-one opponents, including zero determinant strategies. The history allows a player to label opponents by their strategies, enabling a player to determine the population distribution and to act differentially based on the opponent's strategy in each pairwise interaction. For the Prisoner's Dilemma, these advantages lead to the natural formation of cooperative coalitions among similarly behaving players and eventually to unilateral defection against opposing player types. We show analytically and empirically that optimal play in population games depends strongly on the population distribution. For example, the optimal strategy for a minority player type against a resident TFT population is ALLC, while for a majority player type the optimal strategy versus TFT players is ALLD. Such behaviors are not accessible to memory-one strategies. Drawing inspiration from Sun Tzu's the Art of War, we implemented a non-memory-one strategy for population games based on techniques from machine learning and statistical inference that can exploit the history of play in this manner. Via simulation we find that this strategy is essentially uninvadable and can successfully invade (significantly more likely than a neutral mutant) essentially all known memory-one strategies for the Prisoner's Dilemma, including ALLC (always cooperate), ALLD (always defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero determinant (ZD) strategies, including extortionate and generous strategies.
Efficacy of Code Optimization on Cache-based Processors
NASA Technical Reports Server (NTRS)
VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data-provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just ads on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Tuning collective communication for Partitioned Global Address Space programming models
Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; ...
2011-06-12
Partitioned Global Address Space (PGAS) languages offer programmers the convenience of a shared memory programming style combined with locality control necessary to run on large-scale distributed memory systems. Even within a PGAS language programmers often need to perform global communication operations such as broadcasts or reductions, which are best performed as collective operations in which a group of threads work together to perform the operation. In this study we consider the problem of implementing collective communication within PGAS languages and explore some of the design trade-offs in both the interface and implementation. In particular, PGAS collectives have semantic issues thatmore » are different than in send–receive style message passing programs, and different implementation approaches that take advantage of the one-sided communication style in these languages. We present an implementation framework for PGAS collectives as part of the GASNet communication layer, which supports shared memory, distributed memory and hybrids. The framework supports a broad set of algorithms for each collective, over which the implementation may be automatically tuned. In conclusion, we demonstrate the benefit of optimized GASNet collectives using application benchmarks written in UPC, and demonstrate that the GASNet collectives can deliver scalable performance on a variety of state-of-the-art parallel machines including a Cray XT4, an IBM BlueGene/P, and a Sun Constellation system with InfiniBand interconnect.« less
Identification of memory reactivation during sleep by EEG classification.
Belal, Suliman; Cousins, James; El-Deredy, Wael; Parkes, Laura; Schneider, Jules; Tsujimura, Hikaru; Zoumpoulaki, Alexia; Perapoch, Marta; Santamaria, Lorena; Lewis, Penelope
2018-04-17
Memory reactivation during sleep is critical for consolidation, but also extremely difficult to measure as it is subtle, distributed and temporally unpredictable. This article reports a novel method for detecting such reactivation in standard sleep recordings. During learning, participants produced a complex sequence of finger presses, with each finger cued by a distinct audio-visual stimulus. Auditory cues were then re-played during subsequent sleep to trigger neural reactivation through a method known as targeted memory reactivation (TMR). Next, we used electroencephalography data from the learning session to train a machine learning classifier, and then applied this classifier to sleep data to determine how successfully each tone had elicited memory reactivation. Neural reactivation was classified above chance in all participants when TMR was applied in SWS, and in 5 of the 14 participants to whom TMR was applied in N2. Classification success reduced across numerous repetitions of the tone cue, suggesting either a gradually reducing responsiveness to such cues or a plasticity-related change in the neural signature as a result of cueing. We believe this method will be valuable for future investigations of memory consolidation. Copyright © 2018 Elsevier Inc. All rights reserved.
A variable-mode stator consequent pole memory machine
NASA Astrophysics Data System (ADS)
Yang, Hui; Lyu, Shukang; Lin, Heyun; Zhu, Z. Q.
2018-05-01
In this paper, a variable-mode concept is proposed for the speed range extension of a stator-consequent-pole memory machine (SCPMM). An integrated permanent magnet (PM) and electrically excited control scheme is utilized to simplify the flux-weakening control instead of relatively complicated continuous PM magnetization control. Due to the nature of memory machine, the magnetization state of low coercive force (LCF) magnets can be easily changed by applying either a positive or negative current pulse. Therefore, the number of PM poles may be changed to satisfy the specific performance requirement under different speed ranges, i.e. the machine with all PM poles can offer high torque output while that with half PM poles provides wide constant power range. In addition, the SCPMM with non-magnetized PMs can be considered as a dual-three phase electrically excited reluctance machine, which can be fed by an open-winding based dual inverters that provide direct current (DC) bias excitation to further extend the speed range. The effectiveness of the proposed variable-mode operation for extending its operating region and improving the system reliability is verified by both finite element analysis (FEA) and experiments.
Sengupta, Partho P; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T
2016-06-01
Associating a patient's profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography data sets derived from patients with known constrictive pericarditis and restrictive cardiomyopathy. Clinical and echocardiographic data of 50 patients with constrictive pericarditis and 44 with restrictive cardiomyopathy were used for developing an associative memory classifier-based machine-learning algorithm. The speckle tracking echocardiography data were normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve of the associative memory classifier was evaluated for differentiating constrictive pericarditis from restrictive cardiomyopathy. Using only speckle tracking echocardiography variables, associative memory classifier achieved a diagnostic area under the curve of 89.2%, which improved to 96.2% with addition of 4 echocardiographic variables. In comparison, the area under the curve of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63.7%, respectively. Furthermore, the associative memory classifier demonstrated greater accuracy and shorter learning curves than other machine-learning approaches, with accuracy asymptotically approaching 90% after a training fraction of 0.3 and remaining flat at higher training fractions. This study demonstrates feasibility of a cognitive machine-learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine-learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. © 2016 American Heart Association, Inc.
Clark, Ian A.; Niehaus, Katherine E.; Duff, Eugene P.; Di Simplicio, Martina C.; Clifford, Gari D.; Smith, Stephen M.; Mackay, Clare E.; Woolrich, Mark W.; Holmes, Emily A.
2014-01-01
After psychological trauma, why do some only some parts of the traumatic event return as intrusive memories while others do not? Intrusive memories are key to cognitive behavioural treatment for post-traumatic stress disorder, and an aetiological understanding is warranted. We present here analyses using multivariate pattern analysis (MVPA) and a machine learning classifier to investigate whether peri-traumatic brain activation was able to predict later intrusive memories (i.e. before they had happened). To provide a methodological basis for understanding the context of the current results, we first show how functional magnetic resonance imaging (fMRI) during an experimental analogue of trauma (a trauma film) via a prospective event-related design was able to capture an individual's later intrusive memories. Results showed widespread increases in brain activation at encoding when viewing a scene in the scanner that would later return as an intrusive memory in the real world. These fMRI results were replicated in a second study. While traditional mass univariate regression analysis highlighted an association between brain processing and symptomatology, this is not the same as prediction. Using MVPA and a machine learning classifier, it was possible to predict later intrusive memories across participants with 68% accuracy, and within a participant with 97% accuracy; i.e. the classifier could identify out of multiple scenes those that would later return as an intrusive memory. We also report here brain networks key in intrusive memory prediction. MVPA opens the possibility of decoding brain activity to reconstruct idiosyncratic cognitive events with relevance to understanding and predicting mental health symptoms. PMID:25151915
Data flow language and interpreter for a reconfigurable distributed data processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hurt, A.D.; Heath, J.R.
1982-01-01
An analytic language and an interpreter whereby an applications data flow graph may serve as an input to a reconfigurable distributed data processor is proposed. The architecture considered consists of a number of loosely coupled computing elements (CES) which may be linked to data and file memories through fully nonblocking interconnect networks. The real-time performance of such an architecture depends upon its ability to alter its topology in response to changes in application, asynchronous data rates and faults. Such a data flow language enhances the versatility of a reconfigurable architecture by allowing the user to specify the machine's topology atmore » a very high level. 11 references.« less
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Proceedings of the second SISAL users` conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J T; Frerking, C; Miller, P J
1992-12-01
This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
Debugging Fortran on a shared memory machine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, T.R.; Padua, D.A.
1987-01-01
Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
Hardware support for software controlled fast multiplexing of performance counters
Salapura, Valentina; Wisniewski, Robert W
2013-10-01
Performance counters may be operable to collect one or more counts of one or more selected activities, and registers may be operable to store a set of performance counter configurations. A state machine may be operable to automatically select a register from the registers for reconfiguring the one or more performance counters in response to receiving a first signal. The state machine may be further operable to reconfigure the one or more performance counters based on a configuration specified in the selected register. The state machine yet further may be operable to copy data in selected one or more of the performance counters to a memory location, or to copy data from the memory location to the counters, in response to receiving a second signal. The state machine may be operable to store or restore the counter values and state machine configuration in response to a context switch event.
Hardware support for software controlled fast multiplexing of performance counters
Salapura, Valentina; Wisniewski, Robert W.
2013-01-01
Performance counters may be operable to collect one or more counts of one or more selected activities, and registers may be operable to store a set of performance counter configurations. A state machine may be operable to automatically select a register from the registers for reconfiguring the one or more performance counters in response to receiving a first signal. The state machine may be further operable to reconfigure the one or more performance counters based on a configuration specified in the selected register. The state machine yet further may be operable to copy data in selected one or more of the performance counters to a memory location, or to copy data from the memory location to the counters, in response to receiving a second signal. The state machine may be operable to store or restore the counter values and state machine configuration in response to a context switch event.
Effect of cutting parameters on strain hardening of nickel–titanium shape memory alloy
NASA Astrophysics Data System (ADS)
Wang, Guijie; Liu, Zhanqiang; Ai, Xing; Huang, Weimin; Niu, Jintao
2018-07-01
Nickel–titanium shape memory alloy (SMA) has been widely used as implant materials due to its good biocompatibility, shape memory property and super-elasticity. However, the severe strain hardening is a main challenge due to cutting force and temperature caused by machining. An orthogonal experiment of nickel–titanium SMA with different milling parameters conditions was conducted in this paper. On the one hand, the effect of cutting parameters on work hardening is obtained. It is found that the cutting speed has the most important effect on work hardening. The depth of machining induced layer and the degree of hardening become smaller with the increase of cutting speed when the cutting speed is less than 200 m min‑1 and then get larger with further increase of cutting speed. The relative intensity of diffraction peak increases as the cutting speed increase. In addition, all of the depth of machining induced layer, the degree of hardening and the relative intensity of diffraction peak increase when the feed rate increases. On the other hand, it is found that the depth of machining induced layer is closely related with the degree of hardening and phase transition. The higher the content of austenite in the machined surface is, the higher the degree of hardening will be. The depth of the machining induced layer increases with the degree of hardening increasing.
NASA Astrophysics Data System (ADS)
Hassan, A. H.; Fluke, C. J.; Barnes, D. G.
2012-09-01
Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualizing tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a “software as a service” manner will reduce the total cost of ownership, provide an easy to use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.
Computerized scoring algorithms for the Autobiographical Memory Test.
Takano, Keisuke; Gutenbrunner, Charlotte; Martens, Kris; Salmon, Karen; Raes, Filip
2018-02-01
Reduced specificity of autobiographical memories is a hallmark of depressive cognition. Autobiographical memory (AM) specificity is typically measured by the Autobiographical Memory Test (AMT), in which respondents are asked to describe personal memories in response to emotional cue words. Due to this free descriptive responding format, the AMT relies on experts' hand scoring for subsequent statistical analyses. This manual coding potentially impedes research activities in big data analytics such as large epidemiological studies. Here, we propose computerized algorithms to automatically score AM specificity for the Dutch (adult participants) and English (youth participants) versions of the AMT by using natural language processing and machine learning techniques. The algorithms showed reliable performances in discriminating specific and nonspecific (e.g., overgeneralized) autobiographical memories in independent testing data sets (area under the receiver operating characteristic curve > .90). Furthermore, outcome values of the algorithms (i.e., decision values of support vector machines) showed a gradient across similar (e.g., specific and extended memories) and different (e.g., specific memory and semantic associates) categories of AMT responses, suggesting that, for both adults and youth, the algorithms well capture the extent to which a memory has features of specific memories. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.
Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar
2017-03-01
This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using maximum likelihood approach. The D-Phylo while misusing the seeking capacity of k -means keeps away from its real constraint of getting stuck at privately conserved motifs. The authors have tested the behaviour of D-Phylo on Amazon Linux Amazon Machine Image(Hardware Virtual Machine)i2.4xlarge, six central processing unit, 122 GiB memory, 8 × 800 Solid-state drive Elastic Block Store volume, high network performance up to 15 processors for several real-life datasets. Distributing the clusters evenly on all the processors provides us the capacity to accomplish a near direct speed if there should arise an occurrence of huge number of processors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischler, M.
1992-04-01
The issues to be addressed here are those of balance'' in machine architecture. By this, we mean how much emphasis must be placed on various aspects of the system to maximize its usefulness for physics. There are three components that contribute to the utility of a system: How the machine can be used, how big a problem can be attacked, and what the effective capabilities (power) of the hardware are like. The effective power issue is a matter of evaluating the impact of design decisions trading off architectural features such as memory bandwidth and interprocessor communication capabilities. What is studiedmore » is the effect these machine parameters have on how quickly the system can solve desired problems. There is a reasonable method for studying this: One selects a few representative algorithms and computes the impact of changing memory bandwidths, and so forth. The only room for controversy here is in the selection of representative problems. The issue of how big a problem can be attacked boils down to a balance of memory size versus power. Although this is a balance issue it is very different than the effective power situation, because no firm answer can be given at this time. The power to memory ratio is highly problem dependent, and optimizing it requires several pieces of physics input, including: how big a lattice is needed for interesting results; what sort of algorithms are best to use; and how many sweeps are needed to get valid results. We seem to be at the threshold of learning things about these issues, but for now, the memory size issue will necessarily be addressed in terms of best guesses, rules of thumb, and researchers' opinions.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischler, M.
1992-04-01
The issues to be addressed here are those of ``balance`` in machine architecture. By this, we mean how much emphasis must be placed on various aspects of the system to maximize its usefulness for physics. There are three components that contribute to the utility of a system: How the machine can be used, how big a problem can be attacked, and what the effective capabilities (power) of the hardware are like. The effective power issue is a matter of evaluating the impact of design decisions trading off architectural features such as memory bandwidth and interprocessor communication capabilities. What is studiedmore » is the effect these machine parameters have on how quickly the system can solve desired problems. There is a reasonable method for studying this: One selects a few representative algorithms and computes the impact of changing memory bandwidths, and so forth. The only room for controversy here is in the selection of representative problems. The issue of how big a problem can be attacked boils down to a balance of memory size versus power. Although this is a balance issue it is very different than the effective power situation, because no firm answer can be given at this time. The power to memory ratio is highly problem dependent, and optimizing it requires several pieces of physics input, including: how big a lattice is needed for interesting results; what sort of algorithms are best to use; and how many sweeps are needed to get valid results. We seem to be at the threshold of learning things about these issues, but for now, the memory size issue will necessarily be addressed in terms of best guesses, rules of thumb, and researchers` opinions.« less
Unstructured grids on SIMD torus machines
NASA Technical Reports Server (NTRS)
Bjorstad, Petter E.; Schreiber, Robert
1994-01-01
Unstructured grids lead to unstructured communication on distributed memory parallel computers, a problem that has been considered difficult. Here, we consider adaptive, offline communication routing for a SIMD processor grid. Our approach is empirical. We use large data sets drawn from supercomputing applications instead of an analytic model of communication load. The chief contribution of this paper is an experimental demonstration of the effectiveness of certain routing heuristics. Our routing algorithm is adaptive, nonminimal, and is generally designed to exploit locality. We have a parallel implementation of the router, and we report on its performance.
NASA Astrophysics Data System (ADS)
Majumder, Himadri; Maity, Kalipada
2018-03-01
Shape memory alloy has a unique capability to return to its original shape after physical deformation by applying heat or thermo-mechanical or magnetic load. In this experimental investigation, desirability function analysis (DFA), a multi-attribute decision making was utilized to find out the optimum input parameter setting during wire electrical discharge machining (WEDM) of Ni-Ti shape memory alloy. Four critical machining parameters, namely pulse on time (TON), pulse off time (TOFF), wire feed (WF) and wire tension (WT) were taken as machining inputs for the experiments to optimize three interconnected responses like cutting speed, kerf width, and surface roughness. Input parameter combination TON = 120 μs., TOFF = 55 μs., WF = 3 m/min. and WT = 8 kg-F were found to produce the optimum results. The optimum process parameters for each desired response were also attained using Taguchi’s signal-to-noise ratio. Confirmation test has been done to validate the optimum machining parameter combination which affirmed DFA was a competent approach to select optimum input parameters for the ideal response quality for WEDM of Ni-Ti shape memory alloy.
A discrete Fourier transform for virtual memory machines
NASA Technical Reports Server (NTRS)
Galant, David C.
1992-01-01
An algebraic theory of the Discrete Fourier Transform is developed in great detail. Examination of the details of the theory leads to a computationally efficient fast Fourier transform for the use on computers with virtual memory. Such an algorithm is of great use on modern desktop machines. A FORTRAN coded version of the algorithm is given for the case when the sequence of numbers to be transformed is a power of two.
1986-08-01
Craik and Lockhart describes the various levels of information processing involved in memory (8). The preliminary level ...A prevalent malfunction. Anesthesia and Analg~esia, 1985, 64: 745-747. 8. Craik , F.I.M., Lockhart , R.S. Levels of processing : A framework for memory...research. Journal of Verbal Learning and Behavior, 1972, 11: 671-684. 9. Craik , F.I.M., Lockhart , R.S. Levels of processing : A * framework for
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Hybrid MPI+OpenMP Programming of an Overset CFD Solver and Performance Investigations
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Jin, Haoqiang H.; Biegel, Bryan (Technical Monitor)
2002-01-01
This report describes a two level parallelization of a Computational Fluid Dynamic (CFD) solver with multi-zone overset structured grids. The approach is based on a hybrid MPI+OpenMP programming model suitable for shared memory and clusters of shared memory machines. The performance investigations of the hybrid application on an SGI Origin2000 (O2K) machine is reported using medium and large scale test problems.
Influence of magnet eddy current on magnetization characteristics of variable flux memory machine
NASA Astrophysics Data System (ADS)
Yang, Hui; Lin, Heyun; Zhu, Z. Q.; Lyu, Shukang
2018-05-01
In this paper, the magnet eddy current characteristics of a newly developed variable flux memory machine (VFMM) is investigated. Firstly, the machine structure, non-linear hysteresis characteristics and eddy current modeling of low coercive force magnet are described, respectively. Besides, the PM eddy current behaviors when applying the demagnetizing current pulses are unveiled and investigated. The mismatch of the required demagnetization currents between the cases with or without considering the magnet eddy current is identified. In addition, the influences of the magnet eddy current on the demagnetization effect of VFMM are analyzed. Finally, a prototype is manufactured and tested to verify the theoretical analyses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arumugam, Kamesh
Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires - exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-ow and irregular memory accesses. Furthermore,more » these applications are often iterative with dependency between steps, and thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications have been focused around optimizing the irregular, data-dependent memory accesses and control-ow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-ow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.« less
Machine parts recognition using a trinary associative memory
NASA Technical Reports Server (NTRS)
Awwal, Abdul Ahad S.; Karim, Mohammad A.; Liu, Hua-Kuang
1989-01-01
The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
Machine Parts Recognition Using A Trinary Associative Memory
NASA Astrophysics Data System (ADS)
Awwal, Abdul A. S.; Karim, Mohammad A.; Liu, Hua-Kuang
1989-05-01
The convergence mechanism of vectors in Hopfield's neural network in relation to recognition of partially known patterns is studied in terms of both inner products and Hamming distance. It has been shown that Hamming distance should not always be used in determining the convergence of vectors. Instead, inner product weighting coefficients play a more dominant role in certain data representations for determining the convergence mechanism. A trinary neuron representation for associative memory is found to be more effective for associative recall. Applications of the trinary associative memory to reconstruct machine part images that are partially missing are demonstrated by means of computer simulation as examples of the usefulness of this approach.
Improved cache performance in Monte Carlo transport calculations using energy banding
NASA Astrophysics Data System (ADS)
Siegel, A.; Smith, K.; Felker, K.; Romano, P.; Forget, B.; Beckman, P.
2014-04-01
We present an energy banding algorithm for Monte Carlo (MC) neutral particle transport simulations which depend on large cross section lookup tables. In MC codes, read-only cross section data tables are accessed frequently, exhibit poor locality, and are typically too much large to fit in fast memory. Thus, performance is often limited by long latencies to RAM, or by off-node communication latencies when the data footprint is very large and must be decomposed on a distributed memory machine. The proposed energy banding algorithm allows maximal temporal reuse of data in band sizes that can flexibly accommodate different architectural features. The energy banding algorithm is general and has a number of benefits compared to the traditional approach. In the present analysis we explore its potential to achieve improvements in time-to-solution on modern cache-based architectures.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
Save Now [Y/N]? Machine Memory at War in Iain Banks' "Look to Windward"
ERIC Educational Resources Information Center
Blackmore, Tim
2010-01-01
Creating memory during and after wartime trauma is vexed by state attempts to control public and private discourse. Science fiction author Iain Banks' novel "Look to Windward" proposes different ways of preserving memory and culture, from posthuman memory devices, to artwork, to architecture, to personal, local ways of remembering.…
Analysis towards VMEM File of a Suspended Virtual Machine
NASA Astrophysics Data System (ADS)
Song, Zheng; Jin, Bo; Sun, Yongqing
With the popularity of virtual machines, forensic investigators are challenged with more complicated situations, among which discovering the evidences in virtualized environment is of significant importance. This paper mainly analyzes the file suffixed with .vmem in VMware Workstation, which stores all pseudo-physical memory into an image. The internal file structure of .vmem file is studied and disclosed. Key information about processes and threads of a suspended virtual machine is revealed. Further investigation into the Windows XP SP3 heap contents is conducted and a proof-of-concept tool is provided. Different methods to obtain forensic memory images are introduced, with both advantages and limits analyzed. We conclude with an outlook.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with anmore » approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.« less
Computational Model-Based Prediction of Human Episodic Memory Performance Based on Eye Movements
NASA Astrophysics Data System (ADS)
Sato, Naoyuki; Yamaguchi, Yoko
Subjects' episodic memory performance is not simply reflected by eye movements. We use a ‘theta phase coding’ model of the hippocampus to predict subjects' memory performance from their eye movements. Results demonstrate the ability of the model to predict subjects' memory performance. These studies provide a novel approach to computational modeling in the human-machine interface.
Kacsoh, Balint Z; Greene, Casey S; Bosco, Giovanni
2017-11-06
High-throughput experiments are becoming increasingly common, and scientists must balance hypothesis-driven experiments with genome-wide data acquisition. We sought to predict novel genes involved in Drosophila learning and long-term memory from existing public high-throughput data. We performed an analysis using PILGRM, which analyzes public gene expression compendia using machine learning. We evaluated the top prediction alongside genes involved in learning and memory in IMP, an interface for functional relationship networks. We identified Grunge/Atrophin ( Gug/Atro ), a transcriptional repressor, histone deacetylase, as our top candidate. We find, through multiple, distinct assays, that Gug has an active role as a modulator of memory retention in the fly and its function is required in the adult mushroom body. Depletion of Gug specifically in neurons of the adult mushroom body, after cell division and neuronal development is complete, suggests that Gug function is important for memory retention through regulation of neuronal activity, and not by altering neurodevelopment. Our study provides a previously uncharacterized role for Gug as a possible regulator of neuronal plasticity at the interface of memory retention and memory extinction. Copyright © 2017 Kacsoh et al.
Dynamically programmable cache
NASA Astrophysics Data System (ADS)
Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas
1998-10-01
Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
Content addressable memory project
NASA Technical Reports Server (NTRS)
Hall, Josh; Levy, Saul; Smith, D.; Wei, S.; Miyake, K.; Murdocca, M.
1991-01-01
The progress on the Rutgers CAM (Content Addressable Memory) Project is described. The overall design of the system is completed at the architectural level and described. The machine is composed of two kinds of cells: (1) the CAM cells which include both memory and processor, and support local processing within each cell; and (2) the tree cells, which have smaller instruction set, and provide global processing over the CAM cells. A parameterized design of the basic CAM cell is completed. Progress was made on the final specification of the CPS. The machine architecture was driven by the design of algorithms whose requirements are reflected in the resulted instruction set(s). A few of these algorithms are described.
Memory-Augmented Cellular Automata for Image Analysis.
1978-11-01
case in which each cell has memory size proportional to the logarithm of the input size, showing the increased capabilities of these machines for executing a variety of basic image analysis and recognition tasks. (Author)
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceriani, Marco; Palermo, Gianluca; Secchi, Simone
We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft-core cores, local interconnection and memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating an hardware scheduler within the custom load/store buffers to take advantagemore » from the availability of multiple executions threads, increasing the efficiency in a transparent way to the application. We evaluated a dual node system irregular kernels showing scalability in the number of cores and threads.« less
Mobile and replicated alignment of arrays in data-parallel programs
NASA Technical Reports Server (NTRS)
Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert
1993-01-01
When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.
RTNN: The New Parallel Machine in Zaragoza
NASA Astrophysics Data System (ADS)
Sijs, A. J. V. D.
I report on the development of RTNN, a parallel computer designed as a 4^4 hypercube of 256 T9000 transputer nodes, each with 8 MB memory. The peak performance of the machine is expected to be 2.5 Gflops.
Distributed computing for membrane-based modeling of action potential propagation.
Porras, D; Rogers, J M; Smith, W M; Pollard, A E
2000-08-01
Action potential propagation simulations with physiologic membrane currents and macroscopic tissue dimensions are computationally expensive. We, therefore, analyzed distributed computing schemes to reduce execution time in workstation clusters by parallelizing solutions with message passing. Four schemes were considered in two-dimensional monodomain simulations with the Beeler-Reuter membrane equations. Parallel speedups measured with each scheme were compared to theoretical speedups, recognizing the relationship between speedup and code portions that executed serially. A data decomposition scheme based on total ionic current provided the best performance. Analysis of communication latencies in that scheme led to a load-balancing algorithm in which measured speedups at 89 +/- 2% and 75 +/- 8% of theoretical speedups were achieved in homogeneous and heterogeneous clusters of workstations. Speedups in this scheme with the Luo-Rudy dynamic membrane equations exceeded 3.0 with eight distributed workstations. Cluster speedups were comparable to those measured during parallel execution on a shared memory machine.
Nonlinear machine learning and design of reconfigurable digital colloids.
Long, Andrew W; Phillips, Carolyn L; Jankowksi, Eric; Ferguson, Andrew L
2016-09-14
Digital colloids, a cluster of freely rotating "halo" particles tethered to the surface of a central particle, were recently proposed as ultra-high density memory elements for information storage. Rational design of these digital colloids for memory storage applications requires a quantitative understanding of the thermodynamic and kinetic stability of the configurational states within which information is stored. We apply nonlinear machine learning to Brownian dynamics simulations of these digital colloids to extract the low-dimensional intrinsic manifold governing digital colloid morphology, thermodynamics, and kinetics. By modulating the relative size ratio between halo particles and central particles, we investigate the size-dependent configurational stability and transition kinetics for the 2-state tetrahedral (N = 4) and 30-state octahedral (N = 6) digital colloids. We demonstrate the use of this framework to guide the rational design of a memory storage element to hold a block of text that trades off the competing design criteria of memory addressability and volatility.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 34 Education 2 2011-07-01 2010-07-01 true Collection and distribution of vending machine income from vending machines on Federal property. 395.32 Section 395.32 Education Regulations of the Offices... Management § 395.32 Collection and distribution of vending machine income from vending machines on Federal...
Code of Federal Regulations, 2012 CFR
2012-07-01
... 34 Education 2 2012-07-01 2012-07-01 false Collection and distribution of vending machine income from vending machines on Federal property. 395.32 Section 395.32 Education Regulations of the Offices... Management § 395.32 Collection and distribution of vending machine income from vending machines on Federal...
Code of Federal Regulations, 2014 CFR
2014-07-01
... 34 Education 2 2014-07-01 2013-07-01 true Collection and distribution of vending machine income from vending machines on Federal property. 395.32 Section 395.32 Education Regulations of the Offices... Management § 395.32 Collection and distribution of vending machine income from vending machines on Federal...
Code of Federal Regulations, 2013 CFR
2013-07-01
... 34 Education 2 2013-07-01 2013-07-01 false Collection and distribution of vending machine income from vending machines on Federal property. 395.32 Section 395.32 Education Regulations of the Offices... Management § 395.32 Collection and distribution of vending machine income from vending machines on Federal...
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
Proceedings of the 14th International Conference on the Numerical Simulation of Plasmas
NASA Astrophysics Data System (ADS)
Partial Contents are as follows: Numerical Simulations of the Vlasov-Maxwell Equations by Coupled Particle-Finite Element Methods on Unstructured Meshes; Electromagnetic PIC Simulations Using Finite Elements on Unstructured Grids; Modelling Travelling Wave Output Structures with the Particle-in-Cell Code CONDOR; SST--A Single-Slice Particle Simulation Code; Graphical Display and Animation of Data Produced by Electromagnetic, Particle-in-Cell Codes; A Post-Processor for the PEST Code; Gray Scale Rendering of Beam Profile Data; A 2D Electromagnetic PIC Code for Distributed Memory Parallel Computers; 3-D Electromagnetic PIC Simulation on the NRL Connection Machine; Plasma PIC Simulations on MIMD Computers; Vlasov-Maxwell Algorithm for Electromagnetic Plasma Simulation on Distributed Architectures; MHD Boundary Layer Calculation Using the Vortex Method; and Eulerian Codes for Plasma Simulations.
Salvado, José; Espírito-Santo, António; Calado, Maria
2012-01-01
This paper proposes a distributed system for analysis and monitoring (DSAM) of vibrations and acoustic noise, which consists of an array of intelligent modules, sensor modules, communication bus and a host PC acting as data center. The main advantages of the DSAM are its modularity, scalability, and flexibility for use of different type of sensors/transducers, with analog or digital outputs, and for signals of different nature. Its final cost is also significantly lower than other available commercial solutions. The system is reconfigurable, can operate either with synchronous or asynchronous modes, with programmable sampling frequencies, 8-bit or 12-bit resolution and a memory buffer of 15 kbyte. It allows real-time data-acquisition for signals of different nature, in applications that require a large number of sensors, thus it is suited for monitoring of vibrations in Linear Switched Reluctance Actuators (LSRAs). The acquired data allows the full characterization of the LSRA in terms of its response to vibrations of structural origins, and the vibrations and acoustic noise emitted under normal operation. The DSAM can also be used for electrical machine condition monitoring, machine fault diagnosis, structural characterization and monitoring, among other applications. PMID:22969364
Streaming data analytics via message passing with application to graph algorithms
Plimpton, Steven J.; Shead, Tim
2014-05-06
The need to process streaming data, which arrives continuously at high-volume in real-time, arises in a variety of contexts including data produced by experiments, collections of environmental or network sensors, and running simulations. Streaming data can also be formulated as queries or transactions which operate on a large dynamic data store, e.g. a distributed database. We describe a lightweight, portable framework named PHISH which enables a set of independent processes to compute on a stream of data in a distributed-memory parallel manner. Datums are routed between processes in patterns defined by the application. PHISH can run on top of eithermore » message-passing via MPI or sockets via ZMQ. The former means streaming computations can be run on any parallel machine which supports MPI; the latter allows them to run on a heterogeneous, geographically dispersed network of machines. We illustrate how PHISH can support streaming MapReduce operations, and describe streaming versions of three algorithms for large, sparse graph analytics: triangle enumeration, subgraph isomorphism matching, and connected component finding. Lastly, we also provide benchmark timings for MPI versus socket performance of several kernel operations useful in streaming algorithms.« less
Apostolova, Liana G; Morra, Jonathan H; Green, Amity E; Hwang, Kristy S; Avedissian, Christina; Woo, Ellen; Cummings, Jeffrey L; Toga, Arthur W; Jack, Clifford R; Weiner, Michael W; Thompson, Paul M
2010-05-15
We used a previously validated automated machine learning algorithm based on adaptive boosting to segment the hippocampi in baseline and 12-month follow-up 3D T1-weighted brain MRIs of 150 cognitively normal elderly (NC), 245 mild cognitive impairment (MCI) and 97 Dementia of the Alzheimer's type (DAT) ADNI subjects. Using the radial distance mapping technique, we examined the hippocampal correlates of delayed recall performance on three well-established verbal memory tests--ADAScog delayed recall (ADAScog-DR), the Rey Auditory Verbal Learning Test -DR (AVLT-DR) and Wechsler Logical Memory II-DR (LM II-DR). We observed no significant correlations between delayed recall performance and hippocampal radial distance on any of the three verbal memory measures in NC. All three measures were associated with hippocampal volumes and radial distance in the full sample and in the MCI group at baseline and at follow-up. In DAT we observed stronger left-sided associations between hippocampal radial distance, LM II-DR and ADAScog-DR both at baseline and at follow-up. The strongest linkage between memory performance and hippocampal atrophy in the MCI sample was observed with the most challenging verbal memory test-the AVLT-DR, as opposed to the DAT sample where the least challenging test the ADAScog-DR showed strongest associations with the hippocampal structure. After controlling for baseline hippocampal atrophy, memory performance showed regionally specific associations with hippocampal radial distance in predominantly CA1 but also in subicular distribution. Copyright (c) 2009 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructures grids, which generates unpredictable memory accesses. These data structures usually are large, but difficult to partition. These applications mostly are memory bandwidth bounded and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the element they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processorsmore » with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom-hand tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flow, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, Graph Breadth First search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.« less
Forsythe, J Chris [Sandia Park, NM; Xavier, Patrick G [Albuquerque, NM; Abbott, Robert G [Albuquerque, NM; Brannon, Nathan G [Albuquerque, NM; Bernard, Michael L [Tijeras, NM; Speed, Ann E [Albuquerque, NM
2009-04-28
Digital technology utilizing a cognitive model based on human naturalistic decision-making processes, including pattern recognition and episodic memory, can reduce the dependency of human-machine interactions on the abilities of a human user and can enable a machine to more closely emulate human-like responses. Such a cognitive model can enable digital technology to use cognitive capacities fundamental to human-like communication and cooperation to interact with humans.
NASA Technical Reports Server (NTRS)
Ray, R. B.
1994-01-01
OPMILL is a computer operating system for a Kearney and Trecker milling machine that provides a fast and easy way to program machine part manufacture with an IBM compatible PC. The program gives the machinist an "equation plotter" feature which plots any set of equations that define axis moves (up to three axes simultaneously) and converts those equations to a machine milling program that will move a cutter along a defined path. Other supported functions include: drill with peck, bolt circle, tap, mill arc, quarter circle, circle, circle 2 pass, frame, frame 2 pass, rotary frame, pocket, loop and repeat, and copy blocks. The system includes a tool manager that can handle up to 25 tools and automatically adjusts tool length for each tool. It will display all tool information and stop the milling machine at the appropriate time. Information for the program is entered via a series of menus and compiled to the Kearney and Trecker format. The program can then be loaded into the milling machine, the tool path graphically displayed, and tool change information or the program in Kearney and Trecker format viewed. The program has a complete file handling utility that allows the user to load the program into memory from the hard disk, save the program to the disk with comments, view directories, merge a program on the disk with one in memory, save a portion of a program in memory, and change directories. OPMILL was developed on an IBM PS/2 running DOS 3.3 with 1 MB of RAM. OPMILL was written for an IBM PC or compatible 8088 or 80286 machine connected via an RS-232 port to a Kearney and Trecker Data Mill 700/C Control milling machine. It requires a "D:" drive (fixed-disk or virtual), a browse or text display utility, and an EGA or better display. Users wishing to modify and recompile the source code will also need Turbo BASIC, Turbo C, and Crescent Software's QuickPak for Turbo BASIC. IBM PC and IBM PS/2 are registered trademarks of International Business Machines. Turbo BASIC and Turbo C are trademarks of Borland International.
Operating System For Numerically Controlled Milling Machine
NASA Technical Reports Server (NTRS)
Ray, R. B.
1992-01-01
OPMILL program is operating system for Kearney and Trecker milling machine providing fast easy way to program manufacture of machine parts with IBM-compatible personal computer. Gives machinist "equation plotter" feature, which plots equations that define movements and converts equations to milling-machine-controlling program moving cutter along defined path. System includes tool-manager software handling up to 25 tools and automatically adjusts to account for each tool. Developed on IBM PS/2 computer running DOS 3.3 with 1 MB of random-access memory.
Single bus star connected reluctance drive and method
Fahimi, Babak; Shamsi, Pourya
2016-05-10
A system and methods for operating a switched reluctance machine includes a controller, an inverter connected to the controller and to the switched reluctance machine, a hysteresis control connected to the controller and to the inverter, a set of sensors connected to the switched reluctance machine and to the controller, the switched reluctance machine further including a set of phases the controller further comprising a processor and a memory connected to the processor, wherein the processor programmed to execute a control process and a generation process.
Three-Dimensional Cellular Structures Enhanced By Shape Memory Alloys
NASA Technical Reports Server (NTRS)
Nathal, Michael V.; Krause, David L.; Wilmoth, Nathan G.; Bednarcyk, Brett A.; Baker, Eric H.
2014-01-01
This research effort explored lightweight structural concepts married with advanced smart materials to achieve a wide variety of benefits in airframe and engine components. Lattice block structures were cast from an aerospace structural titanium alloy Ti-6Al-4V and a NiTi shape memory alloy (SMA), and preliminary properties have been measured. A finite element-based modeling approach that can rapidly and accurately capture the deformation response of lattice architectures was developed. The Ti-6-4 and SMA material behavior was calibrated via experimental tests of ligaments machined from the lattice. Benchmark testing of complete lattice structures verified the main aspects of the model as well as demonstrated the advantages of the lattice structure. Shape memory behavior of a sample machined from a lattice block was also demonstrated.
Urgent Virtual Machine Eviction with Enlightened Post-Copy
2015-12-01
memory is in use, almost all of which is by Memcached. MySQL : The VMs run MySQL 5.6, and the clients execute OLTPBenchmark [3] using the Twitter...workload with scale factor of 960. The VMs are each allocated 16 cores and 30 GB of memory, and MySQL is configured with a 16 GB buffer pool in memory. The...operation mix for 5 minutes as a warm-up. At the time of migration, MySQL uses approximately 17 GB of memory, and almost all of the 30 GB memory is
Tensorial Basis Spline Collocation Method for Poisson's Equation
NASA Astrophysics Data System (ADS)
Plagne, Laurent; Berthou, Jean-Yves
2000-01-01
This paper aims to describe the tensorial basis spline collocation method applied to Poisson's equation. In the case of a localized 3D charge distribution in vacuum, this direct method based on a tensorial decomposition of the differential operator is shown to be competitive with both iterative BSCM and FFT-based methods. We emphasize the O(h4) and O(h6) convergence of TBSCM for cubic and quintic splines, respectively. We describe the implementation of this method on a distributed memory parallel machine. Performance measurements on a Cray T3E are reported. Our code exhibits high performance and good scalability: As an example, a 27 Gflops performance is obtained when solving Poisson's equation on a 2563 non-uniform 3D Cartesian mesh by using 128 T3E-750 processors. This represents 215 Mflops per processors.
Checkpoint repair for high-performance out-of-order execution machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hwu, W.M.W.; Patt, Y.N.
Out-or-order execution and branch prediction are two mechanisms that can be used profitably in the design of supercomputers to increase performance. Proper exception handling and branch prediction miss handling in an out-of-order execution machine to require some kind of repair mechanism which can restore the machine to a known previous state. In this paper the authors present a class of repair mechanisms using the concept of checkpointing. The authors derive several properties of checkpoint repair mechanisms. In addition, they provide algorithms for performing checkpoint repair that incur little overhead in time and modest cost in hardware, which also require nomore » additional complexity or time for use with write-back cache memory systems than they do with write-through cache memory systems, contrary to statements made by previous researchers.« less
Binary classification of items of interest in a repeatable process
Abell, Jeffrey A.; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo
2014-06-24
A system includes host and learning machines in electrical communication with sensors positioned with respect to an item of interest, e.g., a weld, and memory. The host executes instructions from memory to predict a binary quality status of the item. The learning machine receives signals from the sensor(s), identifies candidate features, and extracts features from the candidates that are more predictive of the binary quality status relative to other candidate features. The learning machine maps the extracted features to a dimensional space that includes most of the items from a passing binary class and excludes all or most of the items from a failing binary class. The host also compares the received signals for a subsequent item of interest to the dimensional space to thereby predict, in real time, the binary quality status of the subsequent item of interest.
Over-Distribution in Source Memory
Brainerd, C. J.; Reyna, V. F.; Holliday, R. E.; Nakamura, K.
2012-01-01
Semantic false memories are confounded with a second type of error, over-distribution, in which items are attributed to contradictory episodic states. Over-distribution errors have proved to be more common than false memories when the two are disentangled. We investigated whether over-distribution is prevalent in another classic false memory paradigm: source monitoring. It is. Conventional false memory responses (source misattributions) were predominantly over-distribution errors, but unlike semantic false memory, over-distribution also accounted for more than half of true memory responses (correct source attributions). Experimental control of over-distribution was achieved via a series of manipulations that affected either recollection of contextual details or item memory (concreteness, frequency, list-order, number of presentation contexts, and individual differences in verbatim memory). A theoretical model was used to analyze the data (conjoint process dissociation) that predicts that predicts that (a) over-distribution is directly proportional to item memory but inversely proportional to recollection and (b) item memory is not a necessary precondition for recollection of contextual details. The results were consistent with both predictions. PMID:21942494
Trinary Associative Memory Would Recognize Machine Parts
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang; Awwal, Abdul Ahad S.; Karim, Mohammad A.
1991-01-01
Trinary associative memory combines merits and overcomes major deficiencies of unipolar and bipolar logics by combining them in three-valued logic that reverts to unipolar or bipolar binary selectively, as needed to perform specific tasks. Advantage of associative memory: one obtains access to all parts of it simultaneously on basis of content, rather than address, of data. Consequently, used to exploit fully parallelism and speed of optical computing.
Properties of a memory network in psychology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wedemann, Roseli S.; Donangelo, Raul; Carvalho, Luis A. V. de
We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.
Properties of a memory network in psychology
NASA Astrophysics Data System (ADS)
Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.
2007-12-01
We have previously described neurotic psychopathology and psychoanalytic working-through by an associative memory mechanism, based on a neural network model, where memory was modelled by a Boltzmann machine (BM). Since brain neural topology is selectively structured, we simulated known microscopic mechanisms that control synaptic properties, showing that the network self-organizes to a hierarchical, clustered structure. Here, we show some statistical mechanical properties of the complex networks which result from this self-organization. They indicate that a generalization of the BM may be necessary to model memory.
Intelligent Machines in the 21st Century: Automating the Processes of Inference and Inquiry
NASA Technical Reports Server (NTRS)
Knuth, Kevin H.
2003-01-01
The last century saw the application of Boolean algebra toward the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines. in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. However, modern intelligent machines work by inferring knowledge using only their pre-programmed prior knowledge and the data provided. They lack the ability to ask questions, or request data that would aid their inferences. Recent advances in understanding the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we identified the algebra of questions as the free distributive algebra, which now allows us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper we describe this logic of inference and inquiry using the mathematics of partially ordered sets and the scaffolding of lattice theory, discuss the far-reaching implications of the methodology, and demonstrate its application with current examples in machine learning. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them to not only make inferences from data, but also decide which question to ask, experiment to perform, or measurement to take given what they have learned and what they are designed to understand.
NASA Technical Reports Server (NTRS)
Riedel, Joseph E.; Grasso, Christopher A.
2012-01-01
VML (Virtual Machine Language) is an advanced computing environment that allows spacecraft to operate using mechanisms ranging from simple, time-oriented sequencing to advanced, multicomponent reactive systems. VML has developed in four evolutionary stages. VML 0 is a core execution capability providing multi-threaded command execution, integer data types, and rudimentary branching. VML 1 added named parameterized procedures, extensive polymorphism, data typing, branching, looping issuance of commands using run-time parameters, and named global variables. VML 2 added for loops, data verification, telemetry reaction, and an open flight adaptation architecture. VML 2.1 contains major advances in control flow capabilities for executable state machines. On the resource requirements front, VML 2.1 features a reduced memory footprint in order to fit more capability into modestly sized flight processors, and endian-neutral data access for compatibility with Intel little-endian processors. Sequence packaging has been improved with object-oriented programming constructs and the use of implicit (rather than explicit) time tags on statements. Sequence event detection has been significantly enhanced with multi-variable waiting, which allows a sequence to detect and react to conditions defined by complex expressions with multiple global variables. This multi-variable waiting serves as the basis for implementing parallel rule checking, which in turn, makes possible executable state machines. The new state machine feature in VML 2.1 allows the creation of sophisticated autonomous reactive systems without the need to develop expensive flight software. Users specify named states and transitions, along with the truth conditions required, before taking transitions. Transitions with the same signal name allow separate state machines to coordinate actions: the conditions distributed across all state machines necessary to arm a particular signal are evaluated, and once found true, that signal is raised. The selected signal then causes all identically named transitions in all present state machines to be taken simultaneously. VML 2.1 has relevance to all potential space missions, both manned and unmanned. It was under consideration for use on Orion.
Compacting de Bruijn graphs from sequencing data quickly and in low memory.
Chikhi, Rayan; Limasset, Antoine; Medvedev, Paul
2016-06-15
As the quantity of data per sequencing experiment increases, the challenges of fragment assembly are becoming increasingly computational. The de Bruijn graph is a widely used data structure in fragment assembly algorithms, used to represent the information from a set of reads. Compaction is an important data reduction step in most de Bruijn graph based algorithms where long simple paths are compacted into single vertices. Compaction has recently become the bottleneck in assembly pipelines, and improving its running time and memory usage is an important problem. We present an algorithm and a tool bcalm 2 for the compaction of de Bruijn graphs. bcalm 2 is a parallel algorithm that distributes the input based on a minimizer hashing technique, allowing for good balance of memory usage throughout its execution. For human sequencing data, bcalm 2 reduces the computational burden of compacting the de Bruijn graph to roughly an hour and 3 GB of memory. We also applied bcalm 2 to the 22 Gbp loblolly pine and 20 Gbp white spruce sequencing datasets. Compacted graphs were constructed from raw reads in less than 2 days and 40 GB of memory on a single machine. Hence, bcalm 2 is at least an order of magnitude more efficient than other available methods. Source code of bcalm 2 is freely available at: https://github.com/GATB/bcalm rayan.chikhi@univ-lille1.fr. © The Author 2016. Published by Oxford University Press.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
Ringo: Interactive Graph Analytics on Big-Memory Machines
Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure
2016-01-01
We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads. PMID:27081215
Ringo: Interactive Graph Analytics on Big-Memory Machines.
Perez, Yonathan; Sosič, Rok; Banerjee, Arijit; Puttagunta, Rohan; Raison, Martin; Shah, Pararth; Leskovec, Jure
2015-01-01
We present Ringo, a system for analysis of large graphs. Graphs provide a way to represent and analyze systems of interacting objects (people, proteins, webpages) with edges between the objects denoting interactions (friendships, physical interactions, links). Mining graphs provides valuable insights about individual objects as well as the relationships among them. In building Ringo, we take advantage of the fact that machines with large memory and many cores are widely available and also relatively affordable. This allows us to build an easy-to-use interactive high-performance graph analytics system. Graphs also need to be built from input data, which often resides in the form of relational tables. Thus, Ringo provides rich functionality for manipulating raw input data tables into various kinds of graphs. Furthermore, Ringo also provides over 200 graph analytics functions that can then be applied to constructed graphs. We show that a single big-memory machine provides a very attractive platform for performing analytics on all but the largest graphs as it offers excellent performance and ease of use as compared to alternative approaches. With Ringo, we also demonstrate how to integrate graph analytics with an iterative process of trial-and-error data exploration and rapid experimentation, common in data mining workloads.
Talaminos-Barroso, Alejandro; Estudillo-Valderrama, Miguel A; Roa, Laura M; Reina-Tosina, Javier; Ortega-Ruiz, Francisco
2016-06-01
M2M (Machine-to-Machine) communications represent one of the main pillars of the new paradigm of the Internet of Things (IoT), and is making possible new opportunities for the eHealth business. Nevertheless, the large number of M2M protocols currently available hinders the election of a suitable solution that satisfies the requirements that can demand eHealth applications. In the first place, to develop a tool that provides a benchmarking analysis in order to objectively select among the most relevant M2M protocols for eHealth solutions. In the second place, to validate the tool with a particular use case: the respiratory rehabilitation. A software tool, called Distributed Computing Framework (DFC), has been designed and developed to execute the benchmarking tests and facilitate the deployment in environments with a large number of machines, with independence of the protocol and performance metrics selected. DDS, MQTT, CoAP, JMS, AMQP and XMPP protocols were evaluated considering different specific performance metrics, including CPU usage, memory usage, bandwidth consumption, latency and jitter. The results obtained allowed to validate a case of use: respiratory rehabilitation of chronic obstructive pulmonary disease (COPD) patients in two scenarios with different types of requirement: Home-Based and Ambulatory. The results of the benchmark comparison can guide eHealth developers in the choice of M2M technologies. In this regard, the framework presented is a simple and powerful tool for the deployment of benchmark tests under specific environments and conditions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Stocker, Kurt
2012-04-01
This article provides the first comprehensive conceptual account for the imagistic mental machinery that allows us to travel through time--for the time machine in our mind. It is argued that language reveals this imagistic machine and how we use it. Findings from a range of cognitive fields are theoretically unified and a recent proposal about spatialized mental time travel is elaborated on. The following novel distinctions are offered: external versus internal viewing of time; ''watching" time versus projective ''travel" through time; optional versus obligatory mental time travel; mental time travel into anteriority or posteriority versus mental time travel into the past or future; single mental time travel versus nested dual mental time travel; mental time travel in episodic memory versus mental time travel in semantic memory; and ''seeing" versus ''sensing" mental imagery. Theoretical, empirical, and applied implications are discussed. Copyright © 2012 Cognitive Science Society, Inc.
NASA Astrophysics Data System (ADS)
Cha, Moon Hoe
2007-02-01
The NearFar program is a package for carrying out an interactive nearside-farside decomposition of heavy-ion elastic scattering amplitude. The program is implemented in Java to perform numerical operations on the nearside and farside angular distributions. It contains a graphical display interface for the numerical results. A test run has been applied to the elastic O16+Si28 scattering at E=1503 MeV. Program summaryTitle of program: NearFar Catalogue identifier: ADYP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADYP_v1_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions: none Computers: designed for any machine capable of running Java, developed on PC-Pentium-4 Operating systems under which the program has been tested: Microsoft Windows XP (Home Edition) Program language used: Java Number of bits in a word: 64 Memory required to execute with typical data: case dependent No. of lines in distributed program, including test data, etc.: 3484 Number of bytes distributed program, including test data, etc.: 142 051 Distribution format: tar.gz Other software required: A Java runtime interpreter, or the Java Development Kit, version 5.0 Nature of physical problem: Interactive nearside-farside decomposition of heavy-ion elastic scattering amplitude. Method of solution: The user must supply a external data file or PPSM parameters which calculates theoretical values of the quantities to be decomposed. Typical running time: Problem dependent. In a test run, it is about 35 s on a 2.40 GHz Intel P4-processor machine.
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning
Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron; ...
2017-11-21
Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMCs resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMCs impact grows with scale resulting in up to 3X performance degradation atmore » scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMCs impact, namely hardware offloading, core reservation and network throttling.« less
Unraveling Network-induced Memory Contention: Deeper Insights with Machine Learning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groves, Taylor Liles; Grant, Ryan; Gonzales, Aaron
Remote Direct Memory Access (RDMA) is expected to be an integral communication mechanism for future exascale systems enabling asynchronous data transfers, so that applications may fully utilize CPU resources while simultaneously sharing data amongst remote nodes. We examine Network-induced Memory Contention (NiMC) on Infiniband networks. We expose the interactions between RDMA, main-memory and cache, when applications and out-of-band services compete for memory resources. We then explore NiMCs resulting impact on application-level performance. For a range of hardware technologies and HPC workloads, we quantify NiMC and show that NiMCs impact grows with scale resulting in up to 3X performance degradation atmore » scales as small as 8K processes even in applications that previously have been shown to be performance resilient in the presence of noise. In addition, this work examines the problem of predicting NiMC's impact on applications by leveraging machine learning and easily accessible performance counters. This approach provides additional insights about the root cause of NiMC and facilitates dynamic selection of potential solutions. Finally, we evaluated three potential techniques to reduce NiMCs impact, namely hardware offloading, core reservation and network throttling.« less
Context-aware distributed cloud computing using CloudScheduler
NASA Astrophysics Data System (ADS)
Seuster, R.; Leavett-Brown, CR; Casteels, K.; Driemel, C.; Paterson, M.; Ring, D.; Sobie, RJ; Taylor, RP; Weldon, J.
2017-10-01
The distributed cloud using the CloudScheduler VM provisioning service is one of the longest running systems for HEP workloads. It has run millions of jobs for ATLAS and Belle II over the past few years using private and commercial clouds around the world. Our goal is to scale the distributed cloud to the 10,000-core level, with the ability to run any type of application (low I/O, high I/O and high memory) on any cloud. To achieve this goal, we have been implementing changes that utilize context-aware computing designs that are currently employed in the mobile communication industry. Context-awareness makes use of real-time and archived data to respond to user or system requirements. In our distributed cloud, we have many opportunistic clouds with no local HEP services, software or storage repositories. A context-aware design significantly improves the reliability and performance of our system by locating the nearest location of the required services. We describe how we are collecting and managing contextual information from our workload management systems, the clouds, the virtual machines and our services. This information is used not only to monitor the system but also to carry out automated corrective actions. We are incrementally adding new alerting and response services to our distributed cloud. This will enable us to scale the number of clouds and virtual machines. Further, a context-aware design will enable us to run analysis or high I/O application on opportunistic clouds. We envisage an open-source HTTP data federation (for example, the DynaFed system at CERN) as a service that would provide us access to existing storage elements used by the HEP experiments.
Exascale Hardware Architectures Working Group
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, S; Ang, J; Chiang, P
2011-03-15
The ASC Exascale Hardware Architecture working group is challenged to provide input on the following areas impacting the future use and usability of potential exascale computer systems: processor, memory, and interconnect architectures, as well as the power and resilience of these systems. Going forward, there are many challenging issues that will need to be addressed. First, power constraints in processor technologies will lead to steady increases in parallelism within a socket. Additionally, all cores may not be fully independent nor fully general purpose. Second, there is a clear trend toward less balanced machines, in terms of compute capability compared tomore » memory and interconnect performance. In order to mitigate the memory issues, memory technologies will introduce 3D stacking, eventually moving on-socket and likely on-die, providing greatly increased bandwidth but unfortunately also likely providing smaller memory capacity per core. Off-socket memory, possibly in the form of non-volatile memory, will create a complex memory hierarchy. Third, communication energy will dominate the energy required to compute, such that interconnect power and bandwidth will have a significant impact. All of the above changes are driven by the need for greatly increased energy efficiency, as current technology will prove unsuitable for exascale, due to unsustainable power requirements of such a system. These changes will have the most significant impact on programming models and algorithms, but they will be felt across all layers of the machine. There is clear need to engage all ASC working groups in planning for how to deal with technological changes of this magnitude. The primary function of the Hardware Architecture Working Group is to facilitate codesign with hardware vendors to ensure future exascale platforms are capable of efficiently supporting the ASC applications, which in turn need to meet the mission needs of the NNSA Stockpile Stewardship Program. This issue is relatively immediate, as there is only a small window of opportunity to influence hardware design for 2018 machines. Given the short timeline a firm co-design methodology with vendors is of prime importance.« less
[Research on infrared safety protection system for machine tool].
Zhang, Shuan-Ji; Zhang, Zhi-Ling; Yan, Hui-Ying; Wang, Song-De
2008-04-01
In order to ensure personal safety and prevent injury accident in machine tool operation, an infrared machine tool safety system was designed with infrared transmitting-receiving module, memory self-locked relay and voice recording-playing module. When the operator does not enter the danger area, the system has no response. Once the operator's whole or part of body enters the danger area and shades the infrared beam, the system will alarm and output an control signal to the machine tool executive element, and at the same time, the system makes the machine tool emergency stop to prevent equipment damaged and person injured. The system has a module framework, and has many advantages including safety, reliability, common use, circuit simplicity, maintenance convenience, low power consumption, low costs, working stability, easy debugging, vibration resistance and interference resistance. It is suitable for being installed and used in different machine tools such as punch machine, pour plastic machine, digital control machine, armor plate cutting machine, pipe bending machine, oil pressure machine etc.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
Parallel Simulation of Unsteady Turbulent Flames
NASA Technical Reports Server (NTRS)
Menon, Suresh
1996-01-01
Time-accurate simulation of turbulent flames in high Reynolds number flows is a challenging task since both fluid dynamics and combustion must be modeled accurately. To numerically simulate this phenomenon, very large computer resources (both time and memory) are required. Although current vector supercomputers are capable of providing adequate resources for simulations of this nature, the high cost and their limited availability, makes practical use of such machines less than satisfactory. At the same time, the explicit time integration algorithms used in unsteady flow simulations often possess a very high degree of parallelism, making them very amenable to efficient implementation on large-scale parallel computers. Under these circumstances, distributed memory parallel computers offer an excellent near-term solution for greatly increased computational speed and memory, at a cost that may render the unsteady simulations of the type discussed above more feasible and affordable.This paper discusses the study of unsteady turbulent flames using a simulation algorithm that is capable of retaining high parallel efficiency on distributed memory parallel architectures. Numerical studies are carried out using large-eddy simulation (LES). In LES, the scales larger than the grid are computed using a time- and space-accurate scheme, while the unresolved small scales are modeled using eddy viscosity based subgrid models. This is acceptable for the moment/energy closure since the small scales primarily provide a dissipative mechanism for the energy transferred from the large scales. However, for combustion to occur, the species must first undergo mixing at the small scales and then come into molecular contact. Therefore, global models cannot be used. Recently, a new model for turbulent combustion was developed, in which the combustion is modeled, within the subgrid (small-scales) using a methodology that simulates the mixing and the molecular transport and the chemical kinetics within each LES grid cell. Finite-rate kinetics can be included without any closure and this approach actually provides a means to predict the turbulent rates and the turbulent flame speed. The subgrid combustion model requires resolution of the local time scales associated with small-scale mixing, molecular diffusion and chemical kinetics and, therefore, within each grid cell, a significant amount of computations must be carried out before the large-scale (LES resolved) effects are incorporated. Therefore, this approach is uniquely suited for parallel processing and has been implemented on various systems such as: Intel Paragon, IBM SP-2, Cray T3D and SGI Power Challenge (PC) using the system independent Message Passing Interface (MPI) compiler. In this paper, timing data on these machines is reported along with some characteristic results.
Load Balancing Unstructured Adaptive Grids for CFD Problems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid
1996-01-01
Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.
Automatic Data Traffic Control on DSM Architecture
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry; Kwak, Dochan (Technical Monitor)
2000-01-01
We study data traffic on distributed shared memory machines and conclude that data placement and grouping improve performance of scientific codes. We present several methods which user can employ to improve data traffic in his code. We report on implementation of a tool which detects the code fragments causing data congestions and advises user on improvements of data routing in these fragments. The capabilities of the tool include deduction of data alignment and affinity from the source code; detection of the code constructs having abnormally high cache or TLB misses; generation of data placement constructs. We demonstrate the capabilities of the tool on experiments with NAS parallel benchmarks and with a simple computational fluid dynamics application ARC3D.
NASA Technical Reports Server (NTRS)
Soltis, Steven R.; Ruwart, Thomas M.; OKeefe, Matthew T.
1996-01-01
The global file system (GFS) is a prototype design for a distributed file system in which cluster nodes physically share storage devices connected via a network-like fiber channel. Networks and network-attached storage devices have advanced to a level of performance and extensibility so that the previous disadvantages of shared disk architectures are no longer valid. This shared storage architecture attempts to exploit the sophistication of storage device technologies whereas a server architecture diminishes a device's role to that of a simple component. GFS distributes the file system responsibilities across processing nodes, storage across the devices, and file system resources across the entire storage pool. GFS caches data on the storage devices instead of the main memories of the machines. Consistency is established by using a locking mechanism maintained by the storage devices to facilitate atomic read-modify-write operations. The locking mechanism is being prototyped in the Silicon Graphics IRIX operating system and is accessed using standard Unix commands and modules.
Scalable parallel distance field construction for large-scale applications
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan -Liu; ...
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. Anew distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking overtime, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate itsmore » efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.« less
Scalable Parallel Distance Field Construction for Large-Scale Applications.
Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H
2015-10-01
Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
3D hierarchical spatial representation and memory of multimodal sensory data
NASA Astrophysics Data System (ADS)
Khosla, Deepak; Dow, Paul A.; Huber, David J.
2009-04-01
This paper describes an efficient method and system for representing, processing and understanding multi-modal sensory data. More specifically, it describes a computational method and system for how to process and remember multiple locations in multimodal sensory space (e.g., visual, auditory, somatosensory, etc.). The multimodal representation and memory is based on a biologically-inspired hierarchy of spatial representations implemented with novel analogues of real representations used in the human brain. The novelty of the work is in the computationally efficient and robust spatial representation of 3D locations in multimodal sensory space as well as an associated working memory for storage and recall of these representations at the desired level for goal-oriented action. We describe (1) A simple and efficient method for human-like hierarchical spatial representations of sensory data and how to associate, integrate and convert between these representations (head-centered coordinate system, body-centered coordinate, etc.); (2) a robust method for training and learning a mapping of points in multimodal sensory space (e.g., camera-visible object positions, location of auditory sources, etc.) to the above hierarchical spatial representations; and (3) a specification and implementation of a hierarchical spatial working memory based on the above for storage and recall at the desired level for goal-oriented action(s). This work is most useful for any machine or human-machine application that requires processing of multimodal sensory inputs, making sense of it from a spatial perspective (e.g., where is the sensory information coming from with respect to the machine and its parts) and then taking some goal-oriented action based on this spatial understanding. A multi-level spatial representation hierarchy means that heterogeneous sensory inputs (e.g., visual, auditory, somatosensory, etc.) can map onto the hierarchy at different levels. When controlling various machine/robot degrees of freedom, the desired movements and action can be computed from these different levels in the hierarchy. The most basic embodiment of this machine could be a pan-tilt camera system, an array of microphones, a machine with arm/hand like structure or/and a robot with some or all of the above capabilities. We describe the approach, system and present preliminary results on a real-robotic platform.
Second Report of the Multirate Processor (MRP) for Digital Voice Communications.
1982-09-30
machine are: * two arithmetic logic units (ALUs)-one for data processing, and the other for address generation, * two memorys -6144 words (70 bits per word...of program memory , and 6094 words (16 bits per word) of data memory , q * input/output through modem and teletype, -15 .9 S-;. KANG AND FRANSEN Table...provides a measure of intelligibility and allows one to evaluate the discriminability of six distinctive features: voicing, nasality, sustention
Protein remote homology detection based on bidirectional long short-term memory.
Li, Shumin; Chen, Junjie; Liu, Bin
2017-10-10
Protein remote homology detection plays a vital role in studies of protein structures and functions. Almost all of the traditional machine leaning methods require fixed length features to represent the protein sequences. However, it is never an easy task to extract the discriminative features with limited knowledge of proteins. On the other hand, deep learning technique has demonstrated its advantage in automatically learning representations. It is worthwhile to explore the applications of deep learning techniques to the protein remote homology detection. In this study, we employ the Bidirectional Long Short-Term Memory (BLSTM) to learn effective features from pseudo proteins, also propose a predictor called ProDec-BLSTM: it includes input layer, bidirectional LSTM, time distributed dense layer and output layer. This neural network can automatically extract the discriminative features by using bidirectional LSTM and the time distributed dense layer. Experimental results on a widely-used benchmark dataset show that ProDec-BLSTM outperforms other related methods in terms of both the mean ROC and mean ROC50 scores. This promising result shows that ProDec-BLSTM is a useful tool for protein remote homology detection. Furthermore, the hidden patterns learnt by ProDec-BLSTM can be interpreted and visualized, and therefore, additional useful information can be obtained.
Scemama, Anthony; Caffarel, Michel; Oseret, Emmanuel; Jalby, William
2013-04-30
Various strategies to implement efficiently quantum Monte Carlo (QMC) simulations for large chemical systems are presented. These include: (i) the introduction of an efficient algorithm to calculate the computationally expensive Slater matrices. This novel scheme is based on the use of the highly localized character of atomic Gaussian basis functions (not the molecular orbitals as usually done), (ii) the possibility of keeping the memory footprint minimal, (iii) the important enhancement of single-core performance when efficient optimization tools are used, and (iv) the definition of a universal, dynamic, fault-tolerant, and load-balanced framework adapted to all kinds of computational platforms (massively parallel machines, clusters, or distributed grids). These strategies have been implemented in the QMC=Chem code developed at Toulouse and illustrated with numerical applications on small peptides of increasing sizes (158, 434, 1056, and 1731 electrons). Using 10-80 k computing cores of the Curie machine (GENCI-TGCC-CEA, France), QMC=Chem has been shown to be capable of running at the petascale level, thus demonstrating that for this machine a large part of the peak performance can be achieved. Implementation of large-scale QMC simulations for future exascale platforms with a comparable level of efficiency is expected to be feasible. Copyright © 2013 Wiley Periodicals, Inc.
Analytical design of intelligent machines
NASA Technical Reports Server (NTRS)
Saridis, George N.; Valavanis, Kimon P.
1987-01-01
The problem of designing 'intelligent machines' to operate in uncertain environments with minimum supervision or interaction with a human operator is examined. The structure of an 'intelligent machine' is defined to be the structure of a Hierarchically Intelligent Control System, composed of three levels hierarchically ordered according to the principle of 'increasing precision with decreasing intelligence', namely: the organizational level, performing general information processing tasks in association with a long-term memory; the coordination level, dealing with specific information processing tasks with a short-term memory; and the control level, which performs the execution of various tasks through hardware using feedback control methods. The behavior of such a machine may be managed by controls with special considerations and its 'intelligence' is directly related to the derivation of a compatible measure that associates the intelligence of the higher levels with the concept of entropy, which is a sufficient analytic measure that unifies the treatment of all the levels of an 'intelligent machine' as the mathematical problem of finding the right sequence of internal decisions and controls for a system structured in the order of intelligence and inverse order of precision such that it minimizes its total entropy. A case study on the automatic maintenance of a nuclear plant illustrates the proposed approach.
The Linux operating system: An introduction
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.
1995-01-01
Linux is a Unix-like operating system for Intel 386/486/Pentium based IBM-PCs and compatibles. The kernel of this operating system was written from scratch by Linus Torvalds and, although copyrighted by the author, may be freely distributed. A world-wide group has collaborated in developing Linux on the Internet. Linux can run the powerful set of compilers and programming tools of the Free Software Foundation, and XFree86, a port of the X Window System from MIT. Most capabilities associated with high performance workstations, such as networking, shared file systems, electronic mail, TeX, LaTeX, etc. are freely available for Linux. It can thus transform cheap IBM-PC compatible machines into Unix workstations with considerable capabilities. The author explains how Linux may be obtained, installed and networked. He also describes some interesting applications for Linux that are freely available. The enormous consumer market for IBM-PC compatible machines continually drives down prices of CPU chips, memory, hard disks, CDROMs, etc. Linux can convert such machines into powerful workstations that can be used for teaching, research and software development. For professionals who use Unix based workstations at work, Linux permits virtually identical working environments on their personal home machines. For cost conscious educational institutions Linux can create world-class computing environments from cheap, easily maintained, PC clones. Finally, for university students, it provides an essentially cost-free path away from DOS into the world of Unix and X Windows.
34 CFR 395.8 - Distribution and use of income from vending machines on Federal property.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 34 Education 2 2012-07-01 2012-07-01 false Distribution and use of income from vending machines on... use of income from vending machines on Federal property. (a) Vending machine income from vending machines on Federal property which has been disbursed to the State licensing agency by a property managing...
34 CFR 395.8 - Distribution and use of income from vending machines on Federal property.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 34 Education 2 2013-07-01 2013-07-01 false Distribution and use of income from vending machines on... use of income from vending machines on Federal property. (a) Vending machine income from vending machines on Federal property which has been disbursed to the State licensing agency by a property managing...
34 CFR 395.8 - Distribution and use of income from vending machines on Federal property.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 34 Education 2 2014-07-01 2013-07-01 true Distribution and use of income from vending machines on... use of income from vending machines on Federal property. (a) Vending machine income from vending machines on Federal property which has been disbursed to the State licensing agency by a property managing...
A translator and simulator for the Burroughs D machine
NASA Technical Reports Server (NTRS)
Roberts, J.
1972-01-01
The D Machine is described as a small user microprogrammable computer designed to be a versatile building block for such diverse functions as: disk file controllers, I/O controllers, and emulators. TRANSLANG is an ALGOL-like language, which allows D Machine users to write microprograms in an English-like format as opposed to creating binary bit pattern maps. The TRANSLANG translator parses TRANSLANG programs into D Machine microinstruction bit patterns which can be executed on the D Machine simulator. In addition to simulation and translation, the two programs also offer several debugging tools, such as: a full set of diagnostic error messages, register dumps, simulated memory dumps, traces on instructions and groups of instructions, and breakpoints.
The 2008 Battle of Sadr City: Reimagining Urban Combat
2013-01-01
so intense that some of the Strykers ran out of ammunition for the .50 caliber machine guns in their remote weapons system. Adding to the confusion...review. Out of all the day’s unusual violence, the one thing that stood out in their memories was the lone gunman with his RPK machine gun . Collings’s...platoon suppressed the enemy effectively using the .50 caliber machine guns in the remote weapons systems. They also called in AH-64 Apache attack
NASA Astrophysics Data System (ADS)
Bisaria, Himanshu; Shandilya, Pragya
2018-03-01
Nowadays NiTi SMAs are gaining more prominence due to their unique properties such as superelasticity, shape memory effect, high fatigue strength and many other enriched physical and mechanical properties. The current studies explore the effect of machining parameters namely, peak current (Ip), pulse off time (TOFF), and pulse on time (TON) on wire wear ratio (WWR), and dimensional deviation (DD) in WEDM. It was found that high discharge energy was mainly ascribed to high WWR and DD. The WWR and DD increased with the increase in pulse on time and peak current whereas high pulse off time was favourable for low WWR and DD.
A sparse matrix algorithm on the Boolean vector machine
NASA Technical Reports Server (NTRS)
Wagner, Robert A.; Patrick, Merrell L.
1988-01-01
VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
NASA Astrophysics Data System (ADS)
Frotscher, M.; Kahleyss, F.; Simon, T.; Biermann, D.; Eggeler, G.
2011-07-01
NiTi shape memory alloys (SMA) are used for a variety of applications including medical implants and tools as well as actuators, making use of their unique properties. However, due to the hardness and strength, in combination with the high elasticity of the material, the machining of components can be challenging. The most common machining techniques used today are laser cutting and electrical discharge machining (EDM). In this study, we report on the machining of small structures into binary NiTi sheets, applying alternative processing methods being well-established for other metallic materials. Our results indicate that water jet machining and micro milling can be used to machine delicate structures, even in very thin NiTi sheets. Further work is required to optimize the cut quality and the machining speed in order to increase the cost-effectiveness and to make both methods more competitive.
Generalized memory associativity in a network model for the neuroses
NASA Astrophysics Data System (ADS)
Wedemann, Roseli S.; Donangelo, Raul; de Carvalho, Luís A. V.
2009-03-01
We review concepts introduced in earlier work, where a neural network mechanism describes some mental processes in neurotic pathology and psychoanalytic working-through, as associative memory functioning, according to the findings of Freud. We developed a complex network model, where modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's idea that consciousness is related to symbolic and linguistic memory activity in the brain. We have introduced a generalization of the Boltzmann machine to model memory associativity. Model behavior is illustrated with simulations and some of its properties are analyzed with methods from statistical mechanics.
Packaging of a large capacity magnetic bubble domain spacecraft recorder
NASA Technical Reports Server (NTRS)
Becker, F. J.; Stermer, R. L.
1977-01-01
A Solid State Spacecraft Data Recorder (SSDR), based on bubble domain technology, having a storage capacity of 10 to the 8th power bits, was designed and is being tested. The recorder consists of two memory modules each having 32 cells, each cell containing sixteen 100 kilobit serial bubble memory chips. The memory modules are interconnected to a Drive and Control Unit (DCU) module containing four microprocessors, 500 integrated circuits, a RAM core memory and two PROM's. The two memory modules and DCU are housed in individual machined aluminum frames, are stacked in brick fashion and through bolted to a base plate assembly which also houses the power supply.
Air Bearings Machined On Ultra Precision, Hydrostatic CNC-Lathe
NASA Astrophysics Data System (ADS)
Knol, Pierre H.; Szepesi, Denis; Deurwaarder, Jan M.
1987-01-01
Micromachining of precision elements requires an adequate machine concept to meet the high demand of surface finish, dimensional and shape accuracy. The Hembrug ultra precision lathes have been exclusively designed with hydrostatic principles for main spindle and guideways. This concept is to be explained with some major advantages of hydrostatics compared with aerostatics at universal micromachining applications. Hembrug has originally developed the conventional Mikroturn ultra precision facing lathes, for diamond turning of computer memory discs. This first generation of machines was followed by the advanced computer numerically controlled types for machining of complex precision workpieces. One of these parts, an aerostatic bearing component has been succesfully machined on the Super-Mikroturn CNC. A case study of airbearing machining confirms the statement that a good result of the micromachining does not depend on machine performance alone, but also on the technology applied.
Efficient checkpointing schemes for depletion perturbation solutions on memory-limited architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stripling, H. F.; Adams, M. L.; Hawkins, W. D.
2013-07-01
We describe a methodology for decreasing the memory footprint and machine I/O load associated with the need to access a forward solution during an adjoint solve. Specifically, we are interested in the depletion perturbation equations, where terms in the adjoint Bateman and transport equations depend on the forward flux solution. Checkpointing is the procedure of storing snapshots of the forward solution to disk and using these snapshots to recompute the parts of the forward solution that are necessary for the adjoint solve. For large problems, however, the storage cost of just a few copies of an angular flux vector canmore » exceed the available RAM on the host machine. We propose a methodology that does not checkpoint the angular flux vector; instead, we write and store converged source moments, which are typically of a much lower dimension than the angular flux solution. This reduces the memory footprint and I/O load of the problem, but requires that we perform single sweeps to reconstruct flux vectors on demand. We argue that this trade-off is exactly the kind of algorithm that will scale on advanced, memory-limited architectures. We analyze the cost, in terms of FLOPS and memory footprint, of five checkpointing schemes. We also provide computational results that support the analysis and show that the memory-for-work trade off does improve time to solution. (authors)« less
Color line scan camera technology and machine vision: requirements to consider
NASA Astrophysics Data System (ADS)
Paernaenen, Pekka H. T.
1997-08-01
Color machine vision has shown a dynamic uptrend in use within the past few years as the introduction of new cameras and scanner technologies itself underscores. In the future, the movement from monochrome imaging to color will hasten, as machine vision system users demand more knowledge about their product stream. As color has come to the machine vision, certain requirements for the equipment used to digitize color images are needed. Color machine vision needs not only a good color separation but also a high dynamic range and a good linear response from the camera used. Good dynamic range and linear response is necessary for color machine vision. The importance of these features becomes even more important when the image is converted to another color space. There is always lost some information when converting integer data to another form. Traditionally the color image processing has been much slower technique than the gray level image processing due to the three times greater data amount per image. The same has applied for the three times more memory needed. The advancements in computers, memory and processing units has made it possible to handle even large color images today cost efficiently. In some cases he image analysis in color images can in fact even be easier and faster than with a similar gray level image because of more information per pixel. Color machine vision sets new requirements for lighting, too. High intensity and white color light is required in order to acquire good images for further image processing or analysis. New development in lighting technology is bringing eventually solutions for color imaging.
Performance Analysis of the NAS Y-MP Workload
NASA Technical Reports Server (NTRS)
Bergeron, Robert J.; Kutler, Paul (Technical Monitor)
1997-01-01
This paper describes the performance characteristics of the computational workloads on the NAS Cray Y-MP machines, a Y-MP 832 and later a Y-MP 8128. Hardware measurements indicated that the Y-MP workload performance matured over time, ultimately sustaining an average throughput of 0.8 GFLOPS and a vector operation fraction of 87%. The measurements also revealed an operation rate exceeding 1 per clock period, a well-balanced architecture featuring a strong utilization of vector functional units, and an efficient memory organization. Introduction of the larger memory 8128 increased throughput by allowing a more efficient utilization of CPUs. Throughput also depended on the metering of the batch queues; low-idle Saturday workloads required a buffer of small jobs to prevent memory starvation of the CPU. UNICOS required about 7% of total CPU time to service the 832 workloads; this overhead decreased to 5% for the 8128 workloads. While most of the system time went to service I/O requests, efficient scheduling prevented excessive idle due to I/O wait. System measurements disclosed no obvious bottlenecks in the response of the machine and UNICOS to the workloads. In most cases, Cray-provided software tools were- quite sufficient for measuring the performance of both the machine and operating, system.
On the relationship between parallel computation and graph embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, A.K.
1989-01-01
The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sizedmore » binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called ({alpha},{beta})-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the ({alpha},{beta})-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.« less
Supporting 64-bit global indices in Epetra and other Trilinos packages :
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jhurani, Chetan; Austin, Travis M.; Heroux, Michael Allen
The Trilinos Project is an effort to facilitate the design, development, integration and ongoing support of mathematical software libraries within an object-oriented framework. It is intended for large-scale, complex multiphysics engineering and scientific applications [2, 4, 3]. Epetra is one of its basic packages. It provides serial and parallel linear algebra capabilities. Before Trilinos version 11.0, released in 2012, Epetra used the C++ int data-type for storing global and local indices for degrees of freedom (DOFs). Since int is typically 32-bit, this limited the largest problem size to be smaller than approximately two billion DOFs. This was true even ifmore » a distributed memory machine could handle larger problems. We have added optional support for C++ long long data-type, which is at least 64-bit wide, for global indices. To save memory, maintain the speed of memory-bound operations, and reduce further changes to the code, the local indices are still 32-bit. We document the changes required to achieve this feature and how the new functionality can be used. We also report on the lessons learned in modifying a mature and popular package from various perspectives design goals, backward compatibility, engineering decisions, C++ language features, effects on existing users and other packages, and build integration.« less
An Investigation into the Economics of Retrospective Conversion Using a CD-ROM System.
ERIC Educational Resources Information Center
Co, Francisca K.
This study compares the cost effectiveness of using a CD-ROM (compact disk read-only memory) system known as Bibliofile and the currently used OCLC (Online Computer Library Center)-based method to convert a university library's shelflist into a machine-readable database in the MARC (Machine-Readable Cataloging) format. The cost of each method of…
Machine vision for real time orbital operations
NASA Technical Reports Server (NTRS)
Vinz, Frank L.
1988-01-01
Machine vision for automation and robotic operation of Space Station era systems has the potential for increasing the efficiency of orbital servicing, repair, assembly and docking tasks. A machine vision research project is described in which a TV camera is used for inputing visual data to a computer so that image processing may be achieved for real time control of these orbital operations. A technique has resulted from this research which reduces computer memory requirements and greatly increases typical computational speed such that it has the potential for development into a real time orbital machine vision system. This technique is called AI BOSS (Analysis of Images by Box Scan and Syntax).
Munsell, B C; Wu, G; Fridriksson, J; Thayer, K; Mofrad, N; Desisto, N; Shen, D; Bonilha, L
2017-09-09
Impaired confrontation naming is a common symptom of temporal lobe epilepsy (TLE). The neurobiological mechanisms underlying this impairment are poorly understood but may indicate a structural disorganization of broadly distributed neuronal networks that support naming ability. Importantly, naming is frequently impaired in other neurological disorders and by contrasting the neuronal structures supporting naming in TLE with other diseases, it will become possible to elucidate the common systems supporting naming. We aimed to evaluate the neuronal networks that support naming in TLE by using a machine learning algorithm intended to predict naming performance in subjects with medication refractory TLE using only the structural brain connectome reconstructed from diffusion tensor imaging. A connectome-based prediction framework was developed using network properties from anatomically defined brain regions across the entire brain, which were used in a multi-task machine learning algorithm followed by support vector regression. Nodal eigenvector centrality, a measure of regional network integration, predicted approximately 60% of the variance in naming. The nodes with the highest regression weight were bilaterally distributed among perilimbic sub-networks involving mainly the medial and lateral temporal lobe regions. In the context of emerging evidence regarding the role of large structural networks that support language processing, our results suggest intact naming relies on the integration of sub-networks, as opposed to being dependent on isolated brain areas. In the case of TLE, these sub-networks may be disproportionately indicative naming processes that are dependent semantic integration from memory and lexical retrieval, as opposed to multi-modal perception or motor speech production. Copyright © 2017. Published by Elsevier Inc.
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace -collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility, that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation -based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, that convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
Hardware enabled performance counters with support for operating system context switching
Salapura, Valentina; Wisniewski, Robert W.
2015-06-30
A device for supporting hardware enabled performance counters with support for context switching include a plurality of performance counters operable to collect information associated with one or more computer system related activities, a first register operable to store a memory address, a second register operable to store a mode indication, and a state machine operable to read the second register and cause the plurality of performance counters to copy the information to memory area indicated by the memory address based on the mode indication.
NASA Technical Reports Server (NTRS)
Kanerva, P.
1986-01-01
To determine the relation of the sparse, distributed memory to other architectures, a broad review of the literature was made. The memory is called a pattern memory because they work with large patterns of features (high-dimensional vectors). A pattern is stored in a pattern memory by distributing it over a large number of storage elements and by superimposing it over other stored patterns. A pattern is retrieved by mathematical or statistical reconstruction from the distributed elements. Three pattern memories are discussed.
New laser machining processes for shape memory alloys
NASA Astrophysics Data System (ADS)
Haferkamp, Heinz; Paschko, Stefan; Goede, Martin
2001-04-01
Due to special material properties, shape memory alloys (SMA) are finding increasing attention in micro system technology. However, only a few processes are available for the machining of miniaturized SMA-components. In this connection, laser material processing offers completely new possibilities. This paper describes the actual status of two projects that are being carried out to qualify new methods to machine SMA components by means of laser radiation. Within one project, the laser material ablation process of miniaturized SMA- components using ultra-short laser pulses (pulse duration: approx. 200 fs) in comparison to conventional laser material ablation is being investigated. Especially for SMA micro- sensors and actuators, it is important to minimize the heat affected zone (HAZ) to maintain the special mechanical properties. Light-microscopic investigations of the grain texture of SMA devices processed with ultra-short laser pulses show that the HAZ can be neglected. Presently, the main goal of the project is to qualify this new processing technique for the micro-structuring of complex SMA micro devices with high precision. Within a second project, investigations are being carried out to realize the induction of the two-way memory effect (TWME) into SMA components using laser radiation. By precisely heating SMA components with laser radiation, local tensions remain near the component surface. In connection with the shape memory effect, these tensions can be used to make the components execute complicated movements. Compared to conventional training methods to induce the TWME, this procedure is faster and easier. Furthermore, higher numbers of thermal cycling are expected because of the low dislocation density in the main part of the component.
Methods For Self-Organizing Software
Bouchard, Ann M.; Osbourn, Gordon C.
2005-10-18
A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered functions calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.
Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei
2017-09-21
In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.
NASA Technical Reports Server (NTRS)
Tick, Evan
1987-01-01
This note describes an efficient software emulator for the Warren Abstract Machine (WAM) Prolog architecture. The version of the WAM implemented is called Lcode. The Lcode emulator, written in C, executes the 'naive reverse' benchmark at 3900 LIPS. The emulator is one of a set of tools used to measure the memory-referencing characteristics and performance of Prolog programs. These tools include a compiler, assembler, and memory simulators. An overview of the Lcode architecture is given here, followed by a description and listing of the emulator code implementing each Lcode instruction. This note will be of special interest to those studying the WAM and its performance characteristics. In general, this note will be of interest to those creating efficient software emulators for abstract machine architectures.
A performance comparison of the IBM RS/6000 and the Astronautics ZS-1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, W.M.; Abraham, S.G.; Davidson, E.S.
1991-01-01
Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data inmore » the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.« less
High Order Accuracy Methods for Supersonic Reactive Flows
2008-06-25
k = 0, · · · , N and N is the polynomial order used. The positive constant M is chosen such that σ(N) becomes machine zero. Typically M ∼ 32. Table...function used in this study is the Exponential filter given by σ(η) = exp(−αηp), (38) where α = − ln() and is the machine zero. The spectral...respectively. All numerical experiments were run 49 on a 667 MHz Compaq Alpha machine with 1GB memory and with an Alpha internal floating point processor. 9.1
Research on intelligent machine self-perception method based on LSTM
NASA Astrophysics Data System (ADS)
Wang, Qiang; Cheng, Tao
2018-05-01
In this paper, we use the advantages of LSTM in feature extraction and processing high-dimensional and complex nonlinear data, and apply it to the autonomous perception of intelligent machines. Compared with the traditional multi-layer neural network, this model has memory, can handle time series information of any length. Since the multi-physical domain signals of processing machines have a certain timing relationship, and there is a contextual relationship between states and states, using this deep learning method to realize the self-perception of intelligent processing machines has strong versatility and adaptability. The experiment results show that the method proposed in this paper can obviously improve the sensing accuracy under various working conditions of the intelligent machine, and also shows that the algorithm can well support the intelligent processing machine to realize self-perception.
Sparse distributed memory overview
NASA Technical Reports Server (NTRS)
Raugh, Mike
1990-01-01
The Sparse Distributed Memory (SDM) project is investigating the theory and applications of massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
Multi-variants synthesis of Petri nets for FPGA devices
NASA Astrophysics Data System (ADS)
Bukowiec, Arkadiusz; Doligalski, Michał
2015-09-01
There is presented new method of synthesis of application specific logic controllers for FPGA devices. The specification of control algorithm is made with use of control interpreted Petri net (PT type). It allows specifying parallel processes in easy way. The Petri net is decomposed into state-machine type subnets. In this case, each subnet represents one parallel process. For this purpose there are applied algorithms of coloring of Petri nets. There are presented two approaches of such decomposition: with doublers of macroplaces or with one global wait place. Next, subnets are implemented into two-level logic circuit of the controller. The levels of logic circuit are obtained as a result of its architectural decomposition. The first level combinational circuit is responsible for generation of next places and second level decoder is responsible for generation output symbols. There are worked out two variants of such circuits: with one shared operational memory or with many flexible distributed memories as a decoder. Variants of Petri net decomposition and structures of logic circuits can be combined together without any restrictions. It leads to existence of four variants of multi-variants synthesis.
Recognition techniques for extracting information from semistructured documents
NASA Astrophysics Data System (ADS)
Della Ventura, Anna; Gagliardi, Isabella; Zonta, Bruna
2000-12-01
Archives of optical documents are more and more massively employed, the demand driven also by the new norms sanctioning the legal value of digital documents, provided they are stored on supports that are physically unalterable. On the supply side there is now a vast and technologically advanced market, where optical memories have solved the problem of the duration and permanence of data at costs comparable to those for magnetic memories. The remaining bottleneck in these systems is the indexing. The indexing of documents with a variable structure, while still not completely automated, can be machine supported to a large degree with evident advantages both in the organization of the work, and in extracting information, providing data that is much more detailed and potentially significant for the user. We present here a system for the automatic registration of correspondence to and from a public office. The system is based on a general methodology for the extraction, indexing, archiving, and retrieval of significant information from semi-structured documents. This information, in our prototype application, is distributed among the database fields of sender, addressee, subject, date, and body of the document.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carleton, James Brian; Parks, Michael L.
Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
Intelligent machines in the twenty-first century: foundations of inference and inquiry.
Knuth, Kevin H
2003-12-15
The last century saw the application of Boolean algebra to the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines, in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. Recent advances in our understanding of the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we recently identified the algebra of questions as the free distributive algebra, which will now allow us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper, we examine the foundations of inference and inquiry. We begin with a history of inferential reasoning, highlighting key concepts that have led to the automation of inference in modern machine-learning systems. We then discuss the foundations of inference in more detail using a modern viewpoint that relies on the mathematics of partially ordered sets and the scaffolding of lattice theory. This new viewpoint allows us to develop the logic of inquiry and introduce a measure describing the relevance of a proposed question to an unresolved issue. Last, we will demonstrate the automation of inference, and discuss how this new logic of inquiry will enable intelligent machines to ask questions. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them not only to make inferences from data, but also to decide which question to ask, which experiment to perform, or which measurement to take given what they have learned and what they are designed to understand.
Intelligent machines in the twenty-first century: foundations of inference and inquiry
NASA Technical Reports Server (NTRS)
Knuth, Kevin H.
2003-01-01
The last century saw the application of Boolean algebra to the construction of computing machines, which work by applying logical transformations to information contained in their memory. The development of information theory and the generalization of Boolean algebra to Bayesian inference have enabled these computing machines, in the last quarter of the twentieth century, to be endowed with the ability to learn by making inferences from data. This revolution is just beginning as new computational techniques continue to make difficult problems more accessible. Recent advances in our understanding of the foundations of probability theory have revealed implications for areas other than logic. Of relevance to intelligent machines, we recently identified the algebra of questions as the free distributive algebra, which will now allow us to work with questions in a way analogous to that which Boolean algebra enables us to work with logical statements. In this paper, we examine the foundations of inference and inquiry. We begin with a history of inferential reasoning, highlighting key concepts that have led to the automation of inference in modern machine-learning systems. We then discuss the foundations of inference in more detail using a modern viewpoint that relies on the mathematics of partially ordered sets and the scaffolding of lattice theory. This new viewpoint allows us to develop the logic of inquiry and introduce a measure describing the relevance of a proposed question to an unresolved issue. Last, we will demonstrate the automation of inference, and discuss how this new logic of inquiry will enable intelligent machines to ask questions. Automation of both inference and inquiry promises to allow robots to perform science in the far reaches of our solar system and in other star systems by enabling them not only to make inferences from data, but also to decide which question to ask, which experiment to perform, or which measurement to take given what they have learned and what they are designed to understand.
NASA Astrophysics Data System (ADS)
Kaynak, Y.; Huang, B.; Karaca, H. E.; Jawahir, I. S.
2017-07-01
This experimental study focuses on the phase state and phase transformation response of the surface and subsurface of machined NiTi alloys. X-ray diffraction (XRD) analysis and differential scanning calorimeter techniques were utilized to measure the phase state and the transformation response of machined specimens, respectively. Specimens were machined under dry machining at ambient temperature, preheated conditions, and cryogenic cooling conditions at various cutting speeds. The findings from this research demonstrate that cryogenic machining substantially alters austenite finish temperature of martensitic NiTi alloy. Austenite finish ( A f) temperature shows more than 25 percent increase resulting from cryogenic machining compared with austenite finish temperature of as-received NiTi. Dry and preheated conditions do not substantially alter austenite finish temperature. XRD analysis shows that distinctive transformation from martensite to austenite occurs during machining process in all three conditions. Complete transformation from martensite to austenite is observed in dry cutting at all selected cutting speeds.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipelinemore » model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.« less
Architectural requirements for the Red Storm computing system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Camp, William J.; Tomkins, James Lee
This report is based on the Statement of Work (SOW) describing the various requirements for delivering 3 new supercomputer system to Sandia National Laboratories (Sandia) as part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative (ASCI) program. This system is named Red Storm and will be a distributed memory, massively parallel processor (MPP) machine built primarily out of commodity parts. The requirements presented here distill extensive architectural and design experience accumulated over a decade and a half of research, development and production operation of similar machines at Sandia. Red Storm will have an unusually high bandwidth, low latencymore » interconnect, specially designed hardware and software reliability features, a light weight kernel compute node operating system and the ability to rapidly switch major sections of the machine between classified and unclassified computing environments. Particular attention has been paid to architectural balance in the design of Red Storm, and it is therefore expected to achieve an atypically high fraction of its peak speed of 41 TeraOPS on real scientific computing applications. In addition, Red Storm is designed to be upgradeable to many times this initial peak capability while still retaining appropriate balance in key design dimensions. Installation of the Red Storm computer system at Sandia's New Mexico site is planned for 2004, and it is expected that the system will be operated for a minimum of five years following installation.« less
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
34 CFR 395.8 - Distribution and use of income from vending machines on Federal property.
Code of Federal Regulations, 2010 CFR
2010-07-01
... blind vendor in any amount exceeding the average net income of the total number of blind vendors in the... 34 Education 2 2010-07-01 2010-07-01 false Distribution and use of income from vending machines on... use of income from vending machines on Federal property. (a) Vending machine income from vending...
34 CFR 395.8 - Distribution and use of income from vending machines on Federal property.
Code of Federal Regulations, 2011 CFR
2011-07-01
... blind vendor in any amount exceeding the average net income of the total number of blind vendors in the... 34 Education 2 2011-07-01 2010-07-01 true Distribution and use of income from vending machines on... use of income from vending machines on Federal property. (a) Vending machine income from vending...
Hardware support for collecting performance counters directly to memory
Gara, Alan; Salapura, Valentina; Wisniewski, Robert W.
2012-09-25
Hardware support for collecting performance counters directly to memory, in one aspect, may include a plurality of performance counters operable to collect one or more counts of one or more selected activities. A first storage element may be operable to store an address of a memory location. A second storage element may be operable to store a value indicating whether the hardware should begin copying. A state machine may be operable to detect the value in the second storage element and trigger hardware copying of data in selected one or more of the plurality of performance counters to the memory location whose address is stored in the first storage element.
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-01-03
...] International Business Machines (IBM), Global Sales Operations Organization, Sales and Distribution Business Manager Roles; One Teleworker Located in Charleston, WV; International Business Machines (IBM), Global Sales Operations Organization, Sales and Distribution Business Unit, Relations Analyst and Band 8...
Algorithms for Automatic Alignment of Arrays
NASA Technical Reports Server (NTRS)
Chatterjee, Siddhartha; Gilbert, John R.; Oliker, Leonid; Schreiber, Robert; Sheffler, Thomas J.
1996-01-01
Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs, and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm with for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors
Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei
2017-01-01
In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors. PMID:28934163
Efficient Checkpointing of Virtual Machines using Virtual Machine Introspection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aderholdt, Ferrol; Han, Fang; Scott, Stephen L
Cloud Computing environments rely heavily on system-level virtualization. This is due to the inherent benefits of virtualization including fault tolerance through checkpoint/restart (C/R) mechanisms. Because clouds are the abstraction of large data centers and large data centers have a higher potential for failure, it is imperative that a C/R mechanism for such an environment provide minimal latency as well as a small checkpoint file size. Recently, there has been much research into C/R with respect to virtual machines (VM) providing excellent solutions to reduce either checkpoint latency or checkpoint file size. However, these approaches do not provide both. This papermore » presents a method of checkpointing VMs by utilizing virtual machine introspection (VMI). Through the usage of VMI, we are able to determine which pages of memory within the guest are used or free and are better able to reduce the amount of pages written to disk during a checkpoint. We have validated this work by using various benchmarks to measure the latency along with the checkpoint size. With respect to checkpoint file size, our approach results in file sizes within 24% or less of the actual used memory within the guest. Additionally, the checkpoint latency of our approach is up to 52% faster than KVM s default method.« less
Compile-time estimation of communication costs in multicomputers
NASA Technical Reports Server (NTRS)
Gupta, Manish; Banerjee, Prithviraj
1991-01-01
An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Any strategy for automatic data partitioning needs a mechanism for estimating the performance of a program under a given partitioning scheme, the most crucial part of which involves determining the communication costs incurred by the program. A methodology is described for estimating the communication costs at compile-time as functions of the numbers of processors over which various arrays are distributed. A strategy is described along with its theoretical basis, for making program transformations that expose opportunities for combining of messages, leading to considerable savings in the communication costs. For certain loops with regular dependences, the compiler can detect the possibility of pipelining, and thus estimate communication costs more accurately than it could otherwise. These results are of great significance to any parallelization system supporting numeric applications on multicomputers. In particular, they lay down a framework for effective synthesis of communication on multicomputers from sequential program references.
Large Scale Analysis of Geospatial Data with Dask and XArray
NASA Astrophysics Data System (ADS)
Zender, C. S.; Hamman, J.; Abernathey, R.; Evans, K. J.; Rocklin, M.; Zender, C. S.; Rocklin, M.
2017-12-01
The analysis of geospatial data with high level languages has acceleratedinnovation and the impact of existing data resources. However, as datasetsgrow beyond single-machine memory, data structures within these high levellanguages can become a bottleneck. New libraries like Dask and XArray resolve some of these scalability issues,providing interactive workflows that are both familiar tohigh-level-language researchers while also scaling out to much largerdatasets. This broadens the access of researchers to larger datasets on highperformance computers and, through interactive development, reducestime-to-insight when compared to traditional parallel programming techniques(MPI). This talk describes Dask, a distributed dynamic task scheduler, Dask.array, amulti-dimensional array that copies the popular NumPy interface, and XArray,a library that wraps NumPy/Dask.array with labeled and indexes axes,implementing the CF conventions. We discuss both the basic design of theselibraries and how they change interactive analysis of geospatial data, and alsorecent benefits and challenges of distributed computing on clusters ofmachines.
NASA Astrophysics Data System (ADS)
Johnson, Bradley; May, Gayle L.; Korn, Paula
The present conference discusses the currently envisioned goals of human-machine systems in spacecraft environments, prospects for human exploration of the solar system, and plausible methods for meeting human needs in space. Also discussed are the problems of human-machine interaction in long-duration space flights, remote medical systems for space exploration, the use of virtual reality for planetary exploration, the alliance between U.S. Antarctic and space programs, and the economic and educational impacts of the U.S. space program.
Immunological memory is associative
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, D.J.; Forrest, S.; Perelson, A.S.
1996-12-31
The purpose of this paper is to show that immunological memory is an associative and robust memory that belongs to the class of sparse distributed memories. This class of memories derives its associative and robust nature by sparsely sampling the input space and distributing the data among many independent agents. Other members of this class include a model of the cerebellar cortex and Sparse Distributed Memory (SDM). First we present a simplified account of the immune response and immunological memory. Next we present SDM, and then we show the correlations between immunological memory and SDM. Finally, we show how associativemore » recall in the immune response can be both beneficial and detrimental to the fitness of an individual.« less
Using PVM to host CLIPS in distributed environments
NASA Technical Reports Server (NTRS)
Myers, Leonard; Pohl, Kym
1994-01-01
It is relatively easy to enhance CLIPS (C Language Integrated Production System) to support multiple expert systems running in a distributed environment with heterogeneous machines. The task is minimized by using the PVM (Parallel Virtual Machine) code from Oak Ridge Labs to provide the distributed utility. PVM is a library of C and FORTRAN subprograms that supports distributive computing on many different UNIX platforms. A PVM deamon is easily installed on each CPU that enters the virtual machine environment. Any user with rsh or rexec access to a machine can use the one PVM deamon to obtain a generous set of distributed facilities. The ready availability of both CLIPS and PVM makes the combination of software particularly attractive for budget conscious experimentation of heterogeneous distributive computing with multiple CLIPS executables. This paper presents a design that is sufficient to provide essential message passing functions in CLIPS and enable the full range of PVM facilities.
Information processing via physical soft body
Nakajima, Kohei; Hauser, Helmut; Li, Tao; Pfeifer, Rolf
2015-01-01
Soft machines have recently gained prominence due to their inherent softness and the resulting safety and resilience in applications. However, these machines also have disadvantages, as they respond with complex body dynamics when stimulated. These dynamics exhibit a variety of properties, including nonlinearity, memory, and potentially infinitely many degrees of freedom, which are often difficult to control. Here, we demonstrate that these seemingly undesirable properties can in fact be assets that can be exploited for real-time computation. Using body dynamics generated from a soft silicone arm, we show that they can be employed to emulate desired nonlinear dynamical systems. First, by using benchmark tasks, we demonstrate that the nonlinearity and memory within the body dynamics can increase the computational performance. Second, we characterize our system’s computational capability by comparing its task performance with a standard machine learning technique and identify its range of validity and limitation. Our results suggest that soft bodies are not only impressive in their deformability and flexibility but can also be potentially used as computational resources on top and for free. PMID:26014748
75 FR 48955 - Arbitration Panel Decision Under the Randolph-Sheppard Act
Federal Register 2010, 2011, 2012, 2013, 2014
2010-08-12
... vending machine facility operated by a blind vendor at the USPS's Chicago Processing and Distribution... cafeteria operations are exempt from the Act and whether the vending machines operated by a private vendor at the Chicago Processing and Distribution Center are in direct competition with the vending machines...
Parallel grid generation algorithm for distributed memory computers
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Moitra, Anutosh
1994-01-01
A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
Operation of a quantum dot in the finite-state machine mode: Single-electron dynamic memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klymenko, M. V.; Klein, M.; Levine, R. D.
2016-07-14
A single electron dynamic memory is designed based on the non-equilibrium dynamics of charge states in electrostatically defined metallic quantum dots. Using the orthodox theory for computing the transfer rates and a master equation, we model the dynamical response of devices consisting of a charge sensor coupled to either a single and or a double quantum dot subjected to a pulsed gate voltage. We show that transition rates between charge states in metallic quantum dots are characterized by an asymmetry that can be controlled by the gate voltage. This effect is more pronounced when the switching between charge states correspondsmore » to a Markovian process involving electron transport through a chain of several quantum dots. By simulating the dynamics of electron transport we demonstrate that the quantum box operates as a finite-state machine that can be addressed by choosing suitable shapes and switching rates of the gate pulses. We further show that writing times in the ns range and retention memory times six orders of magnitude longer, in the ms range, can be achieved on the double quantum dot system using experimentally feasible parameters, thereby demonstrating that the device can operate as a dynamic single electron memory.« less
Department of Defense In-House RDT and E Activities: Management Analysis Report for Fiscal Year 1993
1994-11-01
A worldwide unique lab because it houses a high - speed modeling and simulation system, a prototype...E Division, San Diego, CA: High Performance Computing Laboratory providing a wide range of advanced computer systems for the scientific investigation...Machines CM-200 and a 256-node Thinking Machines CM-S. The CM-5 is in a very large memory, ( high performance 32 Gbytes, >4 0 OFlop) coafiguration,
NASA Astrophysics Data System (ADS)
Johnson, Bradley; May, Gayle L.; Korn, Paula
A recent symposium produced papers in the areas of solar system exploration, man machine interfaces, cybernetics, virtual reality, telerobotics, life support systems and the scientific and technology spinoff from the NASA space program. A number of papers also addressed the social and economic impacts of the space program. For individual titles, see A95-87468 through A95-87479.
What does fault tolerant Deep Learning need from MPI?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.
Deep Learning (DL) algorithms have become the {\\em de facto} Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults -- requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: {\\em What is needed from MPI for designing fault tolerant DL implementations?} In this paper, we address this problem for permanent faults. We motivate the need for amore » fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM.« less
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
Distributed representations in memory: Insights from functional brain imaging
Rissman, Jesse; Wagner, Anthony D.
2015-01-01
Forging new memories for facts and events, holding critical details in mind on a moment-to-moment basis, and retrieving knowledge in the service of current goals all depend on a complex interplay between neural ensembles throughout the brain. Over the past decade, researchers have increasingly leveraged powerful analytical tools (e.g., multi-voxel pattern analysis) to decode the information represented within distributed fMRI activity patterns. In this review, we discuss how these methods can sensitively index neural representations of perceptual and semantic content, and how leverage on the engagement of distributed representations provides unique insights into distinct aspects of memory-guided behavior. We emphasize that, in addition to characterizing the contents of memories, analyses of distributed patterns shed light on the processes that influence how information is encoded, maintained, or retrieved, and thus inform memory theory. We conclude by highlighting open questions about memory that can be addressed through distributed pattern analyses. PMID:21943171
Scaling Deep Learning on GPU and Knights Landing clusters
You, Yang; Buluc, Aydin; Demmel, James
2017-09-26
The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
Scaling Deep Learning on GPU and Knights Landing clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Buluc, Aydin; Demmel, James
The speed of deep neural networks training has become a big bottleneck of deep learning research and development. For example, training GoogleNet by ImageNet dataset on one Nvidia K20 GPU needs 21 days. To speed up the training process, the current deep learning systems heavily rely on the hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithm aspect, current distributed machine learningmore » systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirement on cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. Original EASGD used round-robin method for communication and updating. The communication is ordered by the machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster \\textcolor{black}{than} their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, resp.) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use some system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves 5.3x speedup over original EASGD on the same platform. We get 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.« less
NASA Technical Reports Server (NTRS)
Davis, G. J.
1994-01-01
One area of research of the Information Sciences Division at NASA Ames Research Center is devoted to the analysis and enhancement of processors and advanced computer architectures, specifically in support of automation and robotic systems. To compare systems' abilities to efficiently process Lisp and Ada, scientists at Ames Research Center have developed a suite of non-parallel benchmarks called ELAPSE. The benchmark suite was designed to test a single computer's efficiency as well as alternate machine comparisons on Lisp, and/or Ada languages. ELAPSE tests the efficiency with which a machine can execute the various routines in each environment. The sample routines are based on numeric and symbolic manipulations and include two-dimensional fast Fourier transformations, Cholesky decomposition and substitution, Gaussian elimination, high-level data processing, and symbol-list references. Also included is a routine based on a Bayesian classification program sorting data into optimized groups. The ELAPSE benchmarks are available for any computer with a validated Ada compiler and/or Common Lisp system. Of the 18 routines that comprise ELAPSE, provided within this package are 14 developed or translated at Ames. The others are readily available through literature. The benchmark that requires the most memory is CHOLESKY.ADA. Under VAX/VMS, CHOLESKY.ADA requires 760K of main memory. ELAPSE is available on either two 5.25 inch 360K MS-DOS format diskettes (standard distribution) or a 9-track 1600 BPI ASCII CARD IMAGE format magnetic tape. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. The ELAPSE benchmarks were written in 1990. VAX and VMS are trademarks of Digital Equipment Corporation. MS-DOS is a registered trademark of Microsoft Corporation.
Assessing Advanced Technology in CENATE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tallent, Nathan R.; Barker, Kevin J.; Gioiosa, Roberto
PNNL's Center for Advanced Technology Evaluation (CENATE) is a new U.S. Department of Energy center whose mission is to assess and facilitate access to emerging computing technology. CENATE is assessing a range of advanced technologies, from evolutionary to disruptive. Technologies of interest include the processor socket (homogeneous and accelerated systems), memories (dynamic, static, memory cubes), motherboards, networks (network interface cards and switches), and input/output and storage devices. CENATE is developing a multi-perspective evaluation process based on integrating advanced system instrumentation, performance measurements, and modeling and simulation. We show evaluations of two emerging network technologies: silicon photonics interconnects and the Datamore » Vortex network. CENATE's evaluation also addresses the question of which machine is best for a given workload under certain constraints. We show a performance-power tradeoff analysis of a well-known machine learning application on two systems.« less
NASA Technical Reports Server (NTRS)
Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.
1995-01-01
A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
Tamboer, P; Vorst, H C M; Ghebreab, S; Scholte, H S
2016-01-01
Meta-analytic studies suggest that dyslexia is characterized by subtle and spatially distributed variations in brain anatomy, although many variations failed to be significant after corrections of multiple comparisons. To circumvent issues of significance which are characteristic for conventional analysis techniques, and to provide predictive value, we applied a machine learning technique--support vector machine--to differentiate between subjects with and without dyslexia. In a sample of 22 students with dyslexia (20 women) and 27 students without dyslexia (25 women) (18-21 years), a classification performance of 80% (p < 0.001; d-prime = 1.67) was achieved on the basis of differences in gray matter (sensitivity 82%, specificity 78%). The voxels that were most reliable for classification were found in the left occipital fusiform gyrus (LOFG), in the right occipital fusiform gyrus (ROFG), and in the left inferior parietal lobule (LIPL). Additionally, we found that classification certainty (e.g. the percentage of times a subject was correctly classified) correlated with severity of dyslexia (r = 0.47). Furthermore, various significant correlations were found between the three anatomical regions and behavioural measures of spelling, phonology and whole-word-reading. No correlations were found with behavioural measures of short-term memory and visual/attentional confusion. These data indicate that the LOFG, ROFG and the LIPL are neuro-endophenotype and potentially biomarkers for types of dyslexia related to reading, spelling and phonology. In a second and independent sample of 876 young adults of a general population, the trained classifier of the first sample was tested, resulting in a classification performance of 59% (p = 0.07; d-prime = 0.65). This decline in classification performance resulted from a large percentage of false alarms. This study provided support for the use of machine learning in anatomical brain imaging.
Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna
2017-09-01
Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors-emotional salience of information, group structure, and information distribution-on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories especially in insular group structure and especially for shared information, and promoted collective forgetting of positive memories. Diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; diverse structure propagated a greater variety of false memories whereas insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground to specify how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
SU-E-T-113: Dose Distribution Using Respiratory Signals and Machine Parameters During Treatment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Imae, T; Haga, A; Saotome, N
Purpose: Volumetric modulated arc therapy (VMAT) is a rotational intensity-modulated radiotherapy (IMRT) technique capable of acquiring projection images during treatment. Treatment plans for lung tumors using stereotactic body radiotherapy (SBRT) are calculated with planning computed tomography (CT) images only exhale phase. Purpose of this study is to evaluate dose distribution by reconstructing from only the data such as respiratory signals and machine parameters acquired during treatment. Methods: Phantom and three patients with lung tumor underwent CT scans for treatment planning. They were treated by VMAT while acquiring projection images to derive their respiratory signals and machine parameters including positions ofmore » multi leaf collimators, dose rates and integrated monitor units. The respiratory signals were divided into 4 and 10 phases and machine parameters were correlated with the divided respiratory signals based on the gantry angle. Dose distributions of each respiratory phase were calculated from plans which were reconstructed from the respiratory signals and the machine parameters during treatment. The doses at isocenter, maximum point and the centroid of target were evaluated. Results and Discussion: Dose distributions during treatment were calculated using the machine parameters and the respiratory signals detected from projection images. Maximum dose difference between plan and in treatment distribution was −1.8±0.4% at centroid of target and dose differences of evaluated points between 4 and 10 phases were no significant. Conclusion: The present method successfully evaluated dose distribution using respiratory signals and machine parameters during treatment. This method is feasible to verify the actual dose for moving target.« less
Restoration of fMRI Decodability Does Not Imply Latent Working Memory States
Schneegans, Sebastian; Bays, Paul M.
2018-01-01
Recent imaging studies have challenged the prevailing view that working memory is mediated by sustained neural activity. Using machine learning methods to reconstruct memory content, these studies found that previously diminished representations can be restored by retrospective cueing or other forms of stimulation. These findings have been interpreted as evidence for an activity-silent working memory state that can be reactivated dependent on task demands. Here, we test the validity of this conclusion by formulating a neural process model of working memory based on sustained activity and using this model to emulate a spatial recall task with retrocueing. The simulation reproduces both behavioral and fMRI results previously taken as evidence for latent states, in particular the restoration of spatial reconstruction quality following an informative cue. Our results demonstrate that recovery of the decodability of an imaging signal does not provide compelling evidence for an activity-silent working memory state. PMID:28820674
Reliability Analysis of Uniaxially Ground Brittle Materials
NASA Technical Reports Server (NTRS)
Salem, Jonathan A.; Nemeth, Noel N.; Powers, Lynn M.; Choi, Sung R.
1995-01-01
The fast fracture strength distribution of uniaxially ground, alpha silicon carbide was investigated as a function of grinding angle relative to the principal stress direction in flexure. Both as-ground and ground/annealed surfaces were investigated. The resulting flexural strength distributions were used to verify reliability models and predict the strength distribution of larger plate specimens tested in biaxial flexure. Complete fractography was done on the specimens. Failures occurred from agglomerates, machining cracks, or hybrid flaws that consisted of a machining crack located at a processing agglomerate. Annealing eliminated failures due to machining damage. Reliability analyses were performed using two and three parameter Weibull and Batdorf methodologies. The Weibull size effect was demonstrated for machining flaws. Mixed mode reliability models reasonably predicted the strength distributions of uniaxial flexure and biaxial plate specimens.
A Multi-scale Cognitive Approach to Intrusion Detection and Response
2015-12-28
the behavior of the traffic on the network, either by using mathematical formulas or by replaying packet streams. As a result, simulators depend...large scale. Summary of the most important results We obtained a powerful machine, which has 768 cores and 1.25 TB memory . RBG has been...time. Each client is configured with 1GB memory , 10 GB disk space, and one 100M Ethernet interface. The server nodes include web servers
This Is Your Brain: A Decision-Making Machine
2015-11-01
brain has vast comput-ing power that performs a plethora of vital tasks. It regu-lates your bodily functions, movements and emotions . It processes and...system beneath the cerebrum and associated with long-term memory and emotions . In our “The brain is a wonderful organ. It starts working when you get...presence of perceived danger. Long-term memories and experiences also are stored here, often along with their emotional connections to pain or
Three Types of Memory in Emergency Medical Services Communication
ERIC Educational Resources Information Center
Angeli, Elizabeth L.
2015-01-01
This article examines memory and distributed cognition involved in the writing practices of emergency medical services (EMS) professionals. Results from a 16-month study indicate that EMS professionals rely on distributed cognition and three kinds of memory: individual, collaborative, and professional. Distributed cognition and the three types of…
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminatemore » the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm--- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.« less
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.
Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin
2014-10-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems
Andrade, G.; Ferreira, R.; Teodoro, George; Rocha, Leonardo; Saltz, Joel H.; Kurc, Tahsin
2015-01-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of its resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed in a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperforms other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time - HEFT, in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales. PMID:26640423
NASA Astrophysics Data System (ADS)
Baldi, Livio; Bez, Roberto; Sandhu, Gurtej
2014-12-01
Memory is a key component of any data processing system. Following the classical Turing machine approach, memories hold both the data to be processed and the rules for processing them. In the history of microelectronics, the distinction has been rather between working memory, which is exemplified by DRAM, and storage memory, exemplified by NAND. These two types of memory devices now represent 90% of all memory market and 25% of the total semiconductor market, and have been the technology drivers in the last decades. Even if radically different in characteristics, they are however based on the same storage mechanism: charge storage, and this mechanism seems to be near to reaching its physical limits. The search for new alternative memory approaches, based on more scalable mechanisms, has therefore gained new momentum. The status of incumbent memory technologies and their scaling limitations will be discussed. Emerging memory technologies will be analyzed, starting from the ones that are already present for niche applications, and which are getting new attention, thanks to recent technology breakthroughs. Maturity level, physical limitations and potential for scaling will be compared to existing memories. At the end the possible future composition of memory systems will be discussed.
Feasibility of Virtual Machine and Cloud Computing Technologies for High Performance Computing
2014-05-01
Hat Enterprise Linux SaaS software as a service VM virtual machine vNUMA virtual non-uniform memory access WRF weather research and forecasting...previously mentioned in Chapter I Section B1 of this paper, which is used to run the weather research and forecasting ( WRF ) model in their experiments...against a VMware virtualization solution of WRF . The experiment consisted of running WRF in a standard configuration between the D-VTM and VMware while
Performance study of a data flow architecture
NASA Technical Reports Server (NTRS)
Adams, George
1985-01-01
Teams of scientists studied data flow concepts, static data flow machine architecture, and the VAL language. Each team mapped its application onto the machine and coded it in VAL. The principal findings of the study were: (1) Five of the seven applications used the full power of the target machine. The galactic simulation and multigrid fluid flow teams found that a significantly smaller version of the machine (16 processing elements) would suffice. (2) A number of machine design parameters including processing element (PE) function unit numbers, array memory size and bandwidth, and routing network capability were found to be crucial for optimal machine performance. (3) The study participants readily acquired VAL programming skills. (4) Participants learned that application-based performance evaluation is a sound method of evaluating new computer architectures, even those that are not fully specified. During the course of the study, participants developed models for using computers to solve numerical problems and for evaluating new architectures. These models form the bases for future evaluation studies.
NASA Astrophysics Data System (ADS)
Leamy, Michael J.; Springer, Adam C.
In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
Sengupta, Partho P.; Huang, Yen-Min; Bansal, Manish; Ashrafi, Ali; Fisher, Matt; Shameer, Khader; Gall, Walt; Dudley, Joel T
2016-01-01
Background Associating a patient’s profile with the memories of prototypical patients built through previous repeat clinical experience is a key process in clinical judgment. We hypothesized that a similar process using a cognitive computing tool would be well suited for learning and recalling multidimensional attributes of speckle tracking echocardiography (STE) data sets derived from patients with known constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Methods and Results Clinical and echocardiographic data of 50 patients with CP and 44 with RCM were used for developing an associative memory classifier (AMC) based machine learning algorithm. The STE data was normalized in reference to 47 controls with no structural heart disease, and the diagnostic area under the receiver operating characteristic curve (AUC) of the AMC was evaluated for differentiating CP from RCM. Using only STE variables, AMC achieved a diagnostic AUC of 89·2%, which improved to 96·2% with addition of 4 echocardiographic variables. In comparison, the AUC of early diastolic mitral annular velocity and left ventricular longitudinal strain were 82.1% and 63·7%, respectively. Furthermore, AMC demonstrated greater accuracy and shorter learning curves than other machine learning approaches with accuracy asymptotically approaching 90% after a training fraction of 0·3 and remaining flat at higher training fractions. Conclusions This study demonstrates feasibility of a cognitive machine learning approach for learning and recalling patterns observed during echocardiographic evaluations. Incorporation of machine learning algorithms in cardiac imaging may aid standardized assessments and support the quality of interpretations, particularly for novice readers with limited experience. PMID:27266599
Experiences and results multitasking a hydrodynamics code on global and local memory machines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mandell, D.
1987-01-01
A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel and Alliant computers and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines is discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained.more » The problems of debugging on the different machines are also described.« less
QRAP: A numerical code for projected (Q)uasiparticle (RA)ndom (P)hase approximation
NASA Astrophysics Data System (ADS)
Samana, A. R.; Krmpotić, F.; Bertulani, C. A.
2010-06-01
A computer code for quasiparticle random phase approximation - QRPA and projected quasiparticle random phase approximation - PQRPA models of nuclear structure is explained in details. The residual interaction is approximated by a simple δ-force. An important application of the code consists in evaluating nuclear matrix elements involved in neutrino-nucleus reactions. As an example, cross sections for 56Fe and 12C are calculated and the code output is explained. The application to other nuclei and the description of other nuclear and weak decay processes are also discussed. Program summaryTitle of program: QRAP ( Quasiparticle RAndom Phase approximation) Computers: The code has been created on a PC, but also runs on UNIX or LINUX machines Operating systems: WINDOWS or UNIX Program language used: Fortran-77 Memory required to execute with typical data: 16 Mbytes of RAM memory and 2 MB of hard disk space No. of lines in distributed program, including test data, etc.: ˜ 8000 No. of bytes in distributed program, including test data, etc.: ˜ 256 kB Distribution format: tar.gz Nature of physical problem: The program calculates neutrino- and antineutrino-nucleus cross sections as a function of the incident neutrino energy, and muon capture rates, using the QRPA or PQRPA as nuclear structure models. Method of solution: The QRPA, or PQRPA, equations are solved in a self-consistent way for even-even nuclei. The nuclear matrix elements for the neutrino-nucleus interaction are treated as the beta inverse reaction of odd-odd nuclei as function of the transfer momentum. Typical running time: ≈ 5 min on a 3 GHz processor for Data set 1.
Lara-Tejero, María; Bewersdorf, Jörg; Galán, Jorge E.
2017-01-01
Type III protein secretion machines have evolved to deliver bacterially encoded effector proteins into eukaryotic cells. Although electron microscopy has provided a detailed view of these machines in isolation or fixed samples, little is known about their organization in live bacteria. Here we report the visualization and characterization of the Salmonella type III secretion machine in live bacteria by 2D and 3D single-molecule switching superresolution microscopy. This approach provided access to transient components of this machine, which previously could not be analyzed. We determined the subcellular distribution of individual machines, the stoichiometry of the different components of this machine in situ, and the spatial distribution of the substrates of this machine before secretion. Furthermore, by visualizing this machine in Salmonella mutants we obtained major insights into the machine’s assembly. This study bridges a major resolution gap in the visualization of this nanomachine and may serve as a paradigm for the examination of other bacterially encoded molecular machines. PMID:28533372
Notes on a storage manager for the Clouds kernel
NASA Technical Reports Server (NTRS)
Pitts, David V.; Spafford, Eugene H.
1986-01-01
The Clouds project is research directed towards producing a reliable distributed computing system. The initial goal is to produce a kernel which provides a reliable environment with which a distributed operating system can be built. The Clouds kernal consists of a set of replicated subkernels, each of which runs on a machine in the Clouds system. Each subkernel is responsible for the management of resources on its machine; the subkernal components communicate to provide the cooperation necessary to meld the various machines into one kernel. The implementation of a kernel-level storage manager that supports reliability is documented. The storage manager is a part of each subkernel and maintains the secondary storage residing at each machine in the distributed system. In addition to providing the usual data transfer services, the storage manager ensures that data being stored survives machine and system crashes, and that the secondary storage of a failed machine is recovered (made consistent) automatically when the machine is restarted. Since the storage manager is part of the Clouds kernel, efficiency of operation is also a concern.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
Vlasceanu, Madalina; Drach, Rae; Coman, Alin
2018-05-03
The mind is a prediction machine. In most situations, it has expectations as to what might happen. But when predictions are invalidated by experience (i.e., prediction errors), the memories that generate these predictions are suppressed. Here, we explore the effect of prediction error on listeners' memories following social interaction. We find that listening to a speaker recounting experiences similar to one's own triggers prediction errors on the part of the listener that lead to the suppression of her memories. This effect, we show, is sensitive to a perspective-taking manipulation, such that individuals who are instructed to take the perspective of the speaker experience memory suppression, whereas individuals who undergo a low-perspective-taking manipulation fail to show a mnemonic suppression effect. We discuss the relevance of these findings for our understanding of the bidirectional influences between cognition and social contexts, as well as for the real-world situations that involve memory-based predictions.
Chowdhury, M A K; Sharif Ullah, A M M; Anwar, Saqib
2017-09-12
Ti6Al4V alloys are difficult-to-cut materials that have extensive applications in the automotive and aerospace industry. A great deal of effort has been made to develop and improve the machining operations of Ti6Al4V alloys. This paper presents an experimental study that systematically analyzes the effects of the machining conditions (ultrasonic power, feed rate, spindle speed, and tool diameter) on the performance parameters (cutting force, tool wear, overcut error, and cylindricity error), while drilling high precision holes on the workpiece made of Ti6Al4V alloys using rotary ultrasonic machining (RUM). Numerical results were obtained by conducting experiments following the design of an experiment procedure. The effects of the machining conditions on each performance parameter have been determined by constructing a set of possibility distributions (i.e., trapezoidal fuzzy numbers) from the experimental data. A possibility distribution is a probability-distribution-neural representation of uncertainty, and is effective in quantifying the uncertainty underlying physical quantities when there is a limited number of data points which is the case here. Lastly, the optimal machining conditions have been identified using these possibility distributions.
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs). PMID:26985826
A Directed Acyclic Graph-Large Margin Distribution Machine Model for Music Symbol Classification.
Wen, Cuihong; Zhang, Jing; Rebelo, Ana; Cheng, Fanyong
2016-01-01
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
Continuous Time Random Walks with memory and financial distributions
NASA Astrophysics Data System (ADS)
Montero, Miquel; Masoliver, Jaume
2017-11-01
We study financial distributions from the perspective of Continuous Time Random Walks with memory. We review some of our previous developments and apply them to financial problems. We also present some new models with memory that can be useful in characterizing tendency effects which are inherent in most markets. We also briefly study the effect on return distributions of fractional behaviors in the distribution of pausing times between successive transactions.
Computation of emotions in man and machines.
Robinson, Peter; el Kaliouby, Rana
2009-12-12
The importance of emotional expression as part of human communication has been understood since Aristotle, and the subject has been explored scientifically since Charles Darwin and others in the nineteenth century. Advances in computer technology now allow machines to recognize and express emotions, paving the way for improved human-computer and human-human communications. Recent advances in psychology have greatly improved our understanding of the role of affect in communication, perception, decision-making, attention and memory. At the same time, advances in technology mean that it is becoming possible for machines to sense, analyse and express emotions. We can now consider how these advances relate to each other and how they can be brought together to influence future research in perception, attention, learning, memory, communication, decision-making and other applications. The computation of emotions includes both recognition and synthesis, using channels such as facial expressions, non-verbal aspects of speech, posture, gestures, physiology, brain imaging and general behaviour. The combination of new results in psychology with new techniques of computation is leading to new technologies with applications in commerce, education, entertainment, security, therapy and everyday life. However, there are important issues of privacy and personal expression that must also be considered.
Computation of emotions in man and machines
Robinson, Peter; el Kaliouby, Rana
2009-01-01
The importance of emotional expression as part of human communication has been understood since Aristotle, and the subject has been explored scientifically since Charles Darwin and others in the nineteenth century. Advances in computer technology now allow machines to recognize and express emotions, paving the way for improved human–computer and human–human communications. Recent advances in psychology have greatly improved our understanding of the role of affect in communication, perception, decision-making, attention and memory. At the same time, advances in technology mean that it is becoming possible for machines to sense, analyse and express emotions. We can now consider how these advances relate to each other and how they can be brought together to influence future research in perception, attention, learning, memory, communication, decision-making and other applications. The computation of emotions includes both recognition and synthesis, using channels such as facial expressions, non-verbal aspects of speech, posture, gestures, physiology, brain imaging and general behaviour. The combination of new results in psychology with new techniques of computation is leading to new technologies with applications in commerce, education, entertainment, security, therapy and everyday life. However, there are important issues of privacy and personal expression that must also be considered. PMID:19884138
2014-01-01
Background Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks. Distributed computation or the usage of big computers could in principle respond to these issues, but raises further algorithmic problems and require resources not satisfiable with simple off-the-shelf computers. Results We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve “locally” the AFP problem, by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally” by exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 Eukaryotic species. To our knowledge this is the first work where secondary-memory based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. Conclusions The combination of these algorithmic and technological approaches makes feasible the analysis of large multi-species networks using ordinary computers with limited speed and primary memory, and in perspective could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt), using well-equipped stand-alone machines. PMID:24843788
Total recall in distributive associative memories
NASA Technical Reports Server (NTRS)
Danforth, Douglas G.
1991-01-01
Iterative error correction of asymptotically large associative memories is equivalent to a one-step learning rule. This rule is the inverse of the activation function of the memory. Spectral representations of nonlinear activation functions are used to obtain the inverse in closed form for Sparse Distributed Memory, Selected-Coordinate Design, and Radial Basis Functions.
VOTable JAVA Streaming Writer and Applications.
NASA Astrophysics Data System (ADS)
Kulkarni, P.; Kembhavi, A.; Kale, S.
2004-07-01
Virtual Observatory related tools use a new standard for data transfer called the VOTable format. This is a variant of the xml format that enables easy transfer of data over the web. We describe a streaming interface that can bridge the VOTable format, through a user friendly graphical interface, with the FITS and ASCII formats, which are commonly used by astronomers. A streaming interface is important for efficient use of memory because of the large size of catalogues. The tools are developed in JAVA to provide a platform independent interface. We have also developed a stand-alone version that can be used to convert data stored in ASCII or FITS format on a local machine. The Streaming writer is successfully being used in VOPlot (See Kale et al 2004 for a description of VOPlot).We present the test results of converting huge FITS and ASCII data into the VOTable format on machines that have only limited memory.
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Saini, Subbash; Ciotti, Robert
2006-01-01
We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different number of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray XI shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our, results show the impact of the network bandwidth and topology on the overall performance of each interconnect.
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware
Zheng, Da; Burns, Randal; Szalay, Alexander S.
2013-01-01
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.
Zheng, Da; Burns, Randal; Szalay, Alexander S
2013-01-01
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.
Wolf, Tabea; Zimprich, Daniel
2016-10-01
The reminiscence bump phenomenon has frequently been reported for the recall of autobiographical memories. The present study complements previous research by examining individual differences in the distribution of word-cued autobiographical memories. More importantly, we introduce predictor variables that might account for individual differences in the mean (location) and the standard deviation (scale) of individual memory distributions. All variables were derived from different theoretical accounts for the reminiscence bump phenomenon. We used a mixed location-scale logitnormal model, to analyse the 4602 autobiographical memories reported by 118 older participants. Results show reliable individual differences in the location and the scale. After controlling for age and gender, individual proportions of first-time experiences and individual proportions of positive memories, as well as the ratings on Openness to new Experiences and Self-Concept Clarity accounted for 29% of individual differences in location and 42% of individual differences in scale of autobiographical memory distributions. Results dovetail with a life-story account for the reminiscence bump which integrates central components of previous accounts.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Families of Graph Algorithms: SSSP Case Study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila Jay; Zalewski, Marcin J.; Lumsdaine, Andrew
2017-08-28
Single-Source Shortest Paths (SSSP) is a well-studied graph problem. Examples of SSSP algorithms include the original Dijkstra’s algorithm and the parallel Δ-stepping and KLA-SSSP algorithms. In this paper, we use a novel Abstract Graph Machine (AGM) model to show that all these algorithms share a common logic and differ from one another by the order in which they perform work. We use the AGM model to thoroughly analyze the family of algorithms that arises from the common logic. We start with the basic algorithm without any ordering (Chaotic), and then we derive the existing and new algorithms by methodically exploringmore » semantic and spatial ordering of work. Our experimental results show that new derived algorithms show better performance than the existing distributed memory parallel algorithms, especially at higher scales.« less
The use of hypermedia to increase the productivity of software development teams
NASA Technical Reports Server (NTRS)
Coles, L. Stephen
1991-01-01
Rapid progress in low-cost commercial PC-class multimedia workstation technology will potentially have a dramatic impact on the productivity of distributed work groups of 50-100 software developers. Hypermedia/multimedia involves the seamless integration in a graphical user interface (GUI) of a wide variety of data structures, including high-resolution graphics, maps, images, voice, and full-motion video. Hypermedia will normally require the manipulation of large dynamic files for which relational data base technology and SQL servers are essential. Basic machine architecture, special-purpose video boards, video equipment, optical memory, software needed for animation, network technology, and the anticipated increase in productivity that will result for the introduction of hypermedia technology are covered. It is suggested that the cost of the hardware and software to support an individual multimedia workstation will be on the order of $10,000.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weeratunga, S K
Ares and Kull are mature code frameworks that support ALE hydrodynamics for a variety of HEDP applications at LLNL, using two widely different meshing approaches. While Ares is based on a 2-D/3-D block-structured mesh data base, Kull is designed to support unstructured, arbitrary polygonal/polyhedral meshes. In addition, both frameworks are capable of running applications on large, distributed-memory parallel machines. Currently, both these frameworks separately support assorted collections of physics packages related to HEDP, including one for the energy deposition by laser/ion-beam ray tracing. This study analyzes the options available for developing a common laser/ion-beam ray tracing package that can bemore » easily shared between these two code frameworks and concludes with a set of recommendations for its development.« less
Recursive computer architecture for VLSI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Treleaven, P.C.; Hopkins, R.P.
1982-01-01
A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems are termed fifth generation computers by the Japanese. 30 references.
Parallelization Issues and Particle-In Codes.
NASA Astrophysics Data System (ADS)
Elster, Anne Cathrine
1994-01-01
"Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on the simulation may lead to further improvements. For example, in the case of mean particle drift, it is often advantageous to partition the grid primarily along the direction of the drift. The particle-in-cell codes for this study were tested using physical parameters, which lead to predictable phenomena including plasma oscillations and two-stream instabilities. An overview of the most central references related to parallel particle codes is also given.
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Unstructured Adaptive Meshes: Bad for Your Memory?
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob
2003-01-01
This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
NASA Astrophysics Data System (ADS)
Efron, Uzi
Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
NASA Technical Reports Server (NTRS)
Efron, Uzi (Editor)
1990-01-01
Recent advances in the technology and applications of spatial light modulators (SLMs) are discussed in review essays by leading experts. Topics addressed include materials for SLMs, SLM devices and device technology, applications to optical data processing, and applications to artificial neural networks. Particular attention is given to nonlinear optical polymers, liquid crystals, magnetooptic SLMs, multiple-quantum-well SLMs, deformable-mirror SLMs, three-dimensional optical memories, applications of photorefractive devices to optical computing, photonic neurocomputers and learning machines, holographic associative memories, SLMs as parallel memories for optoelectronic neural networks, and coherent-optics implementations of neural-network models.
NASA Technical Reports Server (NTRS)
Ross, Muriel D.
1991-01-01
The three-dimensional organization of the vestibular macula is under study by computer assisted reconstruction and simulation methods as a model for more complex neural systems. One goal of this research is to transition knowledge of biological neural network architecture and functioning to computer technology, to contribute to the development of thinking computers. Maculas are organized as weighted neural networks for parallel distributed processing of information. The network is characterized by non-linearity of its terminal/receptive fields. Wiring appears to develop through constrained randomness. A further property is the presence of two main circuits, highly channeled and distributed modifying, that are connected through feedforward-feedback collaterals and biasing subcircuit. Computer simulations demonstrate that differences in geometry of the feedback (afferent) collaterals affects the timing and the magnitude of voltage changes delivered to the spike initiation zone. Feedforward (efferent) collaterals act as voltage followers and likely inhibit neurons of the distributed modifying circuit. These results illustrate the importance of feedforward-feedback loops, of timing, and of inhibition in refining neural network output. They also suggest that it is the distributed modifying network that is most involved in adaptation, memory, and learning. Tests of macular adaptation, through hyper- and microgravitational studies, support this hypothesis since synapses in the distributed modifying circuit, but not the channeled circuit, are altered. Transitioning knowledge of biological systems to computer technology, however, remains problematical.
NASA Astrophysics Data System (ADS)
Ju, Heng; Lin, Chengxin; Liu, Zhijie; Zhang, Jiaqi
2018-08-01
To reduce the residual stresses and improve the mechanical properties of laser weldments, produced with the restrained mixing uniform design method, a Fe-Mn-Si shape memory alloy (SMA) welding seam was formed inside the 304 stainless steel by laser welding with powder filling. The mass fraction, shape memory effect, and phase composition of the welding seam was measured by SEM-EDS (photometric analyser), bending recovery method, and XRD, respectively. An optical microscope was used to observe the microstructure of the Fe-Mn-Si SMA welding seam by solid solution and pre-deformation treatment. Meanwhile, the mechanical properties (residual stress distribution, tensile strength, microhardness and fatigue strength) of the laser welded specimen with an Fe-Mn-Si SMA welding seam (experimental material) and a 304 stainless steel welding seam (contrast material) were measured by a tensile testing machine hole drilling method and full cycle bending fatigue test. The results show that Fe15Mn5Si12Cr6Ni SMA welding seam was formed in situ with shape memory effect and stress-induced γ → ε martensite phase transformation characteristic. The residual stress of the experimental material is lower than that of the contrast material. The former has larger tensile strength, longer elongation and higher microhardness than the latter has. The experimental material and contrast material possess 249 and 136 bending fatigue cycles at the strain of 6%, respectively. The mechanisms by which mechanical properties of the experimental material are strengthened includes (1) release of the residual stress inside the Fe-Mn-Si SMA welding seam due to the stress-induced γ → ε martensite phase transformation and (2) energy absorption and plastic slip restraint due to the deformations in martensite and reverse phase transformation.
Improving Memory Error Handling Using Linux
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carlton, Michael Andrew; Blanchard, Sean P.; Debardeleben, Nathan A.
As supercomputers continue to get faster and more powerful in the future, they will also have more nodes. If nothing is done, then the amount of memory in supercomputer clusters will soon grow large enough that memory failures will be unmanageable to deal with by manually replacing memory DIMMs. "Improving Memory Error Handling Using Linux" is a process oriented method to solve this problem by using the Linux kernel to disable (offline) faulty memory pages containing bad addresses, preventing them from being used again by a process. The process of offlining memory pages simplifies error handling and results in reducingmore » both hardware and manpower costs required to run Los Alamos National Laboratory (LANL) clusters. This process will be necessary for the future of supercomputing to allow the development of exascale computers. It will not be feasible without memory error handling to manually replace the number of DIMMs that will fail daily on a machine consisting of 32-128 petabytes of memory. Testing reveals the process of offlining memory pages works and is relatively simple to use. As more and more testing is conducted, the entire process will be automated within the high-performance computing (HPC) monitoring software, Zenoss, at LANL.« less
Enhancing an appointment diary on a pocket computer for use by people after brain injury.
Wright, P; Rogers, N; Hall, C; Wilson, B; Evans, J; Emslie, H
2001-12-01
People with memory loss resulting from brain injury benefit from purpose-designed memory aids such as appointment diaries on pocket computers. The present study explores the effects of extending the range of memory aids and including games. For 2 months, 12 people who had sustained brain injury were loaned a pocket computer containing three purpose-designed memory aids: diary, notebook and to-do list. A month later they were given another computer with the same memory aids but a different method of text entry (physical keyboard or touch-screen keyboard). Machine order was counterbalanced across participants. Assessment was by interviews during the loan periods, rating scales, performance tests and computer log files. All participants could use the memory aids and ten people (83%) found them very useful. Correlations among the three memory aids were not significant, suggesting individual variation in how they were used. Games did not increase use of the memory aids, nor did loan of the preferred pocket computer (with physical keyboard). Significantly more diary entries were made by people who had previously used other memory aids, suggesting that a better understanding of how to use a range of memory aids could benefit some people with brain injury.
Characterization of NiTi Shape Memory Damping Elements designed for Automotive Safety Systems
NASA Astrophysics Data System (ADS)
Strittmatter, Joachim; Clipa, Victor; Gheorghita, Viorel; Gümpel, Paul
2014-07-01
Actuator elements made of NiTi shape memory material are more and more known in industry because of their unique properties. Due to the martensitic phase change, they can revert to their original shape by heating when subjected to an appropriate treatment. This thermal shape memory effect (SME) can show a significant shape change combined with a considerable force. Therefore such elements can be used to solve many technical tasks in the field of actuating elements and mechatronics and will play an increasing role in the next years, especially within the automotive technology, energy management, power, and mechanical engineering as well as medical technology. Beside this thermal SME, these materials also show a mechanical SME, characterized by a superelastic plateau with reversible elongations in the range of 8%. This behavior is based on the building of stress-induced martensite of loaded austenite material at constant temperature and facilitates a lot of applications especially in the medical field. Both SMEs are attended by energy dissipation during the martensitic phase change. This paper describes the first results obtained on different actuator and superelastic NiTi wires concerning their use as damping elements in automotive safety systems. In a first step, the damping behavior of small NiTi wires up to 0.5 mm diameter was examined at testing speeds varying between 0.1 and 50 mm/s upon an adapted tensile testing machine. In order to realize higher testing speeds, a drop impact testing machine was designed, which allows testing speeds up to 4000 mm/s. After introducing this new type of testing machine, the first results of vertical-shock tests of superelastic and electrically activated actuator wires are presented. The characterization of these high dynamic phase change parameters represents the basis for new applications for shape memory damping elements, especially in automotive safety systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Plimpton, Steven J.; Agarwal, Sapan; Schiek, Richard
2016-09-02
CrossSim is a simulator for modeling neural-inspired machine learning algorithms on analog hardware, such as resistive memory crossbars. It includes noise models for reading and updating the resistances, which can be based on idealized equations or experimental data. It can also introduce noise and finite precision effects when converting values from digital to analog and vice versa. All of these effects can be turned on or off as an algorithm processes a data set and attempts to learn its salient attributes so that it can be categorized in the machine learning training/classification context. CrossSim thus allows the robustness, accuracy, andmore » energy usage of a machine learning algorithm to be tested on simulated hardware.« less
A distributed algorithm for machine learning
NASA Astrophysics Data System (ADS)
Chen, Shihong
2018-04-01
This paper considers a distributed learning problem in which a group of machines in a connected network, each learning its own local dataset, aim to reach a consensus at an optimal model, by exchanging information only with their neighbors but without transmitting data. A distributed algorithm is proposed to solve this problem under appropriate assumptions.
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high performance power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, heirarchical architecture can be predicted and the best algorithm-machine combination can be selected for a given application.
Microcomputer Calculated Diagnostic X-Ray Exposure Factors: Clinical Evaluation
Markivee, C. R.; Edwards, F. Marc; Leonard, Patricia
1981-01-01
Calculation of correct settings for the controls of a diagnostic x-ray machine was established as feasible in a microcomputer with 4K memory. The cost effectiveness and other findings in the application of this method are discussed.
The Generalized Quantum Episodic Memory Model.
Trueblood, Jennifer S; Hemmer, Pernille
2017-11-01
Recent evidence suggests that experienced events are often mapped to too many episodic states, including those that are logically or experimentally incompatible with one another. For example, episodic over-distribution patterns show that the probability of accepting an item under different mutually exclusive conditions violates the disjunction rule. A related example, called subadditivity, occurs when the probability of accepting an item under mutually exclusive and exhaustive instruction conditions sums to a number >1. Both the over-distribution effect and subadditivity have been widely observed in item and source-memory paradigms. These phenomena are difficult to explain using standard memory frameworks, such as signal-detection theory. A dual-trace model called the over-distribution (OD) model (Brainerd & Reyna, 2008) can explain the episodic over-distribution effect, but not subadditivity. Our goal is to develop a model that can explain both effects. In this paper, we propose the Generalized Quantum Episodic Memory (GQEM) model, which extends the Quantum Episodic Memory (QEM) model developed by Brainerd, Wang, and Reyna (2013). We test GQEM by comparing it to the OD model using data from a novel item-memory experiment and a previously published source-memory experiment (Kellen, Singmann, & Klauer, 2014) examining the over-distribution effect. Using the best-fit parameters from the over-distribution experiments, we conclude by showing that the GQEM model can also account for subadditivity. Overall these results add to a growing body of evidence suggesting that quantum probability theory is a valuable tool in modeling recognition memory. Copyright © 2016 Cognitive Science Society, Inc.
Differentiation and Response Bias in Episodic Memory: Evidence from Reaction Time Distributions
ERIC Educational Resources Information Center
Criss, Amy H.
2010-01-01
In differentiation models, the processes of encoding and retrieval produce an increase in the distribution of memory strength for targets and a decrease in the distribution of memory strength for foils as the amount of encoding increases. This produces an increase in the hit rate and decrease in the false-alarm rate for a strongly encoded compared…
NASA Astrophysics Data System (ADS)
Hunter, Geoffrey
2004-01-01
A computational process is classified according to the theoretical model that is capable of executing it; computational processes that require a non-predeterminable amount of intermediate storage for their execution are Turing-machine (TM) processes, while those whose storage are predeterminable are Finite Automation (FA) processes. Simple processes (such as traffic light controller) are executable by Finite Automation, whereas the most general kind of computation requires a Turing Machine for its execution. This implies that a TM process must have a non-predeterminable amount of memory allocated to it at intermediate instants of its execution; i.e. dynamic memory allocation. Many processes encountered in practice are TM processes. The implication for computational practice is that the hardware (CPU) architecture and its operating system must facilitate dynamic memory allocation, and that the programming language used to specify TM processes must have statements with the semantic attribute of dynamic memory allocation, for in Alan Turing"s thesis on computation (1936) the "standard description" of a process is invariant over the most general data that the process is designed to process; i.e. the program describing the process should never have to be modified to allow for differences in the data that is to be processed in different instantiations; i.e. data-invariant programming. Any non-trivial program is partitioned into sub-programs (procedures, subroutines, functions, modules, etc). Examination of the calls/returns between the subprograms reveals that they are nodes in a tree-structure; this tree-structure is independent of the programming language used to encode (define) the process. Each sub-program typically needs some memory for its own use (to store values intermediate between its received data and its computed results); this locally required memory is not needed before the subprogram commences execution, and it is not needed after its execution terminates; it may be allocated as its execution commences, and deallocated as its execution terminates, and if the amount of this local memory is not known until just before execution commencement, then it is essential that it be allocated dynamically as the first action of its execution. This dynamically allocated/deallocated storage of each subprogram"s intermediate values, conforms with the stack discipline; i.e. last allocated = first to be deallocated, an incidental benefit of which is automatic overlaying of variables. This stack-based dynamic memory allocation was a semantic implication of the nested block structure that originated in the ALGOL-60 programming language. AGLOL-60 was a TM language, because the amount of memory allocated on subprogram (block/procedure) entry (for arrays, etc) was computable at execution time. A more general requirement of a Turing machine process is for code generation at run-time; this mandates access to the source language processor (compiler/interpretor) during execution of the process. This fundamental aspect of computer science is important to the future of system design, because it has been overlooked throughout the 55 years since modern computing began in 1048. The popular computer systems of this first half-century of computing were constrained by compile-time (or even operating system boot-time) memory allocation, and were thus limited to executing FA processes. The practical effect was that the distinction between the data-invariant program and its variable data was blurred; programmers had to make trial and error executions, modifying the program"s compile-time constants (array dimensions) to iterate towards the values required at run-time by the data being processed. This era of trial and error computing still persists; it pervades the culture of current (2003) computing practice.
Parallel computation and the Basis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, G.R.
1992-12-16
A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to-use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communication costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis and Parallelmore » Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Parallel computation and the basis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, G.R.
1993-05-01
A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis andmore » Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Self-folding with shape memory composites at the millimeter scale
NASA Astrophysics Data System (ADS)
Felton, S. M.; Becker, K. P.; Aukes, D. M.; Wood, R. J.
2015-08-01
Self-folding is an effective method for creating 3D shapes from flat sheets. In particular, shape memory composites—laminates containing shape memory polymers—have been used to self-fold complex structures and machines. To date, however, these composites have been limited to feature sizes larger than one centimeter. We present a new shape memory composite capable of folding millimeter-scale features. This technique can be activated by a global heat source for simultaneous folding, or by resistive heaters for sequential folding. It is capable of feature sizes ranging from 0.5 to 40 mm, and is compatible with multiple laminate compositions. We demonstrate the ability to produce complex structures and mechanisms by building two self-folding pieces: a model ship and a model bumblebee.
Memory-built-in quantum cloning in a hybrid solid-state spin register
NASA Astrophysics Data System (ADS)
Wang, W.-B.; Zu, C.; He, L.; Zhang, W.-G.; Duan, L.-M.
2015-07-01
As a way to circumvent the quantum no-cloning theorem, approximate quantum cloning protocols have received wide attention with remarkable applications. Copying of quantum states to memory qubits provides an important strategy for eavesdropping in quantum cryptography. We report an experiment that realizes cloning of quantum states from an electron spin to a nuclear spin in a hybrid solid-state spin register with near-optimal fidelity. The nuclear spin provides an ideal memory qubit at room temperature, which stores the cloned quantum states for a millisecond under ambient conditions, exceeding the lifetime of the original quantum state carried by the electron spin by orders of magnitude. The realization of a cloning machine with built-in quantum memory provides a key step for application of quantum cloning in quantum information science.
Memory-built-in quantum cloning in a hybrid solid-state spin register.
Wang, W-B; Zu, C; He, L; Zhang, W-G; Duan, L-M
2015-07-16
As a way to circumvent the quantum no-cloning theorem, approximate quantum cloning protocols have received wide attention with remarkable applications. Copying of quantum states to memory qubits provides an important strategy for eavesdropping in quantum cryptography. We report an experiment that realizes cloning of quantum states from an electron spin to a nuclear spin in a hybrid solid-state spin register with near-optimal fidelity. The nuclear spin provides an ideal memory qubit at room temperature, which stores the cloned quantum states for a millisecond under ambient conditions, exceeding the lifetime of the original quantum state carried by the electron spin by orders of magnitude. The realization of a cloning machine with built-in quantum memory provides a key step for application of quantum cloning in quantum information science.
Machine Learning Feature Selection for Tuning Memory Page Swapping
2013-09-01
environments we set up. 13 Figure 4.1 Updated Feature Vector List. Features we added to the kernel are anno - tated with “(MLVM...Feb. 1966. [2] P. J . Denning, “The working set model for program behavior,” Communications of the ACM, vol. 11, no. 5, pp. 323–333, May 1968. [3] L. A...8] R. W. Cart and J . L. Hennessy, “WSClock — A simple and effective algorithm for virtual memory management,” M.S. thesis, Dept. Computer Science
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
AFL-1: A programming Language for Massively Concurrent Computers.
1986-11-01
Bibliography Ackley, D.H., Hinton, G.E., Sejnowski, T.J., "A Learning Algorithm for boltzmann Machines", Cognitive Science, 1985, 9, 147-169. Agre...P.E., "Routines", Memo 828, MIT AI Laboratory, Many 1985. Ballard, D.H., Hayes, P.J., "Parallel Logical Inference", Conference of the Cognitive Science...34Experiments on Semantic Memory and Language Com- 125 prehension", in L.W. Greg (Ed.), Cognition in Learning and Memory, New York, Wiley, 1972._ Collins
A three phase optimization method for precopy based VM live migration.
Sharma, Sangeeta; Chawla, Meenu
2016-01-01
Virtual machine live migration is a method of moving virtual machine across hosts within a virtualized datacenter. It provides significant benefits for administrator to manage datacenter efficiently. It reduces service interruption by transferring the virtual machine without stopping at source. Transfer of large number of virtual machine memory pages results in long migration time as well as downtime, which also affects the overall system performance. This situation becomes unbearable when migration takes place over slower network or a long distance migration within a cloud. In this paper, precopy based virtual machine live migration method is thoroughly analyzed to trace out the issues responsible for its performance drops. In order to address these issues, this paper proposes three phase optimization (TPO) method. It works in three phases as follows: (i) reduce the transfer of memory pages in first phase, (ii) reduce the transfer of duplicate pages by classifying frequently and non-frequently updated pages, and (iii) reduce the data sent in last iteration of migration by applying the simple RLE compression technique. As a result, each phase significantly reduces total pages transferred, total migration time and downtime respectively. The proposed TPO method is evaluated using different representative workloads on a Xen virtualized environment. Experimental results show that TPO method reduces total pages transferred by 71 %, total migration time by 70 %, downtime by 3 % for higher workload, and it does not impose significant overhead as compared to traditional precopy method. Comparison of TPO method with other methods is also done for supporting and showing its effectiveness. TPO method and precopy methods are also tested at different number of iterations. The TPO method gives better performance even with less number of iterations.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Ghosts in the Machine II: Neural Correlates of Memory Interference from the Previous Trial.
Papadimitriou, Charalampos; White, Robert L; Snyder, Lawrence H
2017-04-01
Previous memoranda interfere with working memory. For example, spatial memories are biased toward locations memorized on the previous trial. We predicted, based on attractor network models of memory, that activity in the frontal eye fields (FEFs) encoding a previous target location can persist into the subsequent trial and that this ghost will then bias the readout of the current target. Contrary to this prediction, we find that FEF memory representations appear biased away from (not toward) the previous target location. The behavioral and neural data can be reconciled by a model in which receptive fields of memory neurons converge toward remembered locations, much as receptive fields converge toward attended locations. Convergence increases the resources available to encode the relevant memoranda and decreases overall error in the network, but the residual convergence from the previous trial can give rise to an attractive behavioral bias on the next trial. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Static Memory Deduplication for Performance Optimization in Cloud Computing.
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-04-27
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible.
Static Memory Deduplication for Performance Optimization in Cloud Computing
Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan
2017-01-01
In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand of memory capacity and subsequent increase in the energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques which reduce memory demand through page sharing are being adopted. However, such techniques suffer from overheads in terms of number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirement and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirement and improves performance. We demonstrate that, compared to other approaches, the cost in terms of the response time is negligible. PMID:28448434
An imperialist competitive algorithm for virtual machine placement in cloud computing
NASA Astrophysics Data System (ADS)
Jamali, Shahram; Malektaji, Sepideh; Analoui, Morteza
2017-05-01
Cloud computing, the recently emerged revolution in IT industry, is empowered by virtualisation technology. In this paradigm, the user's applications run over some virtual machines (VMs). The process of selecting proper physical machines to host these virtual machines is called virtual machine placement. It plays an important role on resource utilisation and power efficiency of cloud computing environment. In this paper, we propose an imperialist competitive-based algorithm for the virtual machine placement problem called ICA-VMPLC. The base optimisation algorithm is chosen to be ICA because of its ease in neighbourhood movement, good convergence rate and suitable terminology. The proposed algorithm investigates search space in a unique manner to efficiently obtain optimal placement solution that simultaneously minimises power consumption and total resource wastage. Its final solution performance is compared with several existing methods such as grouping genetic and ant colony-based algorithms as well as bin packing heuristic. The simulation results show that the proposed method is superior to other tested algorithms in terms of power consumption, resource wastage, CPU usage efficiency and memory usage efficiency.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGoldrick, P.R.
1980-12-11
The Interprocess Communications System (IPCS) was written to provide a virtual machine upon which the Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) could be built. The hardware upon which the IPCS runs consists of nine minicomputers sharing some common memory.
Reversibility in Quantum Models of Stochastic Processes
NASA Astrophysics Data System (ADS)
Gier, David; Crutchfield, James; Mahoney, John; James, Ryan
Natural phenomena such as time series of neural firing, orientation of layers in crystal stacking and successive measurements in spin-systems are inherently probabilistic. The provably minimal classical models of such stochastic processes are ɛ-machines, which consist of internal states, transition probabilities between states and output values. The topological properties of the ɛ-machine for a given process characterize the structure, memory and patterns of that process. However ɛ-machines are often not ideal because their statistical complexity (Cμ) is demonstrably greater than the excess entropy (E) of the processes they represent. Quantum models (q-machines) of the same processes can do better in that their statistical complexity (Cq) obeys the relation Cμ >= Cq >= E. q-machines can be constructed to consider longer lengths of strings, resulting in greater compression. With code-words of sufficiently long length, the statistical complexity becomes time-symmetric - a feature apparently novel to this quantum representation. This result has ramifications for compression of classical information in quantum computing and quantum communication technology.
The scheme machine: A case study in progress in design derivation at system levels
NASA Technical Reports Server (NTRS)
Johnson, Steven D.
1995-01-01
The Scheme Machine is one of several design projects of the Digital Design Derivation group at Indiana University. It differs from the other projects in its focus on issues of system design and its connection to surrounding research in programming language semantics, compiler construction, and programming methodology underway at Indiana and elsewhere. The genesis of the project dates to the early 1980's, when digital design derivation research branched from the surrounding research effort in programming languages. Both branches have continued to develop in parallel, with this particular project serving as a bridge. However, by 1990 there remained little real interaction between the branches and recently we have undertaken to reintegrate them. On the software side, researchers have refined a mathematically rigorous (but not mechanized) treatment starting with the fully abstract semantic definition of Scheme and resulting in an efficient implementation consisting of a compiler and virtual machine model, the latter typically realized with a general purpose microprocessor. The derivation includes a number of sophisticated factorizations and representations and is also deep example of the underlying engineering methodology. The hardware research has created a mechanized algebra supporting the tedious and massive transformations often seen at lower levels of design. This work has progressed to the point that large scale devices, such as processors, can be derived from first-order finite state machine specifications. This is roughly where the language oriented research stops; thus, together, the two efforts establish a thread from the highest levels of abstract specification to detailed digital implementation. The Scheme Machine project challenges hardware derivation research in several ways, although the individual components of the system are of a similar scale to those we have worked with before. The machine has a custom dual-ported memory to support garbage collection. It consists of four tightly coupled processes--processor, collector, allocator, memory--with a very non-trivial synchronization relationship. Finally, there are deep issues of representation for the run-time objects of a symbolic processing language. The research centers on verification through integrated formal reasoning systems, but is also involved with modeling and prototyping environments. Since the derivation algebra is basd on an executable modeling language, there is opportunity to incorporate design animation in the design process. We are looking for ways to move smoothly and incrementally from executable specifications into hardware realization. For example, we can run the garbage collector specification, a Scheme program, directly against the physical memory prototype, and similarly, the instruction processor model against the heap implementation.
Statistical prediction with Kanerva's sparse distributed memory
NASA Technical Reports Server (NTRS)
Rogers, David
1989-01-01
A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.
Differential memory in the earth's magnetotail
NASA Technical Reports Server (NTRS)
Burkhart, G. R.; Chen, J.
1991-01-01
The process of 'differential memory' in the earth's magnetotail is studied in the framework of the modified Harris magnetotail geometry. It is verified that differential memory can generate non-Maxwellian features in the modified Harris field model. The time scales and the potentially observable distribution functions associated with the process of differential memory are investigated, and it is shown that non-Maxwelllian distributions can evolve as a test particle response to distribution function boundary conditions in a Harris field magnetotail model. The non-Maxwellian features which arise from distribution function mapping have definite time scales associated with them, which are generally shorter than the earthward convection time scale but longer than the typical Alfven crossing time.
Balancing Contention and Synchronization on the Intel Paragon
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Nicol, David M.
1996-01-01
The Intel Paragon is a mesh-connected distributed memory parallel computer. It uses an oblivious and deterministic message routing algorithm: this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately global synchronization has substantial overhead on the Paragon. At the same time the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmission from the same node to some extent. We develop a generalization of Scott's algorithm that executes complete exchange with a prescribed contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to tradeoff contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take interconnection structure into account. The Bounded contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.
Limits of the memory coefficient in measuring correlated bursts
NASA Astrophysics Data System (ADS)
Jo, Hang-Hyun; Hiraoka, Takayuki
2018-03-01
Temporal inhomogeneities in event sequences of natural and social phenomena have been characterized in terms of interevent times and correlations between interevent times. The inhomogeneities of interevent times have been extensively studied, while the correlations between interevent times, often called correlated bursts, are far from being fully understood. For measuring the correlated bursts, two relevant approaches were suggested, i.e., memory coefficient and burst size distribution. Here a burst size denotes the number of events in a bursty train detected for a given time window. Empirical analyses have revealed that the larger memory coefficient tends to be associated with the heavier tail of the burst size distribution. In particular, empirical findings in human activities appear inconsistent, such that the memory coefficient is close to 0, while burst size distributions follow a power law. In order to comprehend these observations, by assuming the conditional independence between consecutive interevent times, we derive the analytical form of the memory coefficient as a function of parameters describing interevent time and burst size distributions. Our analytical result can explain the general tendency of the larger memory coefficient being associated with the heavier tail of burst size distribution. We also find that the apparently inconsistent observations in human activities are compatible with each other, indicating that the memory coefficient has limits to measure the correlated bursts.
Distributed communications and control network for robotic mining
NASA Technical Reports Server (NTRS)
Schiffbauer, William H.
1989-01-01
The application of robotics to coal mining machines is one approach pursued to increase productivity while providing enhanced safety for the coal miner. Toward that end, a network composed of microcontrollers, computers, expert systems, real time operating systems, and a variety of program languages are being integrated that will act as the backbone for intelligent machine operation. Actual mining machines, including a few customized ones, have been given telerobotic semiautonomous capabilities by applying the described network. Control devices, intelligent sensors and computers onboard these machines are showing promise of achieving improved mining productivity and safety benefits. Current research using these machines involves navigation, multiple machine interaction, machine diagnostics, mineral detection, and graphical machine representation. Guidance sensors and systems employed include: sonar, laser rangers, gyroscopes, magnetometers, clinometers, and accelerometers. Information on the network of hardware/software and its implementation on mining machines are presented. Anticipated coal production operations using the network are discussed. A parallelism is also drawn between the direction of present day underground coal mining research to how the lunar soil (regolith) may be mined. A conceptual lunar mining operation that employs a distributed communication and control network is detailed.
Notes on implementation of sparsely distributed memory
NASA Technical Reports Server (NTRS)
Keeler, J. D.; Denning, P. J.
1986-01-01
The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
Finite element computation on nearest neighbor connected machines
NASA Technical Reports Server (NTRS)
Mcaulay, A. D.
1984-01-01
Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Machine Learning Toolkit for Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-03-31
Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large scale systems, which includes commodity multi-core machines, tightly connected supercomputers and cloud computing systems. Several techniques are proposed for improved speed and memory space usage including adaptive and aggressive elimination of samples for faster convergence , and sparse format representation of data samples. Several heuristics for earliest possible to lazy elimination of non-contributing samples are consideredmore » in MaTEx. In many cases, where an early sample elimination might result in a false positive, low overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets« less
PWM Inverter control and the application thereof within electric vehicles
Geppert, Steven
1982-01-01
An inverter (34) which provides power to an A.C. machine (28) is controlled by a circuit (36) employing PWM control strategy whereby A.C. power is supplied to the machine at a preselectable frequency and preselectable voltage. This is accomplished by the technique of waveform notching in which the shapes of the notches are varied to determine the average energy content of the overall waveform. Through this arrangement, the operational efficiency of the A.C. machine is optimized. The control circuit includes a micro-computer and memory element which receive various parametric inputs and calculate optimized machine control data signals therefrom. The control data is asynchronously loaded into the inverter through an intermediate buffer (38). In its preferred embodiment, the present invention is incorporated within an electric vehicle (10) employing a 144 VDC battery pack (32) and a three-phase induction motor (18).
Ejected Particle Size Distributions from Shocked Metal Surfaces
Schauer, M. M.; Buttler, W. T.; Frayer, D. K.; ...
2017-04-12
Here, we present size distributions for particles ejected from features machined onto the surface of shocked Sn targets. The functional form of the size distributions is assumed to be log-normal, and the characteristic parameters of the distribution are extracted from the measured angular distribution of light scattered from a laser beam incident on the ejected particles. We also found strong evidence for a bimodal distribution of particle sizes with smaller particles evolved from features machined into the target surface and larger particles being produced at the edges of these features.
Ejected Particle Size Distributions from Shocked Metal Surfaces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schauer, M. M.; Buttler, W. T.; Frayer, D. K.
Here, we present size distributions for particles ejected from features machined onto the surface of shocked Sn targets. The functional form of the size distributions is assumed to be log-normal, and the characteristic parameters of the distribution are extracted from the measured angular distribution of light scattered from a laser beam incident on the ejected particles. We also found strong evidence for a bimodal distribution of particle sizes with smaller particles evolved from features machined into the target surface and larger particles being produced at the edges of these features.
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1990-01-01
The scientific tradition of saving all the data from experiments for independent validation and for further investigation is under profound challenge by modern satellite data collectors and by supercomputers. The volume of data is beyond the capacity to store, transmit, and comprehend the data. A promising line of study is discovery machines that study the data at the collection site and transmit statistical summaries of patterns observed. Examples of discovery machines are the Autoclass system and the genetic memory system of NASA-Ames, and the proposal for knowbots by Kahn and Cerf.
Computational Nanotechnology of Materials, Devices, and Machines: Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Srivastava, Deepak; Kwak, Dolhan (Technical Monitor)
2000-01-01
The mechanics and chemistry of carbon nanotubes have relevance for their numerous electronic applications. Mechanical deformations such as bending and twisting affect the nanotube's conductive properties, and at the same time they possess high strength and elasticity. Two principal techniques were utilized including the analysis of large scale classical molecular dynamics on a shared memory architecture machine and a quantum molecular dynamics methodology. In carbon based electronics, nanotubes are used as molecular wires with topological defects which are mediated through various means. Nanotubes can be connected to form junctions.
Computational Nanotechnology of Materials, Electronics and Machines: Carbon Nanotubes
NASA Technical Reports Server (NTRS)
Srivastava, Deepak
2001-01-01
This report presents the goals and research of the Integrated Product Team (IPT) on Devices and Nanotechnology. NASA's needs for this technology are discussed and then related to the research focus of the team. The two areas of focus for technique development are: 1) large scale classical molecular dynamics on a shared memory architecture machine; and 2) quantum molecular dynamics methodology. The areas of focus for research are: 1) nanomechanics/materials; 2) carbon based electronics; 3) BxCyNz composite nanotubes and junctions; 4) nano mechano-electronics; and 5) nano mechano-chemistry.
Current problems in the dynamics and design of mechanisms and machines
NASA Astrophysics Data System (ADS)
Kestel'Man, V. N.
The papers contained in this volume deal with possible ways of improving the dynamic and structural properties of machines and mechanisms and also with problems associated with the design of aircraft equipment. Topics discussed include estimation of the stressed state of a model of an orbital film structure, a study of the operation of an aerodynamic angle transducer in flow of a hot gas, calculation of the efficiency of aircraft gear drives, and dynamic accuracy of a controlled manipulator. Papers are also presented on optimal synthesis of mechanical systems with variable properties, synthesis of mechanisms using initial kinematic chains, and using shape memory materials in the design of machines and mechanisms. (For individual items see A93-31202 to A93-31214)
Accelerating atomistic calculations of quantum energy eigenstates on graphic cards
NASA Astrophysics Data System (ADS)
Rodrigues, Walter; Pecchia, A.; Lopez, M.; Auf der Maur, M.; Di Carlo, A.
2014-10-01
Electronic properties of nanoscale materials require the calculation of eigenvalues and eigenvectors of large matrices. This bottleneck can be overcome by parallel computing techniques or the introduction of faster algorithms. In this paper we report a custom implementation of the Lanczos algorithm with simple restart, optimized for graphical processing units (GPUs). The whole algorithm has been developed using CUDA and runs entirely on the GPU, with a specialized implementation that spares memory and reduces at most machine-to-device data transfers. Furthermore parallel distribution over several GPUs has been attained using the standard message passing interface (MPI). Benchmark calculations performed on a GaN/AlGaN wurtzite quantum dot with up to 600,000 atoms are presented. The empirical tight-binding (ETB) model with an sp3d5s∗+spin-orbit parametrization has been used to build the system Hamiltonian (H).
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load imbalance among processors on a parallel machine. This paper describes the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution cost is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35% of the mesh is randomly adapted. For large-scale scientific computations, our load balancing strategy gives almost a sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remapper yields processor assignments that are less than 3% off the optimal solutions but requires only 1% of the computational time.
Basic Requirements for Systems Software Research and Development
NASA Technical Reports Server (NTRS)
Kuszmaul, Chris; Nitzberg, Bill
1996-01-01
Our success over the past ten years evaluating and developing advanced computing technologies has been due to a simple research and development (R/D) model. Our model has three phases: (a) evaluating the state-of-the-art, (b) identifying problems and creating innovations, and (c) developing solutions, improving the state- of-the-art. This cycle has four basic requirements: a large production testbed with real users, a diverse collection of state-of-the-art hardware, facilities for evalua- tion of emerging technologies and development of innovations, and control over system management on these testbeds. Future research will be irrelevant and future products will not work if any of these requirements is eliminated. In order to retain our effectiveness, the numerical aerospace simulator (NAS) must replace out-of-date production testbeds in as timely a fashion as possible, and cannot afford to ignore innovative designs such as new distributed shared memory machines, clustered commodity-based computers, and multi-threaded architectures.
Gregarious Data Re-structuring in a Many Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
this paper, we have developed a new methodology that takes in consideration the access patterns from a single parallel actor (e.g. a thread), as well as, the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contribution of this paper includes (a) collaborative data restructuring for group reuse and (b) low overhead transformation technique to improve access pattern and bring closelymore » connected data elements together. Preliminary results in a many core architecture, Tilera TileGX, shows promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine grained runtimes (up to 16%) for selected kernels« less
Interaction with Machine Improvisation
NASA Astrophysics Data System (ADS)
Assayag, Gerard; Bloch, George; Cont, Arshia; Dubnov, Shlomo
We describe two multi-agent architectures for an improvisation oriented musician-machine interaction systems that learn in real time from human performers. The improvisation kernel is based on sequence modeling and statistical learning. We present two frameworks of interaction with this kernel. In the first, the stylistic interaction is guided by a human operator in front of an interactive computer environment. In the second framework, the stylistic interaction is delegated to machine intelligence and therefore, knowledge propagation and decision are taken care of by the computer alone. The first framework involves a hybrid architecture using two popular composition/performance environments, Max and OpenMusic, that are put to work and communicate together, each one handling the process at a different time/memory scale. The second framework shares the same representational schemes with the first but uses an Active Learning architecture based on collaborative, competitive and memory-based learning to handle stylistic interactions. Both systems are capable of processing real-time audio/video as well as MIDI. After discussing the general cognitive background of improvisation practices, the statistical modelling tools and the concurrent agent architecture are presented. Then, an Active Learning scheme is described and considered in terms of using different improvisation regimes for improvisation planning. Finally, we provide more details about the different system implementations and describe several performances with the system.
Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning
NASA Technical Reports Server (NTRS)
Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri
1991-01-01
Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.
Memory-built-in quantum cloning in a hybrid solid-state spin register
Wang, W.-B.; Zu, C.; He, L.; Zhang, W.-G.; Duan, L.-M.
2015-01-01
As a way to circumvent the quantum no-cloning theorem, approximate quantum cloning protocols have received wide attention with remarkable applications. Copying of quantum states to memory qubits provides an important strategy for eavesdropping in quantum cryptography. We report an experiment that realizes cloning of quantum states from an electron spin to a nuclear spin in a hybrid solid-state spin register with near-optimal fidelity. The nuclear spin provides an ideal memory qubit at room temperature, which stores the cloned quantum states for a millisecond under ambient conditions, exceeding the lifetime of the original quantum state carried by the electron spin by orders of magnitude. The realization of a cloning machine with built-in quantum memory provides a key step for application of quantum cloning in quantum information science. PMID:26178617
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the technique we have developed starting from a common specification of the algorithm.
Predicting episodic memory formation for movie events
Tang, Hanlin; Singer, Jed; Ison, Matias J.; Pivazyan, Gnel; Romaine, Melissa; Frias, Rosa; Meller, Elizabeth; Boulin, Adrianna; Carroll, James; Perron, Victoria; Dowcett, Sarah; Arellano, Marlise; Kreiman, Gabriel
2016-01-01
Episodic memories are long lasting and full of detail, yet imperfect and malleable. We quantitatively evaluated recollection of short audiovisual segments from movies as a proxy to real-life memory formation in 161 subjects at 15 minutes up to a year after encoding. Memories were reproducible within and across individuals, showed the typical decay with time elapsed between encoding and testing, were fallible yet accurate, and were insensitive to low-level stimulus manipulations but sensitive to high-level stimulus properties. Remarkably, memorability was also high for single movie frames, even one year post-encoding. To evaluate what determines the efficacy of long-term memory formation, we developed an extensive set of content annotations that included actions, emotional valence, visual cues and auditory cues. These annotations enabled us to document the content properties that showed a stronger correlation with recognition memory and to build a machine-learning computational model that accounted for episodic memory formation in single events for group averages and individual subjects with an accuracy of up to 80%. These results provide initial steps towards the development of a quantitative computational theory capable of explaining the subjective filtering steps that lead to how humans learn and consolidate memories. PMID:27686330
Predicting episodic memory formation for movie events.
Tang, Hanlin; Singer, Jed; Ison, Matias J; Pivazyan, Gnel; Romaine, Melissa; Frias, Rosa; Meller, Elizabeth; Boulin, Adrianna; Carroll, James; Perron, Victoria; Dowcett, Sarah; Arellano, Marlise; Kreiman, Gabriel
2016-09-30
Episodic memories are long lasting and full of detail, yet imperfect and malleable. We quantitatively evaluated recollection of short audiovisual segments from movies as a proxy to real-life memory formation in 161 subjects at 15 minutes up to a year after encoding. Memories were reproducible within and across individuals, showed the typical decay with time elapsed between encoding and testing, were fallible yet accurate, and were insensitive to low-level stimulus manipulations but sensitive to high-level stimulus properties. Remarkably, memorability was also high for single movie frames, even one year post-encoding. To evaluate what determines the efficacy of long-term memory formation, we developed an extensive set of content annotations that included actions, emotional valence, visual cues and auditory cues. These annotations enabled us to document the content properties that showed a stronger correlation with recognition memory and to build a machine-learning computational model that accounted for episodic memory formation in single events for group averages and individual subjects with an accuracy of up to 80%. These results provide initial steps towards the development of a quantitative computational theory capable of explaining the subjective filtering steps that lead to how humans learn and consolidate memories.
NASA Astrophysics Data System (ADS)
Fakhari, Abbas; Mitchell, Travis; Leonardi, Christopher; Bolster, Diogo
2017-11-01
Based on phase-field theory, we introduce a robust lattice-Boltzmann equation for modeling immiscible multiphase flows at large density and viscosity contrasts. Our approach is built by modifying the method proposed by Zu and He [Phys. Rev. E 87, 043301 (2013), 10.1103/PhysRevE.87.043301] in such a way as to improve efficiency and numerical stability. In particular, we employ a different interface-tracking equation based on the so-called conservative phase-field model, a simplified equilibrium distribution that decouples pressure and velocity calculations, and a local scheme based on the hydrodynamic distribution functions for calculation of the stress tensor. In addition to two distribution functions for interface tracking and recovery of hydrodynamic properties, the only nonlocal variable in the proposed model is the phase field. Moreover, within our framework there is no need to use biased or mixed difference stencils for numerical stability and accuracy at high density ratios. This not only simplifies the implementation and efficiency of the model, but also leads to a model that is better suited to parallel implementation on distributed-memory machines. Several benchmark cases are considered to assess the efficacy of the proposed model, including the layered Poiseuille flow in a rectangular channel, Rayleigh-Taylor instability, and the rise of a Taylor bubble in a duct. The numerical results are in good agreement with available numerical and experimental data.
1994-05-01
PARALLEL DISTRIBUTED MEMORY ARCHITECTURE LTJh T. M. Eidson 0 - 8 l 9 5 " G. Erlebacher _ _ _. _ DTIe QUALITY INSPECTED a Contract NAS I - 19480 May 1994...DISTRIBUTED MEMORY ARCHITECTURE T.M. Eidson * High Technology Corporation Hampton, VA 23665 G. Erlebachert Institute for Computer Applications in Science and...developed and evaluated. Simple model calculations as well as timing results are pres.nted to evaluate the various strategies. The particular
Examining procedural working memory processing in obsessive-compulsive disorder.
Shahar, Nitzan; Teodorescu, Andrei R; Anholt, Gideon E; Karmon-Presser, Anat; Meiran, Nachshon
2017-07-01
Previous research has suggested that a deficit in working memory might underlie the difficulty of obsessive-compulsive disorder (OCD) patients to control their thoughts and actions. However, a recent meta-analyses found only small effect sizes for working memory deficits in OCD. Recently, a distinction has been made between declarative and procedural working memory. Working memory in OCD was tested mostly using declarative measurements. However, OCD symptoms typically concerns actions, making procedural working-memory more relevant. Here, we tested the operation of procedural working memory in OCD. Participants with OCD and healthy controls performed a battery of choice reaction tasks under high and low procedural working memory demands. Reaction-times (RT) were estimated using ex-Gaussian distribution fitting, revealing no group differences in the size of the RT distribution tail (i.e., τ parameter), known to be sensitive to procedural working memory manipulations. Group differences, unrelated to working memory manipulations, were found in the leading-edge of the RT distribution and analyzed using a two-stage evidence accumulation model. Modeling results suggested that perceptual difficulties might underlie the current group differences. In conclusion, our results suggest that procedural working-memory processing is most likely intact in OCD, and raise a novel, yet untested assumption regarding perceptual deficits in OCD. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
A Cognitive Architecture for Solving Ill-Structured Problems
1997-08-01
R. C. (1982). Dynamic memory. Cambridge, Mass.: Cambridge University Press. Selfridge, 0. G., & Neisser , U . (1960). Pattern recognition by machine...Page 1 . In tro d u ctio n...1 1.1 Relevance to the ARI M ission ............................................................................... 1 1.2 Components of Analogy U se
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2013 CFR
2013-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2011 CFR
2011-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
48 CFR 252.211-7003 - Item identification and valuation.
Code of Federal Regulations, 2012 CFR
2012-10-01
..., used to retrieve data encoded on machine-readable media. Concatenated unique item identifier means— (1... (or controlling) authority for the enterprise identifier. Item means a single hardware article or a...-readable means an automatic identification technology media, such as bar codes, contact memory buttons...
Simulating and Detecting Radiation-Induced Errors for Onboard Machine Learning
NASA Technical Reports Server (NTRS)
Wagstaff, Kiri L.; Bornstein, Benjamin; Granat, Robert; Tang, Benyang; Turmon, Michael
2009-01-01
Spacecraft processors and memory are subjected to high radiation doses and therefore employ radiation-hardened components. However, these components are orders of magnitude more expensive than typical desktop components, and they lag years behind in terms of speed and size. We have integrated algorithm-based fault tolerance (ABFT) methods into onboard data analysis algorithms to detect radiation-induced errors, which ultimately may permit the use of spacecraft memory that need not be fully hardened, reducing cost and increasing capability at the same time. We have also developed a lightweight software radiation simulator, BITFLIPS, that permits evaluation of error detection strategies in a controlled fashion, including the specification of the radiation rate and selective exposure of individual data structures. Using BITFLIPS, we evaluated our error detection methods when using a support vector machine to analyze data collected by the Mars Odyssey spacecraft. We found ABFT error detection for matrix multiplication is very successful, while error detection for Gaussian kernel computation still has room for improvement.
Episodic memory in aspects of large-scale brain networks
Jeong, Woorim; Chung, Chun Kee; Kim, June Sic
2015-01-01
Understanding human episodic memory in aspects of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as mostly relying on medial temporal lobe (MTL) structures. However, recent studies have suggested involvement of more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies also have revealed that memory function is subserved by not only the MTL system but also a distributed network, particularly the default-mode network (DMN). Furthermore, researchers have begun to investigate memory networks throughout the entire brain not restricted to the specific resting-state network (RSN). Altered patterns of functional connectivity (FC) among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory through modulating functional networks, carrying future implications of a novel interventional therapy for memory impairment. PMID:26321939
The Social Life of a Data Base
NASA Technical Reports Server (NTRS)
Linde, Charlotte; Wales, Roxana; Clancy, Dan (Technical Monitor)
2002-01-01
This paper presents the complex social life of a large data base. The topics include: 1) Social Construction of Mechanisms of Memory; 2) Data Bases: The Invisible Memory Mechanism; 3) The Human in the Machine; 4) Data of the Study: A Large-Scale Problem Reporting Data Base; 5) The PRACA Study; 6) Description of PRACA; 7) PRACA and Paper; 8) Multiple Uses of PRACA; 9) The Work of PRACA; 10) Multiple Forms of Invisibility; 11) Such Systems are Everywhere; and 12) Two Morals to the Story. This paper is in viewgraph form.
Modeling Confidence and Response Time in Recognition Memory
ERIC Educational Resources Information Center
Ratcliff, Roger; Starns, Jeffrey J.
2009-01-01
A new model for confidence judgments in recognition memory is presented. In the model, the match between a single test item and memory produces a distribution of evidence, with better matches corresponding to distributions with higher means. On this match dimension, confidence criteria are placed, and the areas between the criteria under the…
Combining Distributed and Shared Memory Models: Approach and Evolution of the Global Arrays Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nieplocha, Jarek; Harrison, Robert J.; Kumar, Mukul
2002-07-29
Both shared memory and distributed memory models have advantages and shortcomings. Shared memory model is much easier to use but it ignores data locality/placement. Given the hierarchical nature of the memory subsystems in the modern computers this characteristic might have a negative impact on performance and scalability. Various techniques, such as code restructuring to increase data reuse and introducing blocking in data accesses, can address the problem and yield performance competitive with message passing[Singh], however at the cost of compromising the ease of use feature. Distributed memory models such as message passing or one-sided communication offer performance and scalability butmore » they compromise the ease-of-use. In this context, the message-passing model is sometimes referred to as?assembly programming for the scientific computing?. The Global Arrays toolkit[GA1, GA2] attempts to offer the best features of both models. It implements a shared-memory programming model in which data locality is managed explicitly by the programmer. This management is achieved by explicit calls to functions that transfer data between a global address space (a distributed array) and local storage. In this respect, the GA model has similarities to the distributed shared-memory models that provide an explicit acquire/release protocol. However, the GA model acknowledges that remote data is slower to access than local data and allows data locality to be explicitly specified and hence managed. The GA model exposes to the programmer the hierarchical memory of modern high-performance computer systems, and by recognizing the communication overhead for remote data transfer, it promotes data reuse and locality of reference. This paper describes the characteristics of the Global Arrays programming model, capabilities of the toolkit, and discusses its evolution.« less
Tamboer, P.; Vorst, H.C.M.; Ghebreab, S.; Scholte, H.S.
2016-01-01
Meta-analytic studies suggest that dyslexia is characterized by subtle and spatially distributed variations in brain anatomy, although many variations failed to be significant after corrections of multiple comparisons. To circumvent issues of significance which are characteristic for conventional analysis techniques, and to provide predictive value, we applied a machine learning technique – support vector machine – to differentiate between subjects with and without dyslexia. In a sample of 22 students with dyslexia (20 women) and 27 students without dyslexia (25 women) (18–21 years), a classification performance of 80% (p < 0.001; d-prime = 1.67) was achieved on the basis of differences in gray matter (sensitivity 82%, specificity 78%). The voxels that were most reliable for classification were found in the left occipital fusiform gyrus (LOFG), in the right occipital fusiform gyrus (ROFG), and in the left inferior parietal lobule (LIPL). Additionally, we found that classification certainty (e.g. the percentage of times a subject was correctly classified) correlated with severity of dyslexia (r = 0.47). Furthermore, various significant correlations were found between the three anatomical regions and behavioural measures of spelling, phonology and whole-word-reading. No correlations were found with behavioural measures of short-term memory and visual/attentional confusion. These data indicate that the LOFG, ROFG and the LIPL are neuro-endophenotype and potentially biomarkers for types of dyslexia related to reading, spelling and phonology. In a second and independent sample of 876 young adults of a general population, the trained classifier of the first sample was tested, resulting in a classification performance of 59% (p = 0.07; d-prime = 0.65). This decline in classification performance resulted from a large percentage of false alarms. This study provided support for the use of machine learning in anatomical brain imaging. PMID:27114899
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-03
... for the workers and former workers of International Business Machines (IBM), Sales and Distribution... reconsideration alleges that IBM outsourced to India and China. During the reconsideration investigation, it was..., Armonk, New York. The subject worker group supply computer software development and maintenance services...
Distributed simulation using a real-time shared memory network
NASA Technical Reports Server (NTRS)
Simon, Donald L.; Mattern, Duane L.; Wong, Edmond; Musgrave, Jeffrey L.
1993-01-01
The Advanced Control Technology Branch of the NASA Lewis Research Center performs research in the area of advanced digital controls for aeronautic and space propulsion systems. This work requires the real-time implementation of both control software and complex dynamical models of the propulsion system. We are implementing these systems in a distributed, multi-vendor computer environment. Therefore, a need exists for real-time communication and synchronization between the distributed multi-vendor computers. A shared memory network is a potential solution which offers several advantages over other real-time communication approaches. A candidate shared memory network was tested for basic performance. The shared memory network was then used to implement a distributed simulation of a ramjet engine. The accuracy and execution time of the distributed simulation was measured and compared to the performance of the non-partitioned simulation. The ease of partitioning the simulation, the minimal time required to develop for communication between the processors and the resulting execution time all indicate that the shared memory network is a real-time communication technique worthy of serious consideration.
Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao
2016-01-01
Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation. PMID:27282247
Cartesian critters can't remember.
Curry, Devin Sanchez
2018-06-01
Descartes held the following view of declarative memory: to remember is to reconstruct an idea that you intellectually recognize as a reconstruction. Descartes countenanced two overarching varieties of declarative memory. To have an intellectual memory is to intellectually reconstruct a universal idea that you recognize as a reconstruction, and to have a sensory memory is to neurophysiologically reconstruct a particular idea that you recognize as a reconstruction. Sensory remembering is thus a capacity of neither ghosts nor machines, but only of human beings qua mind-body unions. This interpretation unifies Descartes's various remarks (and conspicuous silences) about remembering, from the 1628 Rules for the Direction of the Mind through the suppressed-in-1633 Treatise of Man to the 1649 Passions of the Soul. It also rebuts a prevailing thesis in the current secondary literature-that Cartesian critters can remember-while incorporating the textual evidence for that thesis-Descartes's detailed descriptions of the corporeal mechanisms that construct sensory memories. Copyright © 2018 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snyder, L.; Notkin, D.; Adams, L.
1990-03-31
This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Applications Performance on NAS Intel Paragon XP/S - 15#
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)
1994-01-01
The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S- 15 in February, 1993. The i860 XP microprocessor with an integrated floating point unit and operating in dual -instruction mode gives peak performance of 75 million floating point operations (NIFLOPS) per second for 64 bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we will report on early experience using the Paragon XP/S- 15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3 both assembly-coded and Fortran coded on NAS Paragon XP/S- 15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT Finally, we measured the performance of NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as CM-5, CRAY T3D, IBM SP I, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default an operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 NIB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help applications performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS both assembly coded and Fortran coded. We found that the measured performance of assembly-coded BLAS is much less than what memory bandwidth limitation would predict. The influence of data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processors layout.
Multi-functional dielectric elastomer artificial muscles for soft and smart machines
NASA Astrophysics Data System (ADS)
Anderson, Iain A.; Gisby, Todd A.; McKay, Thomas G.; O'Brien, Benjamin M.; Calius, Emilio P.
2012-08-01
Dielectric elastomer (DE) actuators are popularly referred to as artificial muscles because their impressive actuation strain and speed, low density, compliant nature, and silent operation capture many of the desirable physical properties of muscle. Unlike conventional robots and machines, whose mechanisms and drive systems rapidly become very complex as the number of degrees of freedom increases, groups of DE artificial muscles have the potential to generate rich motions combining many translational and rotational degrees of freedom. These artificial muscle systems can mimic the agonist-antagonist approach found in nature, so that active expansion of one artificial muscle is taken up by passive contraction in the other. They can also vary their stiffness. In addition, they have the ability to produce electricity from movement. But departing from the high stiffness paradigm of electromagnetic motors and gearboxes leads to new control challenges, and for soft machines to be truly dexterous like their biological analogues, they need precise control. Humans control their limbs using sensory feedback from strain sensitive cells embedded in muscle. In DE actuators, deformation is inextricably linked to changes in electrical parameters that include capacitance and resistance, so the state of strain can be inferred by sensing these changes, enabling the closed loop control that is critical for a soft machine. But the increased information processing required for a soft machine can impose a substantial burden on a central controller. The natural solution is to distribute control within the mechanism itself. The octopus arm is an example of a soft actuator with a virtually infinite number of degrees of freedom (DOF). The arm utilizes neural ganglia to process sensory data at the local "arm" level and perform complex tasks. Recent advances in soft electronics such as the piezoresistive dielectric elastomer switch (DES) have the potential to be fully integrated with actuators and sensors. With the DE switch, we can produce logic gates, oscillators, and a memory element, the building blocks for a soft computer, thus bringing us closer to emulating smart living structures like the octopus arm. The goal of future research is to develop fully soft machines that exploit smart actuation networks to gain capabilities formerly reserved to nature, and open new vistas in mechanical engineering.
31 CFR 12.3 - Sale of tobacco products in vending machines prohibited.
Code of Federal Regulations, 2011 CFR
2011-07-01
... machines prohibited. 12.3 Section 12.3 Money and Finance: Treasury Office of the Secretary of the Treasury RESTRICTION OF SALE AND DISTRIBUTION OF TOBACCO PRODUCTS § 12.3 Sale of tobacco products in vending machines prohibited. The sale of tobacco products in vending machines located in or around any Federal building under...
An empirical investigation of sparse distributed memory using discrete speech recognition
NASA Technical Reports Server (NTRS)
Danforth, Douglas G.
1990-01-01
Presented here is a step by step analysis of how the basic Sparse Distributed Memory (SDM) model can be modified to enhance its generalization capabilities for classification tasks. Data is taken from speech generated by a single talker. Experiments are used to investigate the theory of associative memories and the question of generalization from specific instances.
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Switching behavior of resistive change memory using oxide nanowires
NASA Astrophysics Data System (ADS)
Aono, Takashige; Sugawa, Kosuke; Shimizu, Tomohiro; Shingubara, Shoso; Takase, Kouichi
2018-06-01
Resistive change random access memory (ReRAM), which is expected to be the next-generation nonvolatile memory, often has wide switching voltage distributions due to many kinds of conductive filaments. In this study, we have tried to suppress the distribution through the structural restriction of the filament-forming area using NiO nanowires. The capacitor with Ni metal nanowires whose surface is oxidized showed good switching behaviors with narrow distributions. The knowledge gained from our study will be very helpful in producing practical ReRAM devices.
NASA Astrophysics Data System (ADS)
Pickman, Yishai; Dunn-Walters, Deborah; Mehr, Ramit
2013-10-01
Complementarity-determining region 3 (CDR3) is the most hyper-variable region in B cell receptor (BCR) and T cell receptor (TCR) genes, and the most critical structure in antigen recognition and thereby in determining the fates of developing and responding lymphocytes. There are millions of different TCR Vβ chain or BCR heavy chain CDR3 sequences in human blood. Even now, when high-throughput sequencing becomes widely used, CDR3 length distributions (also called spectratypes) are still a much quicker and cheaper method of assessing repertoire diversity. However, distribution complexity and the large amount of information per sample (e.g. 32 distributions of the TCRα chain, and 24 of TCRβ) calls for the use of machine learning tools for full exploration. We have examined the ability of supervised machine learning, which uses computational models to find hidden patterns in predefined biological groups, to analyze CDR3 length distributions from various sources, and distinguish between experimental groups. We found that (a) splenic BCR CDR3 length distributions are characterized by low standard deviations and few local maxima, compared to peripheral blood distributions; (b) healthy elderly people's BCR CDR3 length distributions can be distinguished from those of the young; and (c) a machine learning model based on TCR CDR3 distribution features can detect myelodysplastic syndrome with approximately 93% accuracy. Overall, we demonstrate that using supervised machine learning methods can contribute to our understanding of lymphocyte repertoire diversity.
Cultural scripts guide recall of intensely positive life events.
Collins, Katherine A; Pillemer, David B; Ivcevic, Zorana; Gooze, Rachel A
2007-06-01
In four studies, we examined the temporal distribution of positive and negative memories of momentous life events. College students and middle-aged adults reported events occurring from the ages of 8 to 18 years in which they had felt especially good or especially bad about themselves. Distributions of positive memories showed a marked peak at ages 17 and 18. In contrast, distributions of negative memories were relatively flat. These patterns were consistent for males and females and for younger and older adults. Content analyses indicated that a substantial proportion of positive memories from late adolescence described culturally prescribed landmark events surrounding the major life transition from high school to college. When the participants were asked for recollections from life periods that lack obvious age-linked milestone events, age distributions of positive and negative memories were similar. The results support and extend Berntsen and Rubin's (2004) conclusion that cultural expectations, or life scripts, organize recall of positive, but not negative, events.
Direct match data flow machine apparatus and process for data driven computing
Davidson, G.S.; Grafe, V.G.
1997-08-12
A data flow computer and method of computing are disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing
Davidson, G.S.; Grafe, V.G.
1988-07-22
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ''fire'' signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing
Davidson, George S.; Grafe, Victor G.
1995-01-01
A data flow computer which of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow machine apparatus and process for data driven computing
Davidson, George S.; Grafe, Victor Gerald
1997-01-01
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
1990-04-23
developed Ada Real - Time Operating System (ARTOS) for bare machine environments(Target), ACW 1.1I0. " ; - -M.UIECTTERMS Ada programming language, Ada...configuration) Operating System: CSC developed Ada Real - Time Operating System (ARTOS) for bare machine environments Memory Size: 4MB 2.2...Test Method Testing of the MC Ado V1.2.beta/ Concurrent Computer Corporation compiler and the CSC developed Ada Real - Time Operating System (ARTOS) for
A real-time MPEG software decoder using a portable message-passing library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan
1995-12-31
We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easil ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment under which it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, 1/0 capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
2012-03-05
DISTRIBUTION A: Approved for public release; distribution is unlimited. Program Trends •Trust in Autonomous Systems • Cross - cultural Trust...Trust & trustworthiness are independent (Mayer et al, 1995) •Trust is relational •Humans in cross - cultural interactions •Complex human-machine...Interpersonal Trustworthiness •Ability •Benevolence •Integrity Trust Metrics Cross - Cultural Trust Issues Human-Machine Interactions Autonomous
Code of Federal Regulations, 2010 CFR
2010-07-01
... vending facilities, including vending machines, on property controlled by the Department of the Treasury... States. Treasury bureaus shall ensure that the collection and distribution of vending machine income from vending machines on Treasury-controlled property shall be in compliance with the regulations set forth in...
Distributed learning enhances relational memory consolidation.
Litman, Leib; Davachi, Lila
2008-09-01
It has long been known that distributed learning (DL) provides a mnemonic advantage over massed learning (ML). However, the underlying mechanisms that drive this robust mnemonic effect remain largely unknown. In two experiments, we show that DL across a 24 hr interval does not enhance immediate memory performance but instead slows the rate of forgetting relative to ML. Furthermore, we demonstrate that this savings in forgetting is specific to relational, but not item, memory. In the context of extant theories and knowledge of memory consolidation, these results suggest that an important mechanism underlying the mnemonic benefit of DL is enhanced memory consolidation. We speculate that synaptic strengthening mechanisms supporting long-term memory consolidation may be differentially mediated by the spacing of memory reactivation. These findings have broad implications for the scientific study of episodic memory consolidation and, more generally, for educational curriculum development and policy.
78 FR 21387 - Notice of Issuance of Final Determination Concerning Printer and Fax Machine
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-10
... in part of materials from another country or instrumentality, it has been substantially transformed... loading the firmware onto the print engine. In determining whether the combining of parts or materials... foreign Programmable Read Only Memory Chip (``PROM'') in the United States substantially transformed the...
Huebner, Philip A.; Willits, Jon A.
2018-01-01
Previous research has suggested that distributional learning mechanisms may contribute to the acquisition of semantic knowledge. However, distributional learning mechanisms, statistical learning, and contemporary “deep learning” approaches have been criticized for being incapable of learning the kind of abstract and structured knowledge that many think is required for acquisition of semantic knowledge. In this paper, we show that recurrent neural networks, trained on noisy naturalistic speech to children, do in fact learn what appears to be abstract and structured knowledge. We trained two types of recurrent neural networks (Simple Recurrent Network, and Long Short-Term Memory) to predict word sequences in a 5-million-word corpus of speech directed to children ages 0–3 years old, and assessed what semantic knowledge they acquired. We found that learned internal representations are encoding various abstract grammatical and semantic features that are useful for predicting word sequences. Assessing the organization of semantic knowledge in terms of the similarity structure, we found evidence of emergent categorical and hierarchical structure in both models. We found that the Long Short-term Memory (LSTM) and SRN are both learning very similar kinds of representations, but the LSTM achieved higher levels of performance on a quantitative evaluation. We also trained a non-recurrent neural network, Skip-gram, on the same input to compare our results to the state-of-the-art in machine learning. We found that Skip-gram achieves relatively similar performance to the LSTM, but is representing words more in terms of thematic compared to taxonomic relations, and we provide reasons why this might be the case. Our findings show that a learning system that derives abstract, distributed representations for the purpose of predicting sequential dependencies in naturalistic language may provide insight into emergence of many properties of the developing semantic system. PMID:29520243
Detecting Abnormal Machine Characteristics in Cloud Infrastructures
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Matthews, Bryan L.
2011-01-01
In the cloud computing environment resources are accessed as services rather than as a product. Monitoring this system for performance is crucial because of typical pay-peruse packages bought by the users for their jobs. With the huge number of machines currently in the cloud system, it is often extremely difficult for system administrators to keep track of all machines using distributed monitoring programs such as Ganglia1 which lacks system health assessment and summarization capabilities. To overcome this problem, we propose a technique for automated anomaly detection using machine performance data in the cloud. Our algorithm is entirely distributed and runs locally on each computing machine on the cloud in order to rank the machines in order of their anomalous behavior for given jobs. There is no need to centralize any of the performance data for the analysis and at the end of the analysis, our algorithm generates error reports, thereby allowing the system administrators to take corrective actions. Experiments performed on real data sets collected for different jobs validate the fact that our algorithm has a low overhead for tracking anomalous machines in a cloud infrastructure.
Parallel Distributed Processing at 25: further explorations in the microstructure of cognition.
Rogers, Timothy T; McClelland, James L
2014-08-01
This paper introduces a special issue of Cognitive Science initiated on the 25th anniversary of the publication of Parallel Distributed Processing (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP framework, the key issues the framework has addressed, and the debates the framework has spawned, and presents viewpoints on the current status of these issues. The articles focus on both historical roots and contemporary developments in learning, optimality theory, perception, memory, language, conceptual knowledge, cognitive control, and consciousness. Here we consider the approach more generally, reviewing the original motivations, the resulting framework, and the central tenets of the underlying theory. We then evaluate the impact of PDP both on the field at large and within specific subdomains of cognitive science and consider the current role of PDP models within the broader landscape of contemporary theoretical frameworks in cognitive science. Looking to the future, we consider the implications for cognitive science of the recent success of machine learning systems called "deep networks"-systems that build on key ideas presented in the PDP volumes. Copyright © 2014 Cognitive Science Society, Inc.
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
Fiber laser drilling of Ni46Mn27Ga27 ferromagnetic shape memory alloy
NASA Astrophysics Data System (ADS)
Biffi, C. A.; Tuissi, A.
2014-11-01
The interest in ferromagnetic shape memory alloys (SMAs), such as NiMnGa, is increasing, thanks to the functional properties of these smart and functional materials. One of the most evident properties of these systems is their brittleness, which makes attractive the study of unconventional manufacturing processes, such as laser machining. In this work the interaction of laser beam, once focalized on the surface of Ni46Mn27Ga27 [at%] alloy, has been studied. The experiments were performed with a single laser pulse, using a 1 kW continuous wave fiber laser. The morphology of the laser machined surfaces was evaluated using scanning electron microscopy, coupled with energetic dispersion spectroscopy for the measurement of the chemical composition. The results showed that the high quality of the laser beam, coupled with great irradiances available, allow for blind or through holes to be machined on 1.8 mm plates with a single pulse in the order of a few ms. Holes were produced with size in the range of 200-300 μm; despite the long pulse duration, low amount of melted material is produced around the hole periphery. No significant variation of the chemical composition has been detected on the entrance surfaces while the exit ones have been characterized by the loss of Ga content, due to its melting point being significantly lower with respect to the other alloying elements.
González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J
2017-02-01
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Layher, Georg; Schrodt, Fabian; Butz, Martin V.; Neumann, Heiko
2014-01-01
The categorization of real world objects is often reflected in the similarity of their visual appearances. Such categories of objects do not necessarily form disjunct sets of objects, neither semantically nor visually. The relationship between categories can often be described in terms of a hierarchical structure. For instance, tigers and leopards build two separate mammalian categories, both of which are subcategories of the category Felidae. In the last decades, the unsupervised learning of categories of visual input stimuli has been addressed by numerous approaches in machine learning as well as in computational neuroscience. However, the question of what kind of mechanisms might be involved in the process of subcategory learning, or category refinement, remains a topic of active investigation. We propose a recurrent computational network architecture for the unsupervised learning of categorial and subcategorial visual input representations. During learning, the connection strengths of bottom-up weights from input to higher-level category representations are adapted according to the input activity distribution. In a similar manner, top-down weights learn to encode the characteristics of a specific stimulus category. Feedforward and feedback learning in combination realize an associative memory mechanism, enabling the selective top-down propagation of a category's feedback weight distribution. We suggest that the difference between the expected input encoded in the projective field of a category node and the current input pattern controls the amplification of feedforward-driven representations. Large enough differences trigger the recruitment of new representational resources and the establishment of additional (sub-) category representations. We demonstrate the temporal evolution of such learning and show how the proposed combination of an associative memory with a modulatory feedback integration successfully establishes category and subcategory representations. PMID:25538637
NASA Astrophysics Data System (ADS)
Bamiah, Mervat Adib; Brohi, Sarfraz Nawaz; Chuprat, Suriayati
2012-01-01
Virtualization is one of the hottest research topics nowadays. Several academic researchers and developers from IT industry are designing approaches for solving security and manageability issues of Virtual Machines (VMs) residing on virtualized cloud infrastructures. Moving the application from a physical to a virtual platform increases the efficiency, flexibility and reduces management cost as well as effort. Cloud computing is adopting the paradigm of virtualization, using this technique, memory, CPU and computational power is provided to clients' VMs by utilizing the underlying physical hardware. Beside these advantages there are few challenges faced by adopting virtualization such as management of VMs and network traffic, unexpected additional cost and resource allocation. Virtual Machine Monitor (VMM) or hypervisor is the tool used by cloud providers to manage the VMs on cloud. There are several heterogeneous hypervisors provided by various vendors that include VMware, Hyper-V, Xen and Kernel Virtual Machine (KVM). Considering the challenge of VM management, this paper describes several techniques to monitor and manage virtualized cloud infrastructures.
Creation of smart composites using an embroidery machine
NASA Astrophysics Data System (ADS)
Torii, Nobuhiro; Oka, Kosuke; Ikeda, Tadashige
2016-04-01
A smart composite with functional fibers and reinforcement fibers optimally placed with an embroidery machine was created. Fiber orientation affects mechanical properties of composite laminates significantly. Accordingly, if the fibers can be placed along a desired curved path, fiber reinforced plastic (FRP) structures can be designed more lightly and more sophisticatedly. To this end a tailored fiber placement method using the embroidery machine have been studied. To add functions to the FRP structures, shape memory alloy (SMA) wires were placed as functional fibers. First, for a certain purpose the paths of the reinforcement fibers and the SMA wires were simultaneously optimized in analysis. Next, the reinforcement fibers and tubes with the SMA wires were placed on fabrics by using the embroidery machine and this fabric was impregnated with resin by using the vacuum assisted resin transfer molding method. This smart composite was activated by applying voltage to the SMA wires. Fundamental properties of the smart composite were examined and the feasibility of the proposed creation method was shown.
Designing Anticancer Peptides by Constructive Machine Learning.
Grisoni, Francesca; Neuhaus, Claudia S; Gabernet, Gisela; Müller, Alex T; Hiss, Jan A; Schneider, Gisbert
2018-04-21
Constructive (generative) machine learning enables the automated generation of novel chemical structures without the need for explicit molecular design rules. This study presents the experimental application of such a deep machine learning model to design membranolytic anticancer peptides (ACPs) de novo. A recurrent neural network with long short-term memory cells was trained on α-helical cationic amphipathic peptide sequences and then fine-tuned with 26 known ACPs by transfer learning. This optimized model was used to generate unique and novel amino acid sequences. Twelve of the peptides were synthesized and tested for their activity on MCF7 human breast adenocarcinoma cells and selectivity against human erythrocytes. Ten of these peptides were active against cancer cells. Six of the active peptides killed MCF7 cancer cells without affecting human erythrocytes with at least threefold selectivity. These results advocate constructive machine learning for the automated design of peptides with desired biological activities. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Base drive and overlap protection circuit
Gritter, David J.
1983-01-01
An inverter (34) which provides power to an A. C. machine (28) is controlled by a circuit (36) employing PWM control strategy whereby A. C. power is supplied to the machine at a preselectable frequency and preselectable voltage. This is accomplished by the technique of waveform notching in which the shapes of the notches are varied to determine the average energy content of the overall waveform. Through this arrangement, the operational efficiency of the A. C. machine is optimized. The control circuit includes a microcomputer and memory element which receive various parametric inputs and calculate optimized machine control data signals therefrom. The control data is asynchronously loaded into the inverter through an intermediate buffer (38). A base drive and overlap protection circuit is included to insure that both transistors of a complimentary pair are not conducting at the same time. In its preferred embodiment, the present invention is incorporated within an electric vehicle (10) employing a 144 VDC battery pack (32) and a three-phase induction motor (18).
Meeting the memory challenges of brain-scale network simulation.
Kunkel, Susanne; Potjans, Tobias C; Eppler, Jochen M; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus
2011-01-01
The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity, and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10(5) neurons with up to 10(9) synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been investigated in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of neuronal simulators as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place. As a consequence, development cycles can be shorter and less expensive. Applying the model to our freely available Neural Simulation Tool (NEST), we identify the software components dominant at different scales, and develop general strategies for reducing the memory consumption, in particular by using data structures that exploit the sparseness of the local representation of the network. We show that these adaptations enable our simulation software to scale up to the order of 10,000 processors and beyond. As memory consumption issues are likely to be relevant for any software dealing with complex connectome data on such architectures, our approach and our findings should be useful for researchers developing novel neuroinformatics solutions to the challenges posed by the connectome project.
A Distributed Data Base Version of INGRES.
ERIC Educational Resources Information Center
Stonebraker, Michael; Neuhold, Eric
Extensions are required to the currently operational INGRES data base system for it to manage a data base distributed over multiple machines in a computer network running the UNIX operating system. Three possible user views include: (1) each relation in a unique machine, (2) a user interaction with the data base which can only span relations at a…
The Effect of the Underlying Distribution in Hurst Exponent Estimation
Sánchez, Miguel Ángel; Trinidad, Juan E.; García, José; Fernández, Manuel
2015-01-01
In this paper, a heavy-tailed distribution approach is considered in order to explore the behavior of actual financial time series. We show that this kind of distribution allows to properly fit the empirical distribution of the stocks from S&P500 index. In addition to that, we explain in detail why the underlying distribution of the random process under study should be taken into account before using its self-similarity exponent as a reliable tool to state whether that financial series displays long-range dependence or not. Finally, we show that, under this model, no stocks from S&P500 index show persistent memory, whereas some of them do present anti-persistent memory and most of them present no memory at all. PMID:26020942
Latin-square three-dimensional gage master
Jones, L.
1981-05-12
A gage master for coordinate measuring machines has an nxn array of objects distributed in the Z coordinate utilizing the concept of a Latin square experimental design. Using analysis of variance techniques, the invention may be used to identify sources of error in machine geometry and quantify machine accuracy.
Latin square three dimensional gage master
Jones, Lynn L.
1982-01-01
A gage master for coordinate measuring machines has an nxn array of objects distributed in the Z coordinate utilizing the concept of a Latin square experimental design. Using analysis of variance techniques, the invention may be used to identify sources of error in machine geometry and quantify machine accuracy.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyna, David; Betty, Rita
Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis,thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities. The purpose of this project was to computationally model the impact of neural population dynamics within the neurobiological memory system in order to examine how subareas in the brain enable pattern separation and completion of information in memory across time as associated experiences.
Modeling Distributions of Immediate Memory Effects: No Strategies Needed?
ERIC Educational Resources Information Center
Beaman, C. Philip; Neath, Ian; Surprenant, Aimee M.
2008-01-01
Many models of immediate memory predict the presence or absence of various effects, but none have been tested to see whether they predict an appropriate distribution of effect sizes. The authors show that the feature model (J. S. Nairne, 1990) produces appropriate distributions of effect sizes for both the phonological confusion effect and the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, Emma M.; Hendrix, Val; Chertkov, Michael
This white paper introduces the application of advanced data analytics to the modernized grid. In particular, we consider the field of machine learning and where it is both useful, and not useful, for the particular field of the distribution grid and buildings interface. While analytics, in general, is a growing field of interest, and often seen as the golden goose in the burgeoning distribution grid industry, its application is often limited by communications infrastructure, or lack of a focused technical application. Overall, the linkage of analytics to purposeful application in the grid space has been limited. In this paper wemore » consider the field of machine learning as a subset of analytical techniques, and discuss its ability and limitations to enable the future distribution grid and the building-to-grid interface. To that end, we also consider the potential for mixing distributed and centralized analytics and the pros and cons of these approaches. Machine learning is a subfield of computer science that studies and constructs algorithms that can learn from data and make predictions and improve forecasts. Incorporation of machine learning in grid monitoring and analysis tools may have the potential to solve data and operational challenges that result from increasing penetration of distributed and behind-the-meter energy resources. There is an exponentially expanding volume of measured data being generated on the distribution grid, which, with appropriate application of analytics, may be transformed into intelligible, actionable information that can be provided to the right actors – such as grid and building operators, at the appropriate time to enhance grid or building resilience, efficiency, and operations against various metrics or goals – such as total carbon reduction or other economic benefit to customers. While some basic analysis into these data streams can provide a wealth of information, computational and human boundaries on performing the analysis are becoming significant, with more data and multi-objective concerns. Efficient applications of analysis and the machine learning field are being considered in the loop.« less
AHaH computing-from metastable switches to attractors to machine learning.
Nugent, Michael Alexander; Molter, Timothy Wesley
2014-01-01
Modern computing architecture based on the separation of memory and processing leads to a well known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit operating AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high level machine learning functions are demonstrated. This includes unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures-all key capabilities of biological nervous systems and modern machine learning algorithms with real world application.
AHaH Computing–From Metastable Switches to Attractors to Machine Learning
Nugent, Michael Alexander; Molter, Timothy Wesley
2014-01-01
Modern computing architecture based on the separation of memory and processing leads to a well known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit operating AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high level machine learning functions are demonstrated. This includes unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures–all key capabilities of biological nervous systems and modern machine learning algorithms with real world application. PMID:24520315
The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (SP of NPB) on a single processor as the observed/peak performance ratio. Then we estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs). Then we perform the same measurements for a chache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4. These counters allow us to analyze the obtained performance results.
Failure detection in high-performance clusters and computers using chaotic map computations
Rao, Nageswara S.
2015-09-01
A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing 10.sup.18 floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphical processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.
A FPGA-based Measurement System for Nonvolatile Semiconductor Memory Characterization
NASA Astrophysics Data System (ADS)
Bu, Jiankang; White, Marvin
2002-03-01
Low voltage, long retention, high density SONOS nonvolatile semiconductor memory (NVSM) devices are ideally suited for PCMCIA, FLASH and 'smart' cards. The SONOS memory transistor requires characterization with an accurate, rapid measurement system with minimum disturbance to the device. The FPGA-based measurement system includes three parts: 1) a pattern generator implemented with XILINX FPGAs and corresponding software, 2) a high-speed, constant-current, threshold voltage detection circuit, 3) and a data evaluation program, implemented with a LABVIEW program. Fig. 1 shows the general block diagram of the FPGA-based measurement system. The function generator is designed and simulated with XILINX Foundation Software. Under the control of the specific erase/write/read pulses, the analog detect circuit applies operational modes to the SONOS device under test (DUT) and determines the change of the memory-state of the SONOS nonvolatile memory transistor. The TEK460 digitizes the analog threshold voltage output and sends to the PC computer. The data is filtered and averaged with a LABVIEWTM program running on the PC computer and displayed on the monitor in real time. We have implemented the pattern generator with XILINX FPGAs. Fig. 2 shows the block diagram of the pattern generator. We realized the logic control by a method of state machine design. Fig. 3 shows a small part of the state machine. The flexibility of the FPGAs enhances the capabilities of this system and allows measurement variations without hardware changes. The characterization of the nonvolatile memory transistor device under test (DUT), as function of programming voltage and time, is achieved by a high-speed, constant-current threshold voltage detection circuit. The analog detection circuit incorporating fast analog switches controlled digitally with the FPGAs. The schematic circuit diagram is shown in Fig. 4. The various operational modes for the DUT are realized with control signals applied to the analog switches (SW) as shown in Fig. 5. A LABVIEWTM program, on a PC platform, collects and processes the data. The data is displayed on the monitor in real time. This time-domain filtering reduces the digitizing error. Fig. 6 shows the data processing. SONOS nonvolatile semiconductor memories are characterized by erase/write, retention and endurance measurements. Fig. 7 shows the erase/write characteristics of an n-Channel, 5V prog-rammable SONOS memory transistor. Fig.8 shows the retention characteristic of the same SONOS transistor. We have used this system to characterize SONOS nonvolatile semiconductor memory transistors. The attractive features of the test system design lies in the cost-effectiveness and flexibility of the test pattern implementation, fast read-out of memory state, low power, high precision determination of the device threshold voltage, and perhaps most importantly, minimum disturbance, which is indispensable for nonvolatile memory characterization.
Sparse distributed memory prototype: Principles of operation
NASA Technical Reports Server (NTRS)
Flynn, Michael J.; Kanerva, Pentti; Ahanin, Bahram; Bhadkamkar, Neal; Flaherty, Paul; Hickey, Philip
1988-01-01
Sparse distributed memory is a generalized random access memory (RAM) for long binary words. Such words can be written into and read from the memory, and they can be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original right address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. The research is aimed at resolving major design issues that have to be faced in building the memories. The design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words is described. A key aspect of the design is extensive use of dynamic RAM and other standard components.
A collaborative framework for Distributed Privacy-Preserving Support Vector Machine learning.
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates "privacy-insensitive" intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner.
Neylon, J; Min, Y; Kupelian, P; Low, D A; Santhanam, A
2017-04-01
In this paper, a multi-GPU cloud-based server (MGCS) framework is presented for dose calculations, exploring the feasibility of remote computing power for parallelization and acceleration of computationally and time intensive radiotherapy tasks in moving toward online adaptive therapies. An analytical model was developed to estimate theoretical MGCS performance acceleration and intelligently determine workload distribution. Numerical studies were performed with a computing setup of 14 GPUs distributed over 4 servers interconnected by a 1 Gigabits per second (Gbps) network. Inter-process communication methods were optimized to facilitate resource distribution and minimize data transfers over the server interconnect. The analytically predicted computation time predicted matched experimentally observations within 1-5 %. MGCS performance approached a theoretical limit of acceleration proportional to the number of GPUs utilized when computational tasks far outweighed memory operations. The MGCS implementation reproduced ground-truth dose computations with negligible differences, by distributing the work among several processes and implemented optimization strategies. The results showed that a cloud-based computation engine was a feasible solution for enabling clinics to make use of fast dose calculations for advanced treatment planning and adaptive radiotherapy. The cloud-based system was able to exceed the performance of a local machine even for optimized calculations, and provided significant acceleration for computationally intensive tasks. Such a framework can provide access to advanced technology and computational methods to many clinics, providing an avenue for standardization across institutions without the requirements of purchasing, maintaining, and continually updating hardware.
Machine characterization based on an abstract high-level language machine
NASA Technical Reports Server (NTRS)
Saavedra-Barrera, Rafael H.; Smith, Alan Jay; Miya, Eugene
1989-01-01
Measurements are presented for a large number of machines ranging from small workstations to supercomputers. The authors combine these measurements into groups of parameters which relate to specific aspects of the machine implementation, and use these groups to provide overall machine characterizations. The authors also define the concept of pershapes, which represent the level of performance of a machine for different types of computation. A metric based on pershapes is introduced that provides a quantitative way of measuring how similar two machines are in terms of their performance distributions. The metric is related to the extent to which pairs of machines have varying relative performance levels depending on which benchmark is used.
Do Children Think that Duplicating the Body also Duplicates the Mind?
ERIC Educational Resources Information Center
Hood, Bruce; Gjersoe, Nathalia L.; Bloom, Paul
2012-01-01
Philosophers use hypothetical duplication scenarios to explore intuitions about personal identity. Here we examined 5- to 6-year-olds' intuitions about the physical properties and memories of a live hamster that is apparently duplicated by a machine. In Study 1, children thought that more of the original's physical properties than episodic…
The IBM PC as an Online Search Machine. Part 6: Uploading.
ERIC Educational Resources Information Center
Kolner, Stuart J.
1986-01-01
This discussion of uploading (the transmission of information from local memory to a remote system) covers benefits, disadvantages, host system features, technical aspects, and four examples of uploading using CROSSTALK. Examples include function keys, searcher assistance with commonly-used terms, the use of hedges and boiler-plate searching, and…
Introduction to Parallel Computing
1992-05-01
Instruction Stream, Multiple Data Stream Machines .................... 19 2.4 Networks of M achines...independent memory units and connecting them to the processors by an interconnection network . Many different interconnection schemes have been considered, and...connected to the same processor at the same time. Crossbar switching networks are still too expensive to be practical for connecting large numbers of
Programming Models for Concurrency and Real-Time
NASA Astrophysics Data System (ADS)
Vitek, Jan
Modern real-time applications are increasingly large, complex and concurrent systems which must meet stringent performance and predictability requirements. Programming those systems require fundamental advances in programming languages and runtime systems. This talk presents our work on Flexotasks, a programming model for concurrent, real-time systems inspired by stream-processing and concurrent active objects. Some of the key innovations in Flexotasks are that it support both real-time garbage collection and region-based memory with an ownership type system for static safety. Communication between tasks is performed by channels with a linear type discipline to avoid copying messages, and by a non-blocking transactional memory facility. We have evaluated our model empirically within two distinct implementations, one based on Purdue’s Ovm research virtual machine framework and the other on Websphere, IBM’s production real-time virtual machine. We have written a number of small programs, as well as a 30 KLOC avionics collision detector application. We show that Flexotasks are capable of executing periodic threads at 10 KHz with a standard deviation of 1.2us and have performance competitive with hand coded C programs.
Huang, Song; Tian, Na; Wang, Yan; Ji, Zhicheng
2016-01-01
Taking resource allocation into account, flexible job shop problem (FJSP) is a class of complex scheduling problem in manufacturing system. In order to utilize the machine resources rationally, multi-objective particle swarm optimization (MOPSO) integrating with variable neighborhood search is introduced to address FJSP efficiently. Firstly, the assignment rules (AL) and dispatching rules (DR) are provided to initialize the population. And then special discrete operators are designed to produce new individuals and earliest completion machine (ECM) is adopted in the disturbance operator to escape the optima. Secondly, personal-best archives (cognitive memories) and global-best archive (social memory), which are updated by the predefined non-dominated archive update strategy, are simultaneously designed to preserve non-dominated individuals and select personal-best positions and the global-best position. Finally, three neighborhoods are provided to search the neighborhoods of global-best archive for enhancing local search ability. The proposed algorithm is evaluated by using Kacem instances and Brdata instances, and a comparison with other approaches shows the effectiveness of the proposed algorithm for FJSP.
Memory-assisted quantum key distribution resilient against multiple-excitation effects
NASA Astrophysics Data System (ADS)
Lo Piparo, Nicolò; Sinclair, Neil; Razavi, Mohsen
2018-01-01
Memory-assisted measurement-device-independent quantum key distribution (MA-MDI-QKD) has recently been proposed as a technique to improve the rate-versus-distance behavior of QKD systems by using existing, or nearly-achievable, quantum technologies. The promise is that MA-MDI-QKD would require less demanding quantum memories than the ones needed for probabilistic quantum repeaters. Nevertheless, early investigations suggest that, in order to beat the conventional memory-less QKD schemes, the quantum memories used in the MA-MDI-QKD protocols must have high bandwidth-storage products and short interaction times. Among different types of quantum memories, ensemble-based memories offer some of the required specifications, but they typically suffer from multiple excitation effects. To avoid the latter issue, in this paper, we propose two new variants of MA-MDI-QKD both relying on single-photon sources for entangling purposes. One is based on known techniques for entanglement distribution in quantum repeaters. This scheme turns out to offer no advantage even if one uses ideal single-photon sources. By finding the root cause of the problem, we then propose another setup, which can outperform single memory-less setups even if we allow for some imperfections in our single-photon sources. For such a scheme, we compare the key rate for different types of ensemble-based memories and show that certain classes of atomic ensembles can improve the rate-versus-distance behavior.
Introducing concurrency in the Gaudi data processing framework
NASA Astrophysics Data System (ADS)
Clemencic, Marco; Hegner, Benedikt; Mato, Pere; Piparo, Danilo
2014-06-01
In the past, the increasing demands for HEP processing resources could be fulfilled by the ever increasing clock-frequencies and by distributing the work to more and more physical machines. Limitations in power consumption of both CPUs and entire data centres are bringing an end to this era of easy scalability. To get the most CPU performance per watt, future hardware will be characterised by less and less memory per processor, as well as thinner, more specialized and more numerous cores per die, and rather heterogeneous resources. To fully exploit the potential of the many cores, HEP data processing frameworks need to allow for parallel execution of reconstruction or simulation algorithms on several events simultaneously. We describe our experience in introducing concurrency related capabilities into Gaudi, a generic data processing software framework, which is currently being used by several HEP experiments, including the ATLAS and LHCb experiments at the LHC. After a description of the concurrent framework and the most relevant design choices driving its development, we describe the behaviour of the framework in a more realistic environment, using a subset of the real LHCb reconstruction workflow, and present our strategy and the used tools to validate the physics outcome of the parallel framework against the results of the present, purely sequential LHCb software. We then summarize the measurement of the code performance of the multithreaded application in terms of memory and CPU usage.
Boosting the FM-Index on the GPU: Effective Techniques to Mitigate Random Memory Access.
Chacón, Alejandro; Marco-Sola, Santiago; Espinosa, Antonio; Ribeca, Paolo; Moure, Juan Carlos
2015-01-01
The recent advent of high-throughput sequencing machines producing big amounts of short reads has boosted the interest in efficient string searching techniques. As of today, many mainstream sequence alignment software tools rely on a special data structure, called the FM-index, which allows for fast exact searches in large genomic references. However, such searches translate into a pseudo-random memory access pattern, thus making memory access the limiting factor of all computation-efficient implementations, both on CPUs and GPUs. Here, we show that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses. The combination of those and other optimisations yields an implementation that is able to process about two Gbases of queries per second on our test platform, being about 8 × faster than a comparable multi-core CPU version, and about 3 × to 5 × faster than the FM-index implementation on the GPU provided by the recently announced Nvidia NVBIO bioinformatics library.
Ghosts in the machine: memory interference from the previous trial.
Papadimitriou, Charalampos; Ferdoash, Afreen; Snyder, Lawrence H
2015-01-15
Previous memoranda can interfere with the memorization or storage of new information, a concept known as proactive interference. Studies of proactive interference typically use categorical memoranda and match-to-sample tasks with categorical measures such as the proportion of correct to incorrect responses. In this study we instead train five macaques in a spatial memory task with continuous memoranda and responses, allowing us to more finely probe working memory circuits. We first ask whether the memoranda from the previous trial result in proactive interference in an oculomotor delayed response task. We then characterize the spatial and temporal profile of this interference and ask whether this profile can be predicted by an attractor network model of working memory. We find that memory in the current trial shows a bias toward the location of the memorandum of the previous trial. The magnitude of this bias increases with the duration of the memory period within which it is measured. Our simulations using standard attractor network models of working memory show that these models easily replicate the spatial profile of the bias. However, unlike the behavioral findings, these attractor models show an increase in bias with the duration of the previous rather than the current memory period. To model a bias that increases with current trial duration we posit two separate memory stores, a rapidly decaying visual store that resists proactive interference effects and a sustained memory store that is susceptible to proactive interference. Copyright © 2015 the American Physiological Society.
Ghosts in the machine: memory interference from the previous trial
Ferdoash, Afreen; Snyder, Lawrence H.
2014-01-01
Previous memoranda can interfere with the memorization or storage of new information, a concept known as proactive interference. Studies of proactive interference typically use categorical memoranda and match-to-sample tasks with categorical measures such as the proportion of correct to incorrect responses. In this study we instead train five macaques in a spatial memory task with continuous memoranda and responses, allowing us to more finely probe working memory circuits. We first ask whether the memoranda from the previous trial result in proactive interference in an oculomotor delayed response task. We then characterize the spatial and temporal profile of this interference and ask whether this profile can be predicted by an attractor network model of working memory. We find that memory in the current trial shows a bias toward the location of the memorandum of the previous trial. The magnitude of this bias increases with the duration of the memory period within which it is measured. Our simulations using standard attractor network models of working memory show that these models easily replicate the spatial profile of the bias. However, unlike the behavioral findings, these attractor models show an increase in bias with the duration of the previous rather than the current memory period. To model a bias that increases with current trial duration we posit two separate memory stores, a rapidly decaying visual store that resists proactive interference effects and a sustained memory store that is susceptible to proactive interference. PMID:25376781
NASA Astrophysics Data System (ADS)
Zhang, Zhen; Xia, Changliang; Yan, Yan; Geng, Qiang; Shi, Tingna
2017-08-01
Due to the complicated rotor structure and nonlinear saturation of rotor bridges, it is difficult to build a fast and accurate analytical field calculation model for multilayer interior permanent magnet (IPM) machines. In this paper, a hybrid analytical model suitable for the open-circuit field calculation of multilayer IPM machines is proposed by coupling the magnetic equivalent circuit (MEC) method and the subdomain technique. In the proposed analytical model, the rotor magnetic field is calculated by the MEC method based on the Kirchhoff's law, while the field in the stator slot, slot opening and air-gap is calculated by subdomain technique based on the Maxwell's equation. To solve the whole field distribution of the multilayer IPM machines, the coupled boundary conditions on the rotor surface are deduced for the coupling of the rotor MEC and the analytical field distribution of the stator slot, slot opening and air-gap. The hybrid analytical model can be used to calculate the open-circuit air-gap field distribution, back electromotive force (EMF) and cogging torque of multilayer IPM machines. Compared with finite element analysis (FEA), it has the advantages of faster modeling, less computation source occupying and shorter time consuming, and meanwhile achieves the approximate accuracy. The analytical model is helpful and applicable for the open-circuit field calculation of multilayer IPM machines with any size and pole/slot number combination.
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominates the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even moremore » complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.« less
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale
NASA Astrophysics Data System (ADS)
Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.
2016-12-01
Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
Tian, Yin; Zhang, Huiling; Xu, Wei; Zhang, Haiyong; Yang, Li; Zheng, Shuxing; Shi, Yupan
2017-01-01
Spectral entropy, which was generated by applying the Shannon entropy concept to the power distribution of the Fourier-transformed electroencephalograph (EEG), was utilized to measure the uniformity of power spectral density underlying EEG when subjects performed the working memory tasks twice, i.e., before and after training. According to Signed Residual Time (SRT) scores based on response speed and accuracy trade-off, 20 subjects were divided into two groups, namely high-performance and low-performance groups, to undertake working memory (WM) tasks. We found that spectral entropy derived from the retention period of WM on channel FC4 exhibited a high correlation with SRT scores. To this end, spectral entropy was used in support vector machine classifier with linear kernel to differentiate these two groups. Receiver operating characteristics analysis and leave-one out cross-validation (LOOCV) demonstrated that the averaged classification accuracy (CA) was 90.0 and 92.5% for intra-session and inter-session, respectively, indicating that spectral entropy could be used to distinguish these two different WM performance groups successfully. Furthermore, the support vector regression prediction model with radial basis function kernel and the root-mean-square error of prediction revealed that spectral entropy could be utilized to predict SRT scores on individual WM performance. After testing the changes in SRT scores and spectral entropy for each subject by short-time training, we found that 16 in 20 subjects’ SRT scores were clearly promoted after training and 15 in 20 subjects’ SRT scores showed consistent changes with spectral entropy before and after training. The findings revealed that spectral entropy could be a promising indicator to predict individual’s WM changes by training and further provide a novel application about WM for brain–computer interfaces. PMID:28912701
Behavioral decoding of working memory items inside and outside the focus of attention.
Mallett, Remington; Lewis-Peacock, Jarrod A
2018-03-31
How we attend to our thoughts affects how we attend to our environment. Holding information in working memory can automatically bias visual attention toward matching information. By observing attentional biases on reaction times to visual search during a memory delay, it is possible to reconstruct the source of that bias using machine learning techniques and thereby behaviorally decode the content of working memory. Can this be done when more than one item is held in working memory? There is some evidence that multiple items can simultaneously bias attention, but the effects have been inconsistent. One explanation may be that items are stored in different states depending on the current task demands. Recent models propose functionally distinct states of representation for items inside versus outside the focus of attention. Here, we use behavioral decoding to evaluate whether multiple memory items-including temporarily irrelevant items outside the focus of attention-exert biases on visual attention. Only the single item in the focus of attention was decodable. The other item showed a brief attentional bias that dissipated until it returned to the focus of attention. These results support the idea of dynamic, flexible states of working memory across time and priority. © 2018 New York Academy of Sciences.
Biologically-inspired robust and adaptive multi-sensor fusion and active control
NASA Astrophysics Data System (ADS)
Khosla, Deepak; Dow, Paul A.; Huber, David J.
2009-04-01
In this paper, we describe a method and system for robust and efficient goal-oriented active control of a machine (e.g., robot) based on processing, hierarchical spatial understanding, representation and memory of multimodal sensory inputs. This work assumes that a high-level plan or goal is known a priori or is provided by an operator interface, which translates into an overall perceptual processing strategy for the machine. Its analogy to the human brain is the download of plans and decisions from the pre-frontal cortex into various perceptual working memories as a perceptual plan that then guides the sensory data collection and processing. For example, a goal might be to look for specific colored objects in a scene while also looking for specific sound sources. This paper combines three key ideas and methods into a single closed-loop active control system. (1) Use high-level plan or goal to determine and prioritize spatial locations or waypoints (targets) in multimodal sensory space; (2) collect/store information about these spatial locations at the appropriate hierarchy and representation in a spatial working memory. This includes invariant learning of these spatial representations and how to convert between them; and (3) execute actions based on ordered retrieval of these spatial locations from hierarchical spatial working memory and using the "right" level of representation that can efficiently translate into motor actions. In its most specific form, the active control is described for a vision system (such as a pantilt- zoom camera system mounted on a robotic head and neck unit) which finds and then fixates on high saliency visual objects. We also describe the approach where the goal is to turn towards and sequentially foveate on salient multimodal cues that include both visual and auditory inputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2013-09-26
The Gremlin sofrware package is a performance analysis approach targeted to support the Co-Design process for future systems. It consists of a series of modules that can be used to alter a machine's behavior with the goal of emulating future machine properties. The modules can be divided into several classes; the most significant ones are detailed below. PowGre is a series of modules that help explore the power consumption properties of applications and to determine the impact of power constraints on applications. Most of them use low-level processor interfaces to directly control voltage and frequency settings as well as permore » nodes, socket, or memory power bounds. MemGre are memory Gremlins and implement a new performance analysis technique that captures the application's effective use of the storage capacity of different levels of the memory hierarchy as well as the bandwidth between adjacent levels. The approach models various memory components as resources and measures how much of each resource the application uses from the application's own perspective. To the application a given amount of a resource is "used" if not having this amount will degrade the application's performance. This is in contrast to the hardware-centric perspective that considers "use" as any hardware action that utilizes the resource, even if it has no effect on performance. ResGre are Gremlins that use fault injection techniques to emulate higher fault rates than currently present in today's systems. Faults can be injected through various means, including network interposition, static analysis, and code modification, or direct application notification. ResGre also includes patches to previously released LLNL codes that can counteract and react to injected failures.« less
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
A Machine-Learning-Driven Sky Model.
Satylmys, Pynar; Bashford-Rogers, Thomas; Chalmers, Alan; Debattista, Kurt
2017-01-01
Sky illumination is responsible for much of the lighting in a virtual environment. A machine-learning-based approach can compactly represent sky illumination from both existing analytic sky models and from captured environment maps. The proposed approach can approximate the captured lighting at a significantly reduced memory cost and enable smooth transitions of sky lighting to be created from a small set of environment maps captured at discrete times of day. The author's results demonstrate accuracy close to the ground truth for both analytical and capture-based methods. The approach has a low runtime overhead, so it can be used as a generic approach for both offline and real-time applications.
Application of an onboard processor to the OAO C spacecraft
NASA Technical Reports Server (NTRS)
Stewart, W. N.; Hartenstein, R. G.; Trevathan, C.
1972-01-01
The design of a stored program computer for spacecraft use and its application on the fourth Orbiting Astronomical Observatory (OAO) is reported. The computer is a medium scale, parallel machine with a memory capacity of 16384 words of 18 bits each. It possesses a comprehensive instruction repertoire and operates on 45 W of power (including the dc-to-dc converter). The machine operates at a 500-kHz rate and executes an add instruction in 10 microseconds. Its primary functions on OAO C will be auxiliary command storage, spacecraft monitoring and malfunction reporting, data compression and status summary, and possible performance of emergency corrective action for certain anomalous situations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bartkiewicz, Karol; Miranowicz, Adam
We find an optimal quantum cloning machine, which clones qubits of arbitrary symmetrical distribution around the Bloch vector with the highest fidelity. The process is referred to as phase-independent cloning in contrast to the standard phase-covariant cloning for which an input qubit state is a priori better known. We assume that the information about the input state is encoded in an arbitrary axisymmetric distribution (phase function) on the Bloch sphere of the cloned qubits. We find analytical expressions describing the optimal cloning transformation and fidelity of the clones. As an illustration, we analyze cloning of qubit state described by themore » von Mises-Fisher and Brosseau distributions. Moreover, we show that the optimal phase-independent cloning machine can be implemented by modifying the mirror phase-covariant cloning machine for which quantum circuits are known.« less
Support Vector Machine-Based Endmember Extraction
DOE Office of Scientific and Technical Information (OSTI.GOV)
Filippi, Anthony M; Archibald, Richard K
Introduced in this paper is the utilization of Support Vector Machines (SVMs) to automatically perform endmember extraction from hyperspectral data. The strengths of SVM are exploited to provide a fast and accurate calculated representation of high-dimensional data sets that may consist of multiple distributions. Once this representation is computed, the number of distributions can be determined without prior knowledge. For each distribution, an optimal transform can be determined that preserves informational content while reducing the data dimensionality, and hence, the computational cost. Finally, endmember extraction for the whole data set is accomplished. Results indicate that this Support Vector Machine-Based Endmembermore » Extraction (SVM-BEE) algorithm has the capability of autonomously determining endmembers from multiple clusters with computational speed and accuracy, while maintaining a robust tolerance to noise.« less
Electron beam machining using rotating and shaped beam power distribution
Elmer, John W.; O'Brien, Dennis W.
1996-01-01
An apparatus and method for electron beam (EB) machining (drilling, cutting and welding) that uses conventional EB guns, power supplies, and welding machine technology without the need for fast bias pulsing technology. The invention involves a magnetic lensing (EB optics) system and electronic controls to: 1) concurrently bend, focus, shape, scan, and rotate the beam to protect the EB gun and to create a desired effective power-density distribution, and 2) rotate or scan this shaped beam in a controlled way. The shaped beam power-density distribution can be measured using a tomographic imaging system. For example, the EB apparatus of this invention has the ability to drill holes in metal having a diameter up to 1000 .mu.m (1 mm or larger), compared to the 250 .mu.m diameter of laser drilling.
NASA Astrophysics Data System (ADS)
Sembiring, N.; Ginting, E.; Darnello, T.
2017-12-01
Problems that appear in a company that produces refined sugar, the production floor has not reached the level of critical machine availability because it often suffered damage (breakdown). This results in a sudden loss of production time and production opportunities. This problem can be solved by Reliability Engineering method where the statistical approach to historical damage data is performed to see the pattern of the distribution. The method can provide a value of reliability, rate of damage, and availability level, of an machine during the maintenance time interval schedule. The result of distribution test to time inter-damage data (MTTF) flexible hose component is lognormal distribution while component of teflon cone lifthing is weibull distribution. While from distribution test to mean time of improvement (MTTR) flexible hose component is exponential distribution while component of teflon cone lifthing is weibull distribution. The actual results of the flexible hose component on the replacement schedule per 720 hours obtained reliability of 0.2451 and availability 0.9960. While on the critical components of teflon cone lifthing actual on the replacement schedule per 1944 hours obtained reliability of 0.4083 and availability 0.9927.
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1988-01-01
In Kanerva's Sparse Distributed Memory, writing to and reading from the memory are done in relation to spheres in an n-dimensional binary vector space. Thus it is important to know how many points are in the intersection of two spheres in this space. Two proofs are given of Wang's formula for spheres of unequal radii, and an integral approximation for the intersection in this case.
Global Load Balancing with Parallel Mesh Adaption on Distributed-Memory Systems
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Oliker, Leonid; Sohn, Andrew
1996-01-01
Dynamic mesh adaptation on unstructured grids is a powerful tool for efficiently computing unsteady problems to resolve solution features of interest. Unfortunately, this causes load inbalances among processors on a parallel machine. This paper described the parallel implementation of a tetrahedral mesh adaption scheme and a new global load balancing method. A heuristic remapping algorithm is presented that assigns partitions to processors such that the redistribution coast is minimized. Results indicate that the parallel performance of the mesh adaption code depends on the nature of the adaption region and show a 35.5X speedup on 64 processors of an SP2 when 35 percent of the mesh is randomly adapted. For large scale scientific computations, our load balancing strategy gives an almost sixfold reduction in solver execution times over non-balanced loads. Furthermore, our heuristic remappier yields processor assignments that are less than 3 percent of the optimal solutions, but requires only 1 percent of the computational time.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Computation of Coupled Thermal-Fluid Problems in Distributed Memory Environment
NASA Technical Reports Server (NTRS)
Wei, H.; Shang, H. M.; Chen, Y. S.
2001-01-01
The thermal-fluid coupling problems are very important to aerospace and engineering applications. Instead of analyzing heat transfer and fluid flow separately, this study merged two well-accepted engineering solution methods, SINDA for thermal analysis and FDNS for fluid flow simulation, into a unified multi-disciplinary thermal fluid prediction method. A fully conservative patched grid interface algorithm for arbitrary two-dimensional and three-dimensional geometry has been developed. The state-of-the-art parallel computing concept was used to couple SINDA and FDNS for the communication of boundary conditions through PVM (Parallel Virtual Machine) libraries. Therefore, the thermal analysis performed by SINDA and the fluid flow calculated by FDNS are fully coupled to obtain steady state or transient solutions. The natural convection between two thick-walled eccentric tubes was calculated and the predicted results match the experiment data perfectly. A 3-D rocket engine model and a real 3-D SSME geometry were used to test the current model, and the reasonable temperature field was obtained.
NASA Astrophysics Data System (ADS)
Vestrand, W. T.; Theiler, J.; Woznia, P. R.
2004-10-01
The existence of rapidly slewing robotic telescopes and fast alert distribution via the Internet is revolutionizing our capability to study the physics of fast astrophysical transients. But the salient challenge that optical time domain surveys must conquer is mining the torrent of data to recognize important transients in a scene full of normal variations. Humans simply do not have the attention span, memory, or reaction time required to recognize fast transients and rapidly respond. Autonomous robotic instrumentation with the ability to extract pertinent information from the data stream in real time will therefore be essential for recognizing transients and commanding rapid follow-up observations while the ephemeral behavior is still present. Here we discuss how the development and integration of three technologies: (1) robotic telescope networks; (2) machine learning; and (3) advanced database technology, can enable the construction of smart robotic telescopes, which we loosely call ``thinking'' telescopes, capable of mining the sky in real time.
The design and implementation of a parallel unstructured Euler solver using software primitives
NASA Technical Reports Server (NTRS)
Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.
1992-01-01
This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effect of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.
Parallelization of an Object-Oriented Unstructured Aeroacoustics Solver
NASA Technical Reports Server (NTRS)
Baggag, Abdelkader; Atkins, Harold; Oezturan, Can; Keyes, David
1999-01-01
A computational aeroacoustics code based on the discontinuous Galerkin method is ported to several parallel platforms using MPI. The discontinuous Galerkin method is a compact high-order method that retains its accuracy and robustness on non-smooth unstructured meshes. In its semi-discrete form, the discontinuous Galerkin method can be combined with explicit time marching methods making it well suited to time accurate computations. The compact nature of the discontinuous Galerkin method also makes it well suited for distributed memory parallel platforms. The original serial code was written using an object-oriented approach and was previously optimized for cache-based machines. The port to parallel platforms was achieved simply by treating partition boundaries as a type of boundary condition. Code modifications were minimal because boundary conditions were abstractions in the original program. Scalability results are presented for the SCI Origin, IBM SP2, and clusters of SGI and Sun workstations. Slightly superlinear speedup is achieved on a fixed-size problem on the Origin, due to cache effects.
Sparse distributed memory: Principles and operation
NASA Technical Reports Server (NTRS)
Flynn, M. J.; Kanerva, P.; Bhadkamkar, N.
1989-01-01
Sparse distributed memory is a generalized random access memory (RAM) for long (1000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech recognition and scene analysis, in signal detection and verification, and in adaptive control of automated equipment, in general, in dealing with real world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. Major design issues were resolved which were faced in building the memories. The design is described of a prototype memory with 256 bit addresses and from 8 to 128 K locations for 256 bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components.
1990-02-01
human-to- human communication patterns during situation assessment and cooperative problem solving tasks. The research proposed for the second URRP year...Hardware development. In order to create an environment within which to study multi-channeled human-to- human communication , a multi-media observation...that machine-to- human communication can be used to increase cohesion between humans and intelligent machines and to promote human-machine team
Energy Harvesting for Soft-Matter Machines and Electronics
2016-06-09
AFRL-AFOSR-VA-TR-2016-0353 Energy Harvesting for Soft-Matter Machines and Electronics Carmel Majidi CARNEGIE MELLON UNIVERSITY Final Report 06/09...ES) CARNEGIE MELLON UNIVERSITY 5000 FORBES AVENUE PITTSBURGH, PA 15213-3815 US 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING/MONITORING...livelink.ebs.afrl.af.mil/livelink/llisapi.dll DISTRIBUTION A: Distribution approved for public release. Carnegie Mellon University MECHANICAL ENGINEERING FINAL
NASA Technical Reports Server (NTRS)
Warren, Wayne H., Jr.; Adelman, Saul J.
1989-01-01
The machine-readable version of the multiplet table, as it is currently being distributed from the Astronomical Data Center, is described. The computerized version of the table contains data on excitation potentials, J values, multiplet terms, intensities of the transitions, and multiplet numbers. Files ordered by multiplet and by wavelength are included in the distributed version.
Eternal Sunshine of the Spotless Machine: Protecting Privacy with Ephemeral Channels
Dunn, Alan M.; Lee, Michael Z.; Jana, Suman; Kim, Sangman; Silberstein, Mark; Xu, Yuanzhong; Shmatikov, Vitaly; Witchel, Emmett
2014-01-01
Modern systems keep long memories. As we show in this paper, an adversary who gains access to a Linux system, even one that implements secure deallocation, can recover the contents of applications’ windows, audio buffers, and data remaining in device drivers—long after the applications have terminated. We design and implement Lacuna, a system that allows users to run programs in “private sessions.” After the session is over, all memories of its execution are erased. The key abstraction in Lacuna is an ephemeral channel, which allows the protected program to talk to peripheral devices while making it possible to delete the memories of this communication from the host. Lacuna can run unmodified applications that use graphics, sound, USB input devices, and the network, with only 20 percentage points of additional CPU utilization. PMID:24755709
Dahl, Jonna J; Kingo, Osman S; Krøjgaard, Peter
2015-12-01
In a seminal study Simcock and Hayne (2002) showed that 3-year-olds were unable to use newly acquired words to describe a "magic" event experienced 6 or 12 months earlier. In the reference study the children's verbal recall was tested without props being present. Inspired by recent evidence, the original design was replicated, testing 33-and 39-month-olds (n = 180), but with props present at recall while controlling for potential online reasoning. The results revealed that the children did use newly acquired words to describe their preverbal memory. Thus, the present study shows that nonverbal memories can be verbalized if the recall setting provides a high level of contextual support, a finding relevant to researchers investigating the offset of childhood amnesia. (c) 2015 APA, all rights reserved).
NASA Technical Reports Server (NTRS)
Corker, Kevin M.; Pisanich, Gregory M.; Lebacqz, Victor (Technical Monitor)
1996-01-01
The Man-Machine Interaction Design and Analysis System (MIDAS) has been under development for the past ten years through a joint US Army and NASA cooperative agreement. MIDAS represents multiple human operators and selected perceptual, cognitive, and physical functions of those operators as they interact with simulated systems. MIDAS has been used as an integrated predictive framework for the investigation of human/machine systems, particularly in situations with high demands on the operators. Specific examples include: nuclear power plant crew simulation, military helicopter flight crew response, and police force emergency dispatch. In recent applications to airborne systems development, MIDAS has demonstrated an ability to predict flight crew decision-making and procedural behavior when interacting with automated flight management systems and Air Traffic Control. In this paper we describe two enhancements to MIDAS. The first involves the addition of working memory in the form of an articulatory buffer for verbal communication protocols and a visuo-spatial buffer for communications via digital datalink. The second enhancement is a representation of multiple operators working as a team. This enhanced model was used to predict the performance of human flight crews and their level of compliance with commercial aviation communication procedures. We show how the data produced by MIDAS compares with flight crew performance data from full mission simulations. Finally, we discuss the use of these features to study communications issues connected with aircraft-based separation assurance.
Binary classification of items of interest in a repeatable process
Abell, Jeffrey A; Spicer, John Patrick; Wincek, Michael Anthony; Wang, Hui; Chakraborty, Debejyo
2015-01-06
A system includes host and learning machines. Each machine has a processor in electrical communication with at least one sensor. Instructions for predicting a binary quality status of an item of interest during a repeatable process are recorded in memory. The binary quality status includes passing and failing binary classes. The learning machine receives signals from the at least one sensor and identifies candidate features. Features are extracted from the candidate features, each more predictive of the binary quality status. The extracted features are mapped to a dimensional space having a number of dimensions proportional to the number of extracted features. The dimensional space includes most of the passing class and excludes at least 90 percent of the failing class. Received signals are compared to the boundaries of the recorded dimensional space to predict, in real time, the binary quality status of a subsequent item of interest.
Are memory traces localized or distributed?
Thompson, R F
1991-01-01
Evidence supports the view that "memory traces" are formed in the hippocampus and in the cerebellum in classical conditioning of discrete behavioral responses (e.g. eyeblink conditioning). In the hippocampus, learning results in long-lasting increases in excitability of pyramidal neurons that appear to be localized to these neurons (i.e. changes in membrane properties and receptor function). However, these learning-altered pyramidal neurons are distributed widely throughout CA3 and CA1. Although it plays a key role in certain aspects of classical conditioning, the hippocampus is not necessary for learning and memory of the basic conditioned responses. The cerebellum and its associated brain stem circuitry, on the other hand, does appear to be essential (necessary and sufficient) for learning and memory of the conditioned response. Evidence to date is most consistent with a localized trace in the interpositus nucleus and multiple localized traces in cerebellar cortex, each involving relatively large ensembles of neurons. Perhaps "procedural" memory traces are relatively localized and "declarative" traces more widely distributed.
NASA Technical Reports Server (NTRS)
Chung, Ming-Ying; Ciardo, Gianfranco; Siminiceanu, Radu I.
2007-01-01
The Saturation algorithm for symbolic state-space generation, has been a recent break-through in the exhaustive veri cation of complex systems, in particular globally-asyn- chronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and the fastest symbolic exploration algo- rithm to date. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during the highly irregular exploration. A crucial factor in limiting the memory consumption during the symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical while analyzing large-scale systems using the symbolic approach. In this technical report, we develop a garbage collection scheme and several operation cache policies to help on solving extremely complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of time and memory efficiency.
A distributed version of the NASA Engine Performance Program
NASA Technical Reports Server (NTRS)
Cours, Jeffrey T.; Curlett, Brian P.
1993-01-01
Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
Sparse distributed memory: understanding the speed and robustness of expert memory
Brogliato, Marcelo S.; Chada, Daniel M.; Linhares, Alexandre
2014-01-01
How can experts, sometimes in exacting detail, almost immediately and very precisely recall memory items from a vast repertoire? The problem in which we will be interested concerns models of theoretical neuroscience that could explain the speed and robustness of an expert's recollection. The approach is based on Sparse Distributed Memory, which has been shown to be plausible, both in a neuroscientific and in a psychological manner, in a number of ways. A crucial characteristic concerns the limits of human recollection, the “tip-of-tongue” memory event—which is found at a non-linearity in the model. We expand the theoretical framework, deriving an optimization formula to solve this non-linearity. Numerical results demonstrate how the higher frequency of rehearsal, through work or study, immediately increases the robustness and speed associated with expert memory. PMID:24808842
Virtual Parts Engineering Research Center
2010-05-20
engineering 10 materials. High strength alloys , composites (polymer composites and metallic composites), and the like cannot merely be replaced by...ceramics, smart materials, shape memory alloys , super plastic materials and nano- structured materials may be more appropriate substitutes in a reverse...molding process using thermosetting Bakelite. For remanufacturing the part in small quantities, machining has been identified as the most economical
Thermodynamic Model of Spatial Memory
NASA Astrophysics Data System (ADS)
Kaufman, Miron; Allen, P.
1998-03-01
We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processes research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.
ERIC Educational Resources Information Center
Richardson, E.; Clayman, Linda
As a result of studies on fitting and machining apprentices attitudes toward employers, a study was conducted to obtain the attitudes of a sample of employers toward apprenticeship. Three hundred questionnaires were distributed to employers of fitting and machine students studying at a number of Sydney (Australia) Technical Colleges. An…
NASA Technical Reports Server (NTRS)
Warren, W. H., Jr.
1982-01-01
The contents and format of the machine-readable version of the cataloque distributed by the Astronomical Data Center are described. Coding for the various scales and abbreviations used in the catalogue are tabulated and certain revisions to the machine version made to improve storage efficiency and notation are discussed.
31 CFR 12.3 - Sale of tobacco products in vending machines prohibited.
Code of Federal Regulations, 2010 CFR
2010-07-01
... RESTRICTION OF SALE AND DISTRIBUTION OF TOBACCO PRODUCTS § 12.3 Sale of tobacco products in vending machines prohibited. The sale of tobacco products in vending machines located in or around any Federal building under... 31 Money and Finance: Treasury 1 2010-07-01 2010-07-01 false Sale of tobacco products in vending...
Forensic Analysis of Window’s(Registered) Virtual Memory Incorporating the System’s Page-File
2008-12-01
Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 1. AGENCY USE ONLY (Leave blank) 2. REPORT DATE December...data in a meaningful way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed...way. One reason for this is how memory is managed by the operating system. Data belonging to one process can be distributed arbitrarily across
Distributed practice can boost evaluative conditioning by increasing memory for the stimulus pairs.
Richter, Jasmin; Gast, Anne
2017-09-01
When presenting a neutral stimulus (CS) in close temporal and spatial proximity to a positive or negative stimulus (US) the former is often observed to adopt the valence of the latter, a phenomenon named evaluative conditioning (EC). It is already well established that under most conditions, contingency awareness is important for an EC effect to occur. In addition to that, some findings suggest that awareness of the stimulus pairs is not only relevant during the learning phase, but that it is also relevant whether memory for the pairings is still available during the measurement phase. As previous research has shown that memory is better after temporally distributed than after contiguous (massed) repetitions, it seems plausible that also EC effects are moderated by distributed practice manipulations. This was tested in the current studies. In two experiments with successful distributed practice manipulations on memory, we show that also the magnitude of the EC effect was larger for pairs learned under spaced compared to massed conditions. Both effects, on memory and on EC, are found after a within-participant and after a between-participant manipulation. However, we did not find significant differences in the EC effect for different conditions of spaced practice. These findings are in line with the assumption that EC is based on similar processes as memory for the pairings. Copyright © 2017 Elsevier B.V. All rights reserved.
Motion Estimation System Utilizing Point Cloud Registration
NASA Technical Reports Server (NTRS)
Chen, Qi (Inventor)
2016-01-01
A system and method of estimation motion of a machine is disclosed. The method may include determining a first point cloud and a second point cloud corresponding to an environment in a vicinity of the machine. The method may further include generating a first extended gaussian image (EGI) for the first point cloud and a second EGI for the second point cloud. The method may further include determining a first EGI segment based on the first EGI and a second EGI segment based on the second EGI. The method may further include determining a first two dimensional distribution for points in the first EGI segment and a second two dimensional distribution for points in the second EGI segment. The method may further include estimating motion of the machine based on the first and second two dimensional distributions.
Electron beam machining using rotating and shaped beam power distribution
Elmer, J.W.; O`Brien, D.W.
1996-07-09
An apparatus and method are disclosed for electron beam (EB) machining (drilling, cutting and welding) that uses conventional EB guns, power supplies, and welding machine technology without the need for fast bias pulsing technology. The invention involves a magnetic lensing (EB optics) system and electronic controls to: (1) concurrently bend, focus, shape, scan, and rotate the beam to protect the EB gun and to create a desired effective power-density distribution, and (2) rotate or scan this shaped beam in a controlled way. The shaped beam power-density distribution can be measured using a tomographic imaging system. For example, the EB apparatus of this invention has the ability to drill holes in metal having a diameter up to 1,000 {micro}m (1 mm or larger), compared to the 250 {micro}m diameter of laser drilling. 5 figs.
Distribution of man-machine controls in space teleoperation
NASA Technical Reports Server (NTRS)
Bejczy, A. K.
1982-01-01
The distribution of control between man and machine is dependent on the tasks, available technology, human performance characteristics and control goals. This dependency has very specific projections on systems designed for teleoperation in space. This paper gives a brief outline of the space-related issues and presents the results of advanced teleoperator research and development at the Jet Propulsion Laboratory (JPL). The research and development work includes smart sensors, flexible computer controls and intelligent man-machine interface devices in the area of visual displays and kinesthetic man-machine coupling in remote control of manipulators. Some of the development results have been tested at the Johnson Space Center (JSC) using the simulated full-scale Shuttle Remote Manipulator System (RMS). The research and development work for advanced space teleoperation is far from complete and poses many interdisciplinary challenges.
A Collaborative Framework for Distributed Privacy-Preserving Support Vector Machine Learning
Que, Jialan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
A Support Vector Machine (SVM) is a popular tool for decision support. The traditional way to build an SVM model is to estimate parameters based on a centralized repository of data. However, in the field of biomedicine, patient data are sometimes stored in local repositories or institutions where they were collected, and may not be easily shared due to privacy concerns. This creates a substantial barrier for researchers to effectively learn from the distributed data using machine learning tools like SVMs. To overcome this difficulty and promote efficient information exchange without sharing sensitive raw data, we developed a Distributed Privacy Preserving Support Vector Machine (DPP-SVM). The DPP-SVM enables privacy-preserving collaborative learning, in which a trusted server integrates “privacy-insensitive” intermediary results. The globally learned model is guaranteed to be exactly the same as learned from combined data. We also provide a free web-service (http://privacy.ucsd.edu:8080/ppsvm/) for multiple participants to collaborate and complete the SVM-learning task in an efficient and privacy-preserving manner. PMID:23304414
NASA Technical Reports Server (NTRS)
Bose, Bimal K.; Kim, Min-Huei
1995-01-01
The report essentially summarizes the work performed in order to satisfy the above project objective. In the beginning, different energy storage devices, such as battery, flywheel and ultra capacitor are reviewed and compared, establishing the superiority of the battery. Then, the possible power sources, such as IC engine, diesel engine, gas turbine and fuel cell are reviewed and compared, and the superiority of IC engine has been established. Different types of machines for drive motor/engine generator, such as induction machine, PM synchronous machine and switched reluctance machine are compared, and the induction machine is established as the superior candidate. Similar discussion was made for power converters and devices. The Insulated Gate Bipolar Transistor (IGBT) appears to be the most superior device although Mercury Cadmium Telluride (MCT) shows future promise. Different types of candidate distribution systems with the possible combinations of power and energy sources have been discussed and the most viable system consisting of battery, IC engine and induction machine has been identified. Then, HFAC system has been compared with the DC system establishing the superiority of the former. The detailed component sizing calculations of HFAC and DC systems reinforce the superiority of the former. A preliminary control strategy has been developed for the candidate HFAC system. Finally, modeling and simulation study have been made to validate the system performance. The study in the report demonstrates the superiority of HFAC distribution system for next generation electric/hybrid vehicle.
Autobiographical Memory in Semantic Dementia: A Longitudinal fMRI Study
ERIC Educational Resources Information Center
Maguire, Eleanor A.; Kumaran, Dharshan; Hassabis, Demis; Kopelman, Michael D.
2010-01-01
Whilst patients with semantic dementia (SD) are known to suffer from semantic memory and language impairments, there is less agreement about whether memory for personal everyday experiences, autobiographical memory, is compromised. In healthy individuals, functional MRI (fMRI) has helped to delineate a consistent and distributed brain network…
Gesture-controlled interfaces for self-service machines and other applications
NASA Technical Reports Server (NTRS)
Cohen, Charles J. (Inventor); Jacobus, Charles J. (Inventor); Paul, George (Inventor); Beach, Glenn (Inventor); Foulk, Gene (Inventor); Obermark, Jay (Inventor); Cavell, Brook (Inventor)
2004-01-01
A gesture recognition interface for use in controlling self-service machines and other devices is disclosed. A gesture is defined as motions and kinematic poses generated by humans, animals, or machines. Specific body features are tracked, and static and motion gestures are interpreted. Motion gestures are defined as a family of parametrically delimited oscillatory motions, modeled as a linear-in-parameters dynamic system with added geometric constraints to allow for real-time recognition using a small amount of memory and processing time. A linear least squares method is preferably used to determine the parameters which represent each gesture. Feature position measure is used in conjunction with a bank of predictor bins seeded with the gesture parameters, and the system determines which bin best fits the observed motion. Recognizing static pose gestures is preferably performed by localizing the body/object from the rest of the image, describing that object, and identifying that description. The disclosure details methods for gesture recognition, as well as the overall architecture for using gesture recognition to control of devices, including self-service machines.
Distribution of quantum Fisher information in asymmetric cloning machines
Xiao, Xing; Yao, Yao; Zhou, Lei-Ming; Wang, Xiaoguang
2014-01-01
An unknown quantum state cannot be copied and broadcast freely due to the no-cloning theorem. Approximate cloning schemes have been proposed to achieve the optimal cloning characterized by the maximal fidelity between the original and its copies. Here, from the perspective of quantum Fisher information (QFI), we investigate the distribution of QFI in asymmetric cloning machines which produce two nonidentical copies. As one might expect, improving the QFI of one copy results in decreasing the QFI of the other copy. It is perhaps also unsurprising that asymmetric phase-covariant cloning outperforms universal cloning in distributing QFI since a priori information of the input state has been utilized. However, interesting results appear when we compare the distributabilities of fidelity (which quantifies the full information of quantum states), and QFI (which only captures the information of relevant parameters) in asymmetric cloning machines. Unlike the results of fidelity, where the distributability of symmetric cloning is always optimal for any d-dimensional cloning, we find that any asymmetric cloning outperforms symmetric cloning on the distribution of QFI for d ≤ 18, whereas some but not all asymmetric cloning strategies could be worse than symmetric ones when d > 18. PMID:25484234
Efficient High Performance Collective Communication for Distributed Memory Environments
ERIC Educational Resources Information Center
Ali, Qasim
2009-01-01
Collective communication allows efficient communication and synchronization among a collection of processes, unlike point-to-point communication that only involves a pair of communicating processes. Achieving high performance for both kernels and full-scale applications running on a distributed memory system requires an efficient implementation of…
NASA Astrophysics Data System (ADS)
Hong, Seok Bin; Ahn, Yong San; Jang, Joon Hyeok; Kim, Jin-Gyun; Goo, Nam Seo; Yu, Woong-Ryeol
2016-04-01
Shape memory polymer (SMP) is one of smart polymers which exhibit shape memory effect upon external stimuli. Reinforcements as carbon fiber had been used for making shape memory polymer composite (CF-SMPC). This study investigated a possibility of designing self-deployable structures in harsh space condition using CF-SMPCs and analyzed their shape memory behaviors with constitutive equation model.CF-SMPCs were prepared using woven carbon fabrics and a thermoset epoxy based SMP to obtain their basic mechanical properties including actuation in harsh environment. The mechanical and shape memory properties of SMP and CF-SMPCs were characterized using dynamic mechanical analysis (DMA) and universal tensile machine (UTM) with an environmental chamber. The mechanical properties such as flexural strength and tensile strength of SMP and CF-SMPC were measured with simple tensile/bending test and time dependent shape memory behavior was characterized with designed shape memory bending test. For mechanical analysis of CF-SMPCs, a 3D constitutive equation of SMP, which had been developed using multiplicative decomposition of the deformation gradient and shape memory strains, was used with material parameters determined from CF-SMPCs. Carbon fibers in composites reinforced tensile and flexural strength of SMP and acted as strong elastic springs in rheology based equation models. The actuation behavior of SMP matrix and CF-SMPCs was then simulated as 3D shape memory bending cases. Fiber bundle property was imbued with shell model for more precise analysis and it would be used for prediction of deploying behavior in self-deployable hinge structure.
Efficient packing of patterns in sparse distributed memory by selective weighting of input bits
NASA Technical Reports Server (NTRS)
Kanerva, Pentti
1991-01-01
When a set of patterns is stored in a distributed memory, any given storage location participates in the storage of many patterns. From the perspective of any one stored pattern, the other patterns act as noise, and such noise limits the memory's storage capacity. The more similar the retrieval cues for two patterns are, the more the patterns interfere with each other in memory, and the harder it is to separate them on retrieval. A method is described of weighting the retrieval cues to reduce such interference and thus to improve the separability of patterns that have similar cues.
Angular distribution of fusion products and x rays emitted by a small dense plasma focus machine
NASA Astrophysics Data System (ADS)
Castillo, F.; Herrera, J. J. E.; Gamboa, Isabel; Rangel, J.; Golzarri, J. I.; Espinosa, G.
2007-01-01
Time integrated measurements of the angular distributions of fusion products and x rays in a small dense plasma focus machine are made inside the discharge chamber, using passive detectors. The machine is operated at 37kV with a stored energy of 4.8kJ and a deuterium filling pressure of 2.75torr. Distributions of protons and neutrons are measured with CR-39 Lantrack® nuclear track detectors, on 1.8×0.9cm2 chips, 500μm thick. A set of detectors was placed on a semicircular Teflon® holder, 13cm away from the plasma column, and covered with 15μm Al filters, thus eliminating tritium and helium-3 ions, but not protons and neutrons. A second set was placed on the opposite side of the holder, eliminating protons. The angular distribution of x rays is also studied within the chamber with TLD-200 dosimeters. While the neutron angular distributions can be fitted by Gaussian curves mounted on constant pedestals and the proton distributions are strongly peaked, falling rapidly after ±40°, the x-ray distributions show two maxima around the axis, presumably as a result of the collision of a collimated electron beam against the inner electrode, along the axis.
An alternative design for a sparse distributed memory
NASA Technical Reports Server (NTRS)
Jaeckel, Louis A.
1989-01-01
A new design for a Sparse Distributed Memory, called the selected-coordinate design, is described. As in the original design, there are a large number of memory locations, each of which may be activated by many different addresses (binary vectors) in a very large address space. Each memory location is defined by specifying ten selected coordinates (bit positions in the address vectors) and a set of corresponding assigned values, consisting of one bit for each selected coordinate. A memory location is activated by an address if, for all ten of the locations's selected coordinates, the corresponding bits in the address vector match the respective assigned value bits, regardless of the other bits in the address vector. Some comparative memory capacity and signal-to-noise ratio estimates for the both the new and original designs are given. A few possible hardware embodiments of the new design are described.
Rathbone, Clare J; Steel, Craig
2015-01-01
The relationship between developmental experiences, and an individual's emerging beliefs about themselves and the world, is central to many forms of psychotherapy. People suffering from a variety of mental health problems have been shown to use negative memories when defining the self; however, little is known about how these negative memories might be organised and relate to negative self-images. In two online studies with middle-aged (N = 18; study 1) and young (N = 56; study 2) adults, we found that participants' negative self-images (e.g., I am a failure) were associated with sets of autobiographical memories that formed clustered distributions around times of self-formation, in much the same pattern as for positive self-images (e.g., I am talented). This novel result shows that highly organised sets of salient memories may be responsible for perpetuating negative beliefs about the self. Implications for therapy are discussed.
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Sparse distributed memory and related models
NASA Technical Reports Server (NTRS)
Kanerva, Pentti
1992-01-01
Described here is sparse distributed memory (SDM) as a neural-net associative memory. It is characterized by two weight matrices and by a large internal dimension - the number of hidden units is much larger than the number of input or output units. The first matrix, A, is fixed and possibly random, and the second matrix, C, is modifiable. The SDM is compared and contrasted to (1) computer memory, (2) correlation-matrix memory, (3) feet-forward artificial neural network, (4) cortex of the cerebellum, (5) Marr and Albus models of the cerebellum, and (6) Albus' cerebellar model arithmetic computer (CMAC). Several variations of the basic SDM design are discussed: the selected-coordinate and hyperplane designs of Jaeckel, the pseudorandom associative neural memory of Hassoun, and SDM with real-valued input variables by Prager and Fallside. SDM research conducted mainly at the Research Institute for Advanced Computer Science (RIACS) in 1986-1991 is highlighted.
Persistently active neurons in human medial frontal and medial temporal lobe support working memory
Kamiński, J; Sullivan, S; Chung, JM; Ross, IB; Mamelak, AN; Rutishauser, U
2017-01-01
Persistent neural activity is a putative mechanism for the maintenance of working memories. Persistent activity relies on the activity of a distributed network of areas, but the differential contribution of each area remains unclear. We recorded single neurons in the human medial frontal cortex and the medial temporal lobe while subjects held up to three items in memory. We found persistently active neurons in both areas. Persistent activity of hippocampal and amygdala neurons was stimulus-specific, formed stable attractors, and was predictive of memory content. Medial frontal cortex persistent activity, on the other hand, was modulated by memory load and task set but was not stimulus-specific. Trial-by-trial variability in persistent activity in both areas was related to memory strength, because it predicted the speed and accuracy by which stimuli were remembered. This work reveals, in humans, direct evidence for a distributed network of persistently active neurons supporting working memory maintenance. PMID:28218914
Feature-Based Visual Short-Term Memory Is Widely Distributed and Hierarchically Organized.
Dotson, Nicholas M; Hoffman, Steven J; Goodell, Baldwin; Gray, Charles M
2018-06-15
Feature-based visual short-term memory is known to engage both sensory and association cortices. However, the extent of the participating circuit and the neural mechanisms underlying memory maintenance is still a matter of vigorous debate. To address these questions, we recorded neuronal activity from 42 cortical areas in monkeys performing a feature-based visual short-term memory task and an interleaved fixation task. We find that task-dependent differences in firing rates are widely distributed throughout the cortex, while stimulus-specific changes in firing rates are more restricted and hierarchically organized. We also show that microsaccades during the memory delay encode the stimuli held in memory and that units modulated by microsaccades are more likely to exhibit stimulus specificity, suggesting that eye movements contribute to visual short-term memory processes. These results support a framework in which most cortical areas, within a modality, contribute to mnemonic representations at timescales that increase along the cortical hierarchy. Copyright © 2018 Elsevier Inc. All rights reserved.
Deist, Timo M; Jochems, A; van Soest, Johan; Nalbantov, Georgi; Oberije, Cary; Walsh, Seán; Eble, Michael; Bulens, Paul; Coucke, Philippe; Dries, Wim; Dekker, Andre; Lambin, Philippe
2017-06-01
Machine learning applications for personalized medicine are highly dependent on access to sufficient data. For personalized radiation oncology, datasets representing the variation in the entire cancer patient population need to be acquired and used to learn prediction models. Ethical and legal boundaries to ensure data privacy hamper collaboration between research institutes. We hypothesize that data sharing is possible without identifiable patient data leaving the radiation clinics and that building machine learning applications on distributed datasets is feasible. We developed and implemented an IT infrastructure in five radiation clinics across three countries (Belgium, Germany, and The Netherlands). We present here a proof-of-principle for future 'big data' infrastructures and distributed learning studies. Lung cancer patient data was collected in all five locations and stored in local databases. Exemplary support vector machine (SVM) models were learned using the Alternating Direction Method of Multipliers (ADMM) from the distributed databases to predict post-radiotherapy dyspnea grade [Formula: see text]. The discriminative performance was assessed by the area under the curve (AUC) in a five-fold cross-validation (learning on four sites and validating on the fifth). The performance of the distributed learning algorithm was compared to centralized learning where datasets of all institutes are jointly analyzed. The euroCAT infrastructure has been successfully implemented in five radiation clinics across three countries. SVM models can be learned on data distributed over all five clinics. Furthermore, the infrastructure provides a general framework to execute learning algorithms on distributed data. The ongoing expansion of the euroCAT network will facilitate machine learning in radiation oncology. The resulting access to larger datasets with sufficient variation will pave the way for generalizable prediction models and personalized medicine.
Parallel algorithms for boundary value problems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
Learning in and from brain-based devices.
Edelman, Gerald M
2007-11-16
Biologically based mobile devices have been constructed that differ from robots based on artificial intelligence. These brain-based devices (BBDs) contain simulated brains that autonomously categorize signals from the environment without a priori instruction. Two such BBDs, Darwin VII and Darwin X, are described here. Darwin VII recognizes objects and links categories to behavior through instrumental conditioning. Darwin X puts together the "what,"when," and "where" from cues in the environment into an episodic memory that allows it to find a desired target. Although these BBDs are designed to provide insights into how the brain works, their principles may find uses in building hybrid machines. These machines would combine the learning ability of BBDs with explicitly programmed control systems.
LEOPARD on a personal computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lancaster, D.B.
1988-01-01
The LEOPARD code is very widely used to produce four- or two-group cross sections for water reactors. Although it is heavily used it had not been downloaded to the PC. This paper has been written to announce the completion of downloading LEOPARD. LEOPARD can now be run on anything from the early PC to the most advanced 80386 machines. The only requirements are 512 Kbytes of memory (LEOPARD actually only needs 235, but with buffers, 256 Kbytes may not be enough) and two disk rives (preferably, one is a hard drive). The run times for various machines and configurations aremore » summarized. The accuracy of the PC-LEOPARD results are documented.« less
Bamatraf, Saeed; Hussain, Muhammad; Aboalsamh, Hatim; Qazi, Emad-Ul-Haq; Malik, Amir Saeed; Amin, Hafeez Ullah; Mathkour, Hassan; Muhammad, Ghulam; Imran, Hafiz Muhammad
2016-01-01
We studied the impact of 2D and 3D educational contents on learning and memory recall using electroencephalography (EEG) brain signals. For this purpose, we adopted a classification approach that predicts true and false memories in case of both short term memory (STM) and long term memory (LTM) and helps to decide whether there is a difference between the impact of 2D and 3D educational contents. In this approach, EEG brain signals are converted into topomaps and then discriminative features are extracted from them and finally support vector machine (SVM) which is employed to predict brain states. For data collection, half of sixty-eight healthy individuals watched the learning material in 2D format whereas the rest watched the same material in 3D format. After learning task, memory recall tasks were performed after 30 minutes (STM) and two months (LTM), and EEG signals were recorded. In case of STM, 97.5% prediction accuracy was achieved for 3D and 96.6% for 2D and, in case of LTM, it was 100% for both 2D and 3D. The statistical analysis of the results suggested that for learning and memory recall both 2D and 3D materials do not have much difference in case of STM and LTM.