Memristor-Based Synapse Design and Training Scheme for Neuromorphic Computing Architecture
2012-06-01
system level built upon the conventional Von Neumann computer architecture [2][3]. Developing the neuromorphic architecture at chip level by...SCHEME FOR NEUROMORPHIC COMPUTING ARCHITECTURE 5a. CONTRACT NUMBER FA8750-11-2-0046 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 62788F 6...creation of memristor-based neuromorphic computing architecture. Rather than the existing crossbar-based neuron network designs, we focus on memristor
Web-based training: a new paradigm in computer-assisted instruction in medicine.
Haag, M; Maylein, L; Leven, F J; Tönshoff, B; Haux, R
1999-01-01
Computer-assisted instruction (CAI) programs based on internet technologies, especially on the world wide web (WWW), provide new opportunities in medical education. The aim of this paper is to examine different aspects of such programs, which we call 'web-based training (WBT) programs', and to differentiate them from conventional CAI programs. First, we will distinguish five different interaction types: presentation; browsing; tutorial dialogue; drill and practice; and simulation. In contrast to conventional CAI, there are four architectural types of WBT programs: client-based; remote data and knowledge; distributed teaching; and server-based. We will discuss the implications of the different architectures for developing WBT software. WBT programs have to meet other requirements than conventional CAI programs. The most important tools and programming languages for developing WBT programs will be listed and assigned to the architecture types. For the future, we expect a trend from conventional CAI towards WBT programs.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuller, Ivan K.; Stevens, Rick; Pino, Robinson
2015-10-29
Computation in its many forms is the engine that fuels our modern civilization. Modern computation—based on the von Neumann architecture—has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably—within the next 10 years—reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS basedmore » technology? If so, what are the basic research challenges for materials sicence and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: The main differences between neuromorphic and conventional computing as related to: signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning New neuromorphic architectures needed to: produce lower energy consumption, potential novel nanostructured materials, and enhanced computation Device and materials properties needed to implement functions such as: hysteresis, stability, and fault tolerance Comparisons of different implementations: spin torque, memristors, resistive switching, phase change, and optical schemes for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.« less
Challenges & Roadmap for Beyond CMOS Computing Simulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodrigues, Arun F.; Frank, Michael P.
Simulating HPC systems is a difficult task and the emergence of “Beyond CMOS” architectures and execution models will increase that difficulty. This document presents a “tutorial” on some of the simulation challenges faced by conventional and non-conventional architectures (Section 1) and goals and requirements for simulating Beyond CMOS systems (Section 2). These provide background for proposed short- and long-term roadmaps for simulation efforts at Sandia (Sections 3 and 4). Additionally, a brief explanation of a proof-of-concept integration of a Beyond CMOS architectural simulator is presented (Section 2.3).
OS friendly microprocessor architecture: Hardware level computer security
NASA Astrophysics Data System (ADS)
Jungwirth, Patrick; La Fratta, Patrick
2016-05-01
We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline provides for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permissions bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
Integration of nanoscale memristor synapses in neuromorphic computing architectures
NASA Astrophysics Data System (ADS)
Indiveri, Giacomo; Linares-Barranco, Bernabé; Legenstein, Robert; Deligeorgis, George; Prodromakis, Themistoklis
2013-09-01
Conventional neuro-computing architectures and artificial neural networks have often been developed with no or loose connections to neuroscience. As a consequence, they have largely ignored key features of biological neural processing systems, such as their extremely low-power consumption features or their ability to carry out robust and efficient computation using massively parallel arrays of limited precision, highly variable, and unreliable components. Recent developments in nano-technologies are making available extremely compact and low power, but also variable and unreliable solid-state devices that can potentially extend the offerings of availing CMOS technologies. In particular, memristors are regarded as a promising solution for modeling key features of biological synapses due to their nanoscale dimensions, their capacity to store multiple bits of information per element and the low energy required to write distinct states. In this paper, we first review the neuro- and neuromorphic computing approaches that can best exploit the properties of memristor and scale devices, and then propose a novel hybrid memristor-CMOS neuromorphic circuit which represents a radical departure from conventional neuro-computing approaches, as it uses memristors to directly emulate the biophysics and temporal dynamics of real synapses. We point out the differences between the use of memristors in conventional neuro-computing architectures and the hybrid memristor-CMOS circuit proposed, and argue how this circuit represents an ideal building block for implementing brain-inspired probabilistic computing paradigms that are robust to variability and fault tolerant by design.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaskey, Alexander J.
Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-gen computing hardware architectures like quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly Kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.
Exploration of operator method digital optical computers for application to NASA
NASA Technical Reports Server (NTRS)
1990-01-01
Digital optical computer design has been focused primarily towards parallel (single point-to-point interconnection) implementation. This architecture is compared to currently developing VHSIC systems. Using demonstrated multichannel acousto-optic devices, a figure of merit can be formulated. The focus is on a figure of merit termed Gate Interconnect Bandwidth Product (GIBP). Conventional parallel optical digital computer architecture demonstrates only marginal competitiveness at best when compared to projected semiconductor implements. Global, analog global, quasi-digital, and full digital interconnects are briefly examined as alternative to parallel digital computer architecture. Digital optical computing is becoming a very tough competitor to semiconductor technology since it can support a very high degree of three dimensional interconnect density and high degrees of Fan-In without capacitive loading effects at very low power consumption levels.
The new landscape of parallel computer architecture
NASA Astrophysics Data System (ADS)
Shalf, John
2007-07-01
The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Programming model for distributed intelligent systems
NASA Technical Reports Server (NTRS)
Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.
1988-01-01
A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.
A multitasking finite state architecture for computer control of an electric powertrain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burba, J.C.
1984-01-01
Finite state techniques provide a common design language between the control engineer and the computer engineer for event driven computer control systems. They simplify communication and provide a highly maintainable control system understandable by both. This paper describes the development of a control system for an electric vehicle powertrain utilizing finite state concepts. The basics of finite state automata are provided as a framework to discuss a unique multitasking software architecture developed for this application. The architecture employs conventional time-sliced techniques with task scheduling controlled by a finite state machine representation of the control strategy of the powertrain. The complexitiesmore » of excitation variable sampling in this environment are also considered.« less
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes are or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic storage photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing the much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
The future of computing--new architectures and new technologies.
Warren, P
2004-02-01
All modern computers are designed using the 'von Neumann' architecture and built using silicon transistor technology. Both architecture and technology have been remarkably successful. Yet there are a range of problems for which this conventional architecture is not particularly well adapted, and new architectures are being proposed to solve these problems, in particular based on insight from nature. Transistor technology has enjoyed 50 years of continuing progress. However, the laws of physics dictate that within a relatively short time period this progress will come to an end. New technologies, based on molecular and biological sciences as well as quantum physics, are vying to replace silicon, or at least coexist with it and extend its capability. The paper describes these novel architectures and technologies, places them in the context of the kinds of problems they might help to solve, and predicts their possible manner and time of adoption. Finally it describes some key questions and research problems associated with their use.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muller, U.A.; Baumle, B.; Kohler, P.
1992-10-01
Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
Gauss Elimination: Workhorse of Linear Algebra.
1995-08-05
linear algebra computation for solving systems, computing determinants and determining the rank of matrix. All of these are discussed in varying contexts. These include different arithmetic or algebraic setting such as integer arithmetic or polynomial rings as well as conventional real (floating-point) arithmetic. These have effects on both accuracy and complexity analyses of the algorithm. These, too, are covered here. The impact of modern parallel computer architecture on GE is also
A Debugger for Computational Grid Applications
NASA Technical Reports Server (NTRS)
Hood, Robert; Jost, Gabriele
2000-01-01
The p2d2 project at NAS has built a debugger for applications running on heterogeneous computational grids. It employs a client-server architecture to simplify the implementation. Its user interface has been designed to provide process control and state examination functions on a computation containing a large number of processes. It can find processes participating in distributed computations even when those processes were not created under debugger control. These process identification techniques work both on conventional distributed executions as well as those on a computational grid.
Alternative electrical distribution system architectures for automobiles
DOE Office of Scientific and Technical Information (OSTI.GOV)
Afridi, K.K.; Tabors, R.D.; Kassakian, J.G.
At present most automobiles use a 12 V electrical system with point-to-point wiring. The capability of this architecture in meeting the needs of future electrical loads is questionable. Furthermore, with the development of electric vehicles (EVs) there is a greater need for a better architecture. In this paper the authors outline the limitations of the conventional architecture and identify alternatives. They also present a multi-attribute trade-off methodology which compares these alternatives, and identifies a set of Pareto optimal architectures. The system attributes traded off are cost, weight, losses and probability of failure. These are calculated by a computer program thatmore » has built-in component attribute models. System attributes of a few dozen architectures are also reported and the results analyzed. 17 refs.« less
NASA Technical Reports Server (NTRS)
Fischer, James R.; Grosch, Chester; Mcanulty, Michael; Odonnell, John; Storey, Owen
1987-01-01
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era.
A performance analysis of advanced I/O architectures for PC-based network file servers
NASA Astrophysics Data System (ADS)
Huynh, K. D.; Khoshgoftaar, T. M.
1994-12-01
In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three-orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelize with some insight into the future. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Cheng, Jie-Zhi; Ni, Dong; Chou, Yi-Hong; Qin, Jing; Tiu, Chui-Mei; Chang, Yeun-Chung; Huang, Chiun-Sheng; Shen, Dinggang; Chen, Chung-Ming
2016-04-01
This paper performs a comprehensive study on the deep-learning-based computer-aided diagnosis (CADx) for the differential diagnosis of benign and malignant nodules/lesions by avoiding the potential errors caused by inaccurate image processing results (e.g., boundary segmentation), as well as the classification bias resulting from a less robust feature set, as involved in most conventional CADx algorithms. Specifically, the stacked denoising auto-encoder (SDAE) is exploited on the two CADx applications for the differentiation of breast ultrasound lesions and lung CT nodules. The SDAE architecture is well equipped with the automatic feature exploration mechanism and noise tolerance advantage, and hence may be suitable to deal with the intrinsically noisy property of medical image data from various imaging modalities. To show the outperformance of SDAE-based CADx over the conventional scheme, two latest conventional CADx algorithms are implemented for comparison. 10 times of 10-fold cross-validations are conducted to illustrate the efficacy of the SDAE-based CADx algorithm. The experimental results show the significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of the CADx systems without the need of explicit design and selection of problem-oriented features.
Cheng, Jie-Zhi; Ni, Dong; Chou, Yi-Hong; Qin, Jing; Tiu, Chui-Mei; Chang, Yeun-Chung; Huang, Chiun-Sheng; Shen, Dinggang; Chen, Chung-Ming
2016-04-15
This paper performs a comprehensive study on the deep-learning-based computer-aided diagnosis (CADx) for the differential diagnosis of benign and malignant nodules/lesions by avoiding the potential errors caused by inaccurate image processing results (e.g., boundary segmentation), as well as the classification bias resulting from a less robust feature set, as involved in most conventional CADx algorithms. Specifically, the stacked denoising auto-encoder (SDAE) is exploited on the two CADx applications for the differentiation of breast ultrasound lesions and lung CT nodules. The SDAE architecture is well equipped with the automatic feature exploration mechanism and noise tolerance advantage, and hence may be suitable to deal with the intrinsically noisy property of medical image data from various imaging modalities. To show the outperformance of SDAE-based CADx over the conventional scheme, two latest conventional CADx algorithms are implemented for comparison. 10 times of 10-fold cross-validations are conducted to illustrate the efficacy of the SDAE-based CADx algorithm. The experimental results show the significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of the CADx systems without the need of explicit design and selection of problem-oriented features.
Cheng, Jie-Zhi; Ni, Dong; Chou, Yi-Hong; Qin, Jing; Tiu, Chui-Mei; Chang, Yeun-Chung; Huang, Chiun-Sheng; Shen, Dinggang; Chen, Chung-Ming
2016-01-01
This paper performs a comprehensive study on the deep-learning-based computer-aided diagnosis (CADx) for the differential diagnosis of benign and malignant nodules/lesions by avoiding the potential errors caused by inaccurate image processing results (e.g., boundary segmentation), as well as the classification bias resulting from a less robust feature set, as involved in most conventional CADx algorithms. Specifically, the stacked denoising auto-encoder (SDAE) is exploited on the two CADx applications for the differentiation of breast ultrasound lesions and lung CT nodules. The SDAE architecture is well equipped with the automatic feature exploration mechanism and noise tolerance advantage, and hence may be suitable to deal with the intrinsically noisy property of medical image data from various imaging modalities. To show the outperformance of SDAE-based CADx over the conventional scheme, two latest conventional CADx algorithms are implemented for comparison. 10 times of 10-fold cross-validations are conducted to illustrate the efficacy of the SDAE-based CADx algorithm. The experimental results show the significant performance boost by the SDAE-based CADx algorithm over the two conventional methods, suggesting that deep learning techniques can potentially change the design paradigm of the CADx systems without the need of explicit design and selection of problem-oriented features. PMID:27079888
Rio: a dynamic self-healing services architecture using Jini networking technology
NASA Astrophysics Data System (ADS)
Clarke, James B.
2002-06-01
Current mainstream distributed Java architectures offer great capabilities embracing conventional enterprise architecture patterns and designs. These traditional systems provide robust transaction oriented environments that are in large part focused on data and host processors. Typically, these implementations require that an entire application be deployed on every machine that will be used as a compute resource. In order for this to happen, the application is usually taken down, installed and started with all systems in-sync and knowing about each other. Static environments such as these present an extremely difficult environment to setup, deploy and administer.
On the impact of approximate computation in an analog DeSTIN architecture.
Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar
2014-05-01
Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN--a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
Deep learning with coherent nanophotonic circuits
NASA Astrophysics Data System (ADS)
Shen, Yichen; Harris, Nicholas C.; Skirlo, Scott; Prabhu, Mihika; Baehr-Jones, Tom; Hochberg, Michael; Sun, Xin; Zhao, Shijie; Larochelle, Hugo; Englund, Dirk; Soljačić, Marin
2017-07-01
Artificial neural networks are computational network models inspired by signal processing in the brain. These models have dramatically improved performance for many machine-learning tasks, including speech and image recognition. However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made towards developing electronic architectures tuned to implement artificial neural networks that exhibit improved computational speed and accuracy. Here, we propose a new architecture for a fully optical neural network that, in principle, could offer an enhancement in computational speed and power efficiency over state-of-the-art electronics for conventional inference tasks. We experimentally demonstrate the essential part of the concept using a programmable nanophotonic processor featuring a cascaded array of 56 programmable Mach-Zehnder interferometers in a silicon photonic integrated circuit and show its utility for vowel recognition.
Evolution of a designless nanoparticle network into reconfigurable Boolean logic
NASA Astrophysics Data System (ADS)
Bose, S. K.; Lawrence, C. P.; Liu, Z.; Makarenko, K. S.; van Damme, R. M. J.; Broersma, H. J.; van der Wiel, W. G.
2015-12-01
Natural computers exploit the emergent properties and massive parallelism of interconnected networks of locally active components. Evolution has resulted in systems that compute quickly and that use energy efficiently, utilizing whatever physical properties are exploitable. Man-made computers, on the other hand, are based on circuits of functional units that follow given design rules. Hence, potentially exploitable physical processes, such as capacitive crosstalk, to solve a problem are left out. Until now, designless nanoscale networks of inanimate matter that exhibit robust computational functionality had not been realized. Here we artificially evolve the electrical properties of a disordered nanomaterials system (by optimizing the values of control voltages using a genetic algorithm) to perform computational tasks reconfigurably. We exploit the rich behaviour that emerges from interconnected metal nanoparticles, which act as strongly nonlinear single-electron transistors, and find that this nanoscale architecture can be configured in situ into any Boolean logic gate. This universal, reconfigurable gate would require about ten transistors in a conventional circuit. Our system meets the criteria for the physical realization of (cellular) neural networks: universality (arbitrary Boolean functions), compactness, robustness and evolvability, which implies scalability to perform more advanced tasks. Our evolutionary approach works around device-to-device variations and the accompanying uncertainties in performance. Moreover, it bears a great potential for more energy-efficient computation, and for solving problems that are very hard to tackle in conventional architectures.
Atomic switch networks-nanoarchitectonic design of a complex system for natural computing.
Demis, E C; Aguilera, R; Sillin, H O; Scharnhorst, K; Sandouk, E J; Aono, M; Stieg, A Z; Gimzewski, J K
2015-05-22
Self-organized complex systems are ubiquitous in nature, and the structural complexity of these natural systems can be used as a model to design new classes of functional nanotechnology based on highly interconnected networks of interacting units. Conventional fabrication methods for electronic computing devices are subject to known scaling limits, confining the diversity of possible architectures. This work explores methods of fabricating a self-organized complex device known as an atomic switch network and discusses its potential utility in computing. Through a merger of top-down and bottom-up techniques guided by mathematical and nanoarchitectonic design principles, we have produced functional devices comprising nanoscale elements whose intrinsic nonlinear dynamics and memorization capabilities produce robust patterns of distributed activity and a capacity for nonlinear transformation of input signals when configured in the appropriate network architecture. Their operational characteristics represent a unique potential for hardware implementation of natural computation, specifically in the area of reservoir computing-a burgeoning field that investigates the computational aptitude of complex biologically inspired systems.
Fuzzy control of magnetic bearings
NASA Technical Reports Server (NTRS)
Feeley, J. J.; Niederauer, G. M.; Ahlstrom, D. J.
1991-01-01
The use of an adaptive fuzzy control algorithm implemented on a VLSI chip for the control of a magnetic bearing was considered. The architecture of the adaptive fuzzy controller is similar to that of a neural network. The performance of the fuzzy controller is compared to that of a conventional controller by computer simulation.
An architectural approach to create self organizing control systems for practical autonomous robots
NASA Technical Reports Server (NTRS)
Greiner, Helen
1991-01-01
For practical industrial applications, the development of trainable robots is an important and immediate objective. Therefore, the developing of flexible intelligence directly applicable to training is emphasized. It is generally agreed upon by the AI community that the fusion of expert systems, neural networks, and conventionally programmed modules (e.g., a trajectory generator) is promising in the quest for autonomous robotic intelligence. Autonomous robot development is hindered by integration and architectural problems. Some obstacles towards the construction of more general robot control systems are as follows: (1) Growth problem; (2) Software generation; (3) Interaction with environment; (4) Reliability; and (5) Resource limitation. Neural networks can be successfully applied to some of these problems. However, current implementations of neural networks are hampered by the resource limitation problem and must be trained extensively to produce computationally accurate output. A generalization of conventional neural nets is proposed, and an architecture is offered in an attempt to address the above problems.
Atomic switch networks—nanoarchitectonic design of a complex system for natural computing
NASA Astrophysics Data System (ADS)
Demis, E. C.; Aguilera, R.; Sillin, H. O.; Scharnhorst, K.; Sandouk, E. J.; Aono, M.; Stieg, A. Z.; Gimzewski, J. K.
2015-05-01
Self-organized complex systems are ubiquitous in nature, and the structural complexity of these natural systems can be used as a model to design new classes of functional nanotechnology based on highly interconnected networks of interacting units. Conventional fabrication methods for electronic computing devices are subject to known scaling limits, confining the diversity of possible architectures. This work explores methods of fabricating a self-organized complex device known as an atomic switch network and discusses its potential utility in computing. Through a merger of top-down and bottom-up techniques guided by mathematical and nanoarchitectonic design principles, we have produced functional devices comprising nanoscale elements whose intrinsic nonlinear dynamics and memorization capabilities produce robust patterns of distributed activity and a capacity for nonlinear transformation of input signals when configured in the appropriate network architecture. Their operational characteristics represent a unique potential for hardware implementation of natural computation, specifically in the area of reservoir computing—a burgeoning field that investigates the computational aptitude of complex biologically inspired systems.
Production experience with the ATLAS Event Service
NASA Astrophysics Data System (ADS)
Benjamin, D.; Calafiura, P.; Childers, T.; De, K.; Guan, W.; Maeno, T.; Nilsson, P.; Tsulaia, V.; Van Gemmeren, P.; Wenaus, T.; ATLAS Collaboration
2017-10-01
The ATLAS Event Service (AES) has been designed and implemented for efficient running of ATLAS production workflows on a variety of computing platforms, ranging from conventional Grid sites to opportunistic, often short-lived resources, such as spot market commercial clouds, supercomputers and volunteer computing. The Event Service architecture allows real time delivery of fine grained workloads to running payload applications which process dispatched events or event ranges and immediately stream the outputs to highly scalable Object Stores. Thanks to its agile and flexible architecture the AES is currently being used by grid sites for assigning low priority workloads to otherwise idle computing resources; similarly harvesting HPC resources in an efficient back-fill mode; and massively scaling out to the 50-100k concurrent core level on the Amazon spot market to efficiently utilize those transient resources for peak production needs. Platform ports in development include ATLAS@Home (BOINC) and the Google Compute Engine, and a growing number of HPC platforms. After briefly reviewing the concept and the architecture of the Event Service, we will report the status and experience gained in AES commissioning and production operations on supercomputers, and our plans for extending ES application beyond Geant4 simulation to other workflows, such as reconstruction and data analysis.
Architectural Methodology Report
NASA Technical Reports Server (NTRS)
Dhas, Chris
2000-01-01
The establishment of conventions between two communicating entities in the end systems is essential for communications. Examples of the kind of decisions that need to be made in establishing a protocol convention include the nature of the data representation, the for-mat and the speed of the date representation over the communications path, and the sequence of control messages (if any) which are sent. One of the main functions of a protocol is to establish a standard path between the communicating entities. This is necessary to create a virtual communications medium with certain desirable characteristics. In essence, it is the function of the protocol to transform the characteristics of the physical communications environment into a more useful virtual communications model. The final function of a protocol is to establish standard data elements for communications over the path; that is, the protocol serves to create a virtual data element for exchange. Other systems may be constructed in which the transferred element is a program or a job. Finally, there are special purpose applications in which the element to be transferred may be a complex structure such as all or part of a graphic display. NASA's Glenn Research Center (GRC) defines and develops advanced technology for high priority national needs in communications technologies for application to aeronautics and space. GRC tasked Computer Networks and Software Inc. (CNS) to describe the methodologies used in developing a protocol architecture for an in-space Internet node. The node would support NASA:s four mission areas: Earth Science; Space Science; Human Exploration and Development of Space (HEDS); Aerospace Technology. This report presents the methodology for developing the protocol architecture. The methodology addresses the architecture for a computer communications environment. It does not address an analog voice architecture.
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan
2010-10-04
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors.« less
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network
NASA Astrophysics Data System (ADS)
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-03-01
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network.
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-03-21
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices' non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-01-01
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing. PMID:28322262
Emulating short-term synaptic dynamics with memristive devices
NASA Astrophysics Data System (ADS)
Berdan, Radu; Vasilaki, Eleni; Khiat, Ali; Indiveri, Giacomo; Serb, Alexandru; Prodromakis, Themistoklis
2016-01-01
Neuromorphic architectures offer great promise for achieving computation capacities beyond conventional Von Neumann machines. The essential elements for achieving this vision are highly scalable synaptic mimics that do not undermine biological fidelity. Here we demonstrate that single solid-state TiO2 memristors can exhibit non-associative plasticity phenomena observed in biological synapses, supported by their metastable memory state transition properties. We show that, contrary to conventional uses of solid-state memory, the existence of rate-limiting volatility is a key feature for capturing short-term synaptic dynamics. We also show how the temporal dynamics of our prototypes can be exploited to implement spatio-temporal computation, demonstrating the memristors full potential for building biophysically realistic neural processing systems.
An event-based architecture for solving constraint satisfaction problems
Mostafa, Hesham; Müller, Lorenz K.; Indiveri, Giacomo
2015-01-01
Constraint satisfaction problems are ubiquitous in many domains. They are typically solved using conventional digital computing architectures that do not reflect the distributed nature of many of these problems, and are thus ill-suited for solving them. Here we present a parallel analogue/digital hardware architecture specifically designed to solve such problems. We cast constraint satisfaction problems as networks of stereotyped nodes that communicate using digital pulses, or events. Each node contains an oscillator implemented using analogue circuits. The non-repeating phase relations among the oscillators drive the exploration of the solution space. We show that this hardware architecture can yield state-of-the-art performance on random SAT problems under reasonable assumptions on the implementation. We present measurements from a prototype electronic chip to demonstrate that a physical implementation of the proposed architecture is robust to practical non-idealities and to validate the theory proposed. PMID:26642827
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high performance computing for big data analytics become urgent because many research activities are constrained by the inability of software or tool that even could not complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and operational time frame. Many large-scale geospatial problems may be not processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) Architecture and Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions to employ massive parallelism and hardware resources to achieve scalability and high performance for data intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environment to achieve the capability for scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high-performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise to achieve scalability and high performance by exploiting task and data levels of parallelism that are not supported by the conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. Within this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs and MICs, to accelerate geocomputation in different applications.
High performance semantic factoring of giga-scale semantic graph databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Adolf, Bob; Haglin, David
2010-10-01
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.« less
Aono, Masashi; Naruse, Makoto; Kim, Song-Ju; Wakabayashi, Masamitsu; Hori, Hirokazu; Ohtsu, Motoichi; Hara, Masahiko
2013-06-18
Biologically inspired computing devices and architectures are expected to overcome the limitations of conventional technologies in terms of solving computationally demanding problems, adapting to complex environments, reducing energy consumption, and so on. We previously demonstrated that a primitive single-celled amoeba (a plasmodial slime mold), which exhibits complex spatiotemporal oscillatory dynamics and sophisticated computing capabilities, can be used to search for a solution to a very hard combinatorial optimization problem. We successfully extracted the essential spatiotemporal dynamics by which the amoeba solves the problem. This amoeba-inspired computing paradigm can be implemented by various physical systems that exhibit suitable spatiotemporal dynamics resembling the amoeba's problem-solving process. In this Article, we demonstrate that photoexcitation transfer phenomena in certain quantum nanostructures mediated by optical near-field interactions generate the amoebalike spatiotemporal dynamics and can be used to solve the satisfiability problem (SAT), which is the problem of judging whether a given logical proposition (a Boolean formula) is self-consistent. SAT is related to diverse application problems in artificial intelligence, information security, and bioinformatics and is a crucially important nondeterministic polynomial time (NP)-complete problem, which is believed to become intractable for conventional digital computers when the problem size increases. We show that our amoeba-inspired computing paradigm dramatically outperforms a conventional stochastic search method. These results indicate the potential for developing highly versatile nanoarchitectonic computers that realize powerful solution searching with low energy consumption.
The Salman Mosque: Achmad Noe’man’s Critique of Indonesian Conventional Mosque Architecture
NASA Astrophysics Data System (ADS)
Holik, A. A. R.; Aryanti, T.
2017-03-01
The Salman Mosque, designed by Achmad Noe’man, was a striking Islamic architectural design in the 1960s when it was built. Unlike the conventional mosques, particularly in Indonesia, it has no dome. Instead, the roof was made of prestressed concrete and resembles a canoe. Using data drawn from field observations, this paper explores the architectural characteristics of the Salman Mosque as a product of Modern architecture. It argues that the domeless mosque, the simple minaret, the wooden wall panels and floor, the women’s balcony, and the roof demonstrate architectural modernism, as opposed to the conventional mosque typology that flourished in Indonesia at the time. This paper further argues that the Salman Mosque is Noe’man’s critique of the Indonesian conventional mosque architecture. It concludes that the architectural features of the Salman Mosque reflects Noe’man’s modern vision of Islam and Islamic architecture.
A Comparative Study : Microprogrammed Vs Risc Architectures For Symbolic Processing
NASA Astrophysics Data System (ADS)
Heudin, J. C.; Metivier, C.; Demigny, D.; Maurin, T.; Zavidovique, B.; Devos, F.
1987-05-01
It is oftenclaimed that conventional computers are not well suited for human-like tasks : Vision (Image Processing), Intelligence (Symbolic Processing) ... In the particular case of Artificial Intelligence, dynamic type-checking is one example of basic task that must be improved. The solution implemented in most Lisp work-stations consists in a microprogrammed architecture with a tagged memory. Another way to gain efficiency is to design a well suited instruction set for symbolic processing, which reduces the semantic gap between the high level language and the machine code. In this framework, the RISC concept provides a convenient approach to study new architectures for symbolic processing. This paper compares both approaches and describes our projectof designing a compact symbolic processor for Artificial Intelligence applications.
NASA Astrophysics Data System (ADS)
Fotin, Sergei V.; Yin, Yin; Haldankar, Hrishikesh; Hoffmeister, Jeffrey W.; Periaswamy, Senthil
2016-03-01
Computer-aided detection (CAD) has been used in screening mammography for many years and is likely to be utilized for digital breast tomosynthesis (DBT). Higher detection performance is desirable as it may have an impact on radiologist's decisions and clinical outcomes. Recently the algorithms based on deep convolutional architectures have been shown to achieve state of the art performance in object classification and detection. Similarly, we trained a deep convolutional neural network directly on patches sampled from two-dimensional mammography and reconstructed DBT volumes and compared its performance to a conventional CAD algorithm that is based on computation and classification of hand-engineered features. The detection performance was evaluated on the independent test set of 344 DBT reconstructions (GE SenoClaire 3D, iterative reconstruction algorithm) containing 328 suspicious and 115 malignant soft tissue densities including masses and architectural distortions. Detection sensitivity was measured on a region of interest (ROI) basis at the rate of five detection marks per volume. Moving from conventional to deep learning approach resulted in increase of ROI sensitivity from 0:832 +/- 0:040 to 0:893 +/- 0:033 for suspicious ROIs; and from 0:852 +/- 0:065 to 0:930 +/- 0:046 for malignant ROIs. These results indicate the high utility of deep feature learning in the analysis of DBT data and high potential of the method for broader medical image analysis tasks.
Three-dimensional integration of nanotechnologies for computing and data storage on a single chip
NASA Astrophysics Data System (ADS)
Shulaker, Max M.; Hills, Gage; Park, Rebecca S.; Howe, Roger T.; Saraswat, Krishna; Wong, H.-S. Philip; Mitra, Subhasish
2017-07-01
The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors—promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage—fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce ‘highly processed’ information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.
Three-dimensional integration of nanotechnologies for computing and data storage on a single chip.
Shulaker, Max M; Hills, Gage; Park, Rebecca S; Howe, Roger T; Saraswat, Krishna; Wong, H-S Philip; Mitra, Subhasish
2017-07-05
The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors-promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage-fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce 'highly processed' information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers
NASA Technical Reports Server (NTRS)
Guruswamy, Guru; VanDalsem, William (Technical Monitor)
1994-01-01
Abstract Aeroelasticity which involves strong coupling of fluids, structures and controls is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods such as the linear aerodynamic flow equations coupled with the modal structural equations are well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST) which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations at these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes for fluids and the finite-elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional super computers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computer by using ENSAERO code that directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
GPU-accelerated FDTD modeling of radio-frequency field-tissue interactions in high-field MRI.
Chi, Jieru; Liu, Feng; Weber, Ewald; Li, Yu; Crozier, Stuart
2011-06-01
The analysis of high-field RF field-tissue interactions requires high-performance finite-difference time-domain (FDTD) computing. Conventional CPU-based FDTD calculations offer limited computing performance in a PC environment. This study presents a graphics processing unit (GPU)-based parallel-computing framework, producing substantially boosted computing efficiency (with a two-order speedup factor) at a PC-level cost. Specific details of implementing the FDTD method on a GPU architecture have been presented and the new computational strategy has been successfully applied to the design of a novel 8-element transceive RF coil system at 9.4 T. Facilitated by the powerful GPU-FDTD computing, the new RF coil array offers optimized fields (averaging 25% improvement in sensitivity, and 20% reduction in loop coupling compared with conventional array structures of the same size) for small animal imaging with a robust RF configuration. The GPU-enabled acceleration paves the way for FDTD to be applied for both detailed forward modeling and inverse design of MRI coils, which were previously impractical.
Reconfigurable Parallel Computer Architectures for Space Applications
2012-08-07
Overview......................... 2.2.6 Cellular Wiring Grid Convention.................................................. 2 2 3 3 4 4 5 5...The panel is a pegboard-like structure, which does not articulate specific sockets, but rather provides a continuous grid of contact pads and...platforms (such as spacecraft). We envision that this might be achieved by assembling a number of tile-like panels, each a “ smart substrate
NASA Astrophysics Data System (ADS)
Kamura, Yoshio; Imura, Kohei
2018-06-01
Optical recording on organic thin films with a high spatial resolution is promising for high-density optical memories, optical computing, and security systems. The spatial resolution of the optical recording is limited by the diffraction of light. Electrons can be focused to a nanometer-sized spot, providing the potential for achieving better resolution. In conventional electron-beam lithography, however, optical tuning of the fabricated structures is limited mostly to metals and semiconductors rather than organic materials. In this article, we report a fabrication method of luminescent organic architectures using a focused electron beam. We optimized the fabrication conditions of the electron beam to generate chemical species showing visible photoluminescence via two-photon near-infrared excitations. We utilized this fabrication method to draw nanoscale optical architectures on a polystyrene thin film.
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity makes it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprisingmore » computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths such for the Billion Triple Challenge 2010 dataset.« less
1993-09-23
answer to the question: Is subject s allowed access type a on object o? An authorization was thus seen as a 3-tuple (s,o,a). This view of access...called trusted in a Bell-LaPadula architecture. Work at Carnegie Mellon University on type enforcement contemporaneous with Denning’s was not addressed in...34Implementation Considerations for the Typed Access Matrix Model in a Distributed Environment," Proceedings of the 15th National Computer Security
Category-theoretic models of algebraic computer systems
NASA Astrophysics Data System (ADS)
Kovalyov, S. P.
2016-01-01
A computer system is said to be algebraic if it contains nodes that implement unconventional computation paradigms based on universal algebra. A category-based approach to modeling such systems that provides a theoretical basis for mapping tasks to these systems' architecture is proposed. The construction of algebraic models of general-purpose computations involving conditional statements and overflow control is formally described by a reflector in an appropriate category of algebras. It is proved that this reflector takes the modulo ring whose operations are implemented in the conventional arithmetic processors to the Łukasiewicz logic matrix. Enrichments of the set of ring operations that form bases in the Łukasiewicz logic matrix are found.
Convolutional neural network architectures for predicting DNA–protein binding
Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.
2016-01-01
Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
Fast associative memory + slow neural circuitry = the computational model of the brain.
NASA Astrophysics Data System (ADS)
Berkovich, Simon; Berkovich, Efraim; Lapir, Gennady
1997-08-01
We propose a computational model of the brain based on a fast associative memory and relatively slow neural processors. In this model, processing time is expensive but memory access is not, and therefore most algorithmic tasks would be accomplished by using large look-up tables as opposed to calculating. The essential feature of an associative memory in this context (characteristic for a holographic type memory) is that it works without an explicit mechanism for resolution of multiple responses. As a result, the slow neuronal processing elements, overwhelmed by the flow of information, operate as a set of templates for ranking of the retrieved information. This structure addresses the primary controversy in the brain architecture: distributed organization of memory vs. localization of processing centers. This computational model offers an intriguing explanation of many of the paradoxical features in the brain architecture, such as integration of sensors (through DMA mechanism), subliminal perception, universality of software, interrupts, fault-tolerance, certain bizarre possibilities for rapid arithmetics etc. In conventional computer science the presented type of a computational model did not attract attention as it goes against the technological grain by using a working memory faster than processing elements.
Porting plasma physics simulation codes to modern computing architectures using the
NASA Astrophysics Data System (ADS)
Germaschewski, Kai; Abbott, Stephen
2015-11-01
Available computing power has continued to grow exponentially even after single-core performance satured in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source
Zeitouni, Jihad; Clough, Bret; Zeitouni, Suzanne; Saleem, Mohammed; Al Aisami, Kenan; Gregory, Carl
2017-01-01
Background: The use of lasers has become increasingly common in the field of medicine and dentistry, and there is a growing need for a deeper understanding of the procedure and its effects on tissue. The aim of this study was to compare the erbium-doped yttrium aluminium garnet (Er:YAG) laser and conventional drilling techniques, by observing the effects on trabecular bone microarchitecture and the extent of thermal and mechanical damage. Methods: Ovine femoral heads were employed to mimic maxillofacial trabecular bone, and cylindrical osteotomies were generated to mimic implant bed preparation. Various laser parameters were tested, as well as a conventional dental drilling technique. The specimens were then subjected to micro-computed tomographic (μCT) histomorphometic analysis and histology. Results: Herein, we demonstrate that mCT measurements of trabecular porosity provide quantitative evidence that laser-mediated cutting preserves the trabecular architecture and reduces thermal and mechanical damage at the margins of the cut. We confirmed these observations with histological studies. In contrast with laser-mediated cutting, conventional drilling resulted in trabecular collapse, reduction of porosity at the margin of the cut and histological signs of thermal damage. Conclusions: This study has demonstrated, for the first time, that mCT and quantification of porosity at the margin of the cut provides a quantitative insight into damage caused by bone cutting techniques. We further show that with laser-mediated cutting, the marrow remains exposed to the margins of the cut, facilitating cellular infiltration and likely accelerating healing. However, with drilling, trabecular collapse and thermal damage is likely to delay healing by restricting the passage of cells to the site of injury and causing localized cell death. PMID:29416849
NASA Astrophysics Data System (ADS)
Carlton, David Bryan
The exponential improvements in speed, energy efficiency, and cost that the computer industry has relied on for growth during the last 50 years are in danger of ending within the decade. These improvements all have relied on scaling the size of the silicon-based transistor that is at the heart of every modern CPU down to smaller and smaller length scales. However, as the size of the transistor reaches scales that are measured in the number of atoms that make it up, it is clear that this scaling cannot continue forever. As a result of this, there has been a great deal of research effort directed at the search for the next device that will continue to power the growth of the computer industry. However, due to the billions of dollars of investment that conventional silicon transistors have received over the years, it is unlikely that a technology will emerge that will be able to beat it outright in every performance category. More likely, different devices will possess advantages over conventional transistors for certain applications and uses. One of these emerging computing platforms is nanomagnetic logic (NML). NML-based circuits process information by manipulating the magnetization states of single-domain nanomagnets coupled to their nearest neighbors through magnetic dipole interactions. The state variable is magnetization direction and computations can take place without passing an electric current. This makes them extremely attractive as a replacement for conventional transistor-based computing architectures for certain ultra-low power applications. In most work to date, nanomagnetic logic circuits have used an external magnetic clocking field to reset the system between computations. The clocking field is then subsequently removed very slowly relative to the magnetization dynamics, guiding the nanomagnetic logic circuit adiabatically into its magnetic ground state. In this dissertation, I will discuss the dynamics behind this process and show that it is greatly influenced by thermal fluctuations. The magnetic ground state containing the answer to the computation is reached by a stochastic process very similar to the thermal annealing of crystalline materials. We will discuss how these dynamics affect the expected reliability, speed, and energy dissipation of NML systems operating under these conditions. Next I will show how a slight change in the properties of the nanomagnets that make up a NML circuit can completely alter the dynamics by which computations take place. The addition of biaxial anisotropy to the magnetic energy landscape creates a metastable state along the hard axis of the nanomagnet. This metastability can be used to remove the stochastic nature of the computation and has large implications for reliability, speed, and energy dissipation which will all be discussed. The changes to NML operation by the addition of biaxial anisotropy introduce new challenges to realizing a commercially viable logic architecture. In the final chapter, I will discuss these challenges and talk about the architectural changes that are necessary to make a working NML circuit based on nanomagnets with biaxial anisotropy.
Acceleration of FDTD mode solver by high-performance computing techniques.
Han, Lin; Xi, Yanping; Huang, Wei-Ping
2010-06-21
A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the bench-mark finite-difference (FD) eigen mode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is in the same order of magnitude of the standard finite-difference eigen mode solver and yet require much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigen value mode solvers are no longer applicable due to memory limitation.
PIMS: Memristor-Based Processing-in-Memory-and-Storage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cook, Jeanine
Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) ar- chitectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU- memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energymore » efficiency for high performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (in- stead of memory), the system could address both in-memory computation and applications that accessed larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direc- tion, the most important being that a PIM that implements a standard von Neumann-type archi- tecture results in significant energy efficiency improvement, but only about a O(10) performance improvement. In addition to this, the emergence of new memory technologies moved us to propos- ing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM tech- nology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.« less
The AI Bus architecture for distributed knowledge-based systems
NASA Technical Reports Server (NTRS)
Schultz, Roger D.; Stobie, Iain
1991-01-01
The AI Bus architecture is layered, distributed object oriented framework developed to support the requirements of advanced technology programs for an order of magnitude improvement in software costs. The consequent need for highly autonomous computer systems, adaptable to new technology advances over a long lifespan, led to the design of an open architecture and toolbox for building large scale, robust, production quality systems. The AI Bus accommodates a mix of knowledge based and conventional components, running on heterogeneous, distributed real world and testbed environment. The concepts and design is described of the AI Bus architecture and its current implementation status as a Unix C++ library or reusable objects. Each high level semiautonomous agent process consists of a number of knowledge sources together with interagent communication mechanisms based on shared blackboards and message passing acquaintances. Standard interfaces and protocols are followed for combining and validating subsystems. Dynamic probes or demons provide an event driven means for providing active objects with shared access to resources, and each other, while not violating their security.
Energy-Efficient Neuromorphic Classifiers.
Martí, Daniel; Rigotti, Mattia; Seok, Mingoo; Fusi, Stefano
2016-10-01
Neuromorphic engineering combines the architectural and computational principles of systems neuroscience with semiconductor electronics, with the aim of building efficient and compact devices that mimic the synaptic and neural machinery of the brain. The energy consumptions promised by neuromorphic engineering are extremely low, comparable to those of the nervous system. Until now, however, the neuromorphic approach has been restricted to relatively simple circuits and specialized functions, thereby obfuscating a direct comparison of their energy consumption to that used by conventional von Neumann digital machines solving real-world tasks. Here we show that a recent technology developed by IBM can be leveraged to realize neuromorphic circuits that operate as classifiers of complex real-world stimuli. Specifically, we provide a set of general prescriptions to enable the practical implementation of neural architectures that compete with state-of-the-art classifiers. We also show that the energy consumption of these architectures, realized on the IBM chip, is typically two or more orders of magnitude lower than that of conventional digital machines implementing classifiers with comparable performance. Moreover, the spike-based dynamics display a trade-off between integration time and accuracy, which naturally translates into algorithms that can be flexibly deployed for either fast and approximate classifications, or more accurate classifications at the mere expense of longer running times and higher energy costs. This work finally proves that the neuromorphic approach can be efficiently used in real-world applications and has significant advantages over conventional digital devices when energy consumption is considered.
Program Helps Simulate Neural Networks
NASA Technical Reports Server (NTRS)
Villarreal, James; Mcintire, Gary
1993-01-01
Neural Network Environment on Transputer System (NNETS) computer program provides users high degree of flexibility in creating and manipulating wide variety of neural-network topologies at processing speeds not found in conventional computing environments. Supports back-propagation and back-propagation-related algorithms. Back-propagation algorithm used is implementation of Rumelhart's generalized delta rule. NNETS developed on INMOS Transputer(R). Predefines back-propagation network, Jordan network, and reinforcement network to assist users in learning and defining own networks. Also enables users to configure other neural-network paradigms from NNETS basic architecture. Small portion of software written in OCCAM(R) language.
Scaling to Nanotechnology Limits with the PIMS Computer Architecture and a new Scaling Rule
DOE Office of Scientific and Technical Information (OSTI.GOV)
Debenedictis, Erik P.
2015-02-01
We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated sc aling rule. This is in contrast to the current computer industry scali ng away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which l eads generally to architectures that integrate logic and memory. To keep pow er dissipation cons tant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integratesmore » processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs repr esented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and arch itectural advantages. Firs t, the algorithms may be more efficient in the conventional sens e of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling ru le. The performance analysis based on demonstrated ideas in physical science suggests 80,000 x improvement in cost per operation for the (arguably) gene ral purpose function of emulating neurons in Deep Learning.« less
Storage strategies of eddy-current FE-BI model for GPU implementation
NASA Astrophysics Data System (ADS)
Bardel, Charles; Lei, Naiguang; Udpa, Lalita
2013-01-01
In the past few years graphical processing units (GPUs) have shown tremendous improvements in computational throughput over standard CPU architecture. However, this comes at the cost of restructuring the algorithms to meet the strengths and drawbacks of this GPU architecture. A major drawback is the state of limited memory, and hence storage of FE stiffness matrices on the GPU is important. In contrast to storage on CPU the GPU storage format has significant influence on the overall performance. This paper presents an investigation of a storage strategy in the implementation of a two-dimensional finite element-boundary integral (FE-BI) model for Eddy current NDE applications, on GPU architecture. Specifically, the high dimensional matrices are manipulated by examining the matrix structure and optimally splitting into structurally independent component matrices for efficient storage and retrieval of each component. Results obtained using the proposed approach are compared to those of conventional CPU implementation for validating the method.
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
NASA Astrophysics Data System (ADS)
Pei, Zongrui; Eisenbach, Markus
2017-06-01
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the other methods to optimize the dislocation core structures, we choose the algorithm of Particle Swarm Optimization, an algorithm that simulates the social behaviors of organisms. By employing more particles (bigger swarm) and more iterative steps (allowing them to explore for longer time), the local minima can be effectively avoided. But this would require more computational cost. The advantage of this algorithm is that it is readily parallelized in modern high computing architecture. We demonstrate the performance of our parallelized algorithm scales linearly with the number of employed cores.
An Update on Design Tools for Optimization of CMC 3D Fiber Architectures
NASA Technical Reports Server (NTRS)
Lang, J.; DiCarlo, J.
2012-01-01
Objective: Describe and up-date progress for NASA's efforts to develop 3D architectural design tools for CMC in general and for SIC/SiC composites in particular. Describe past and current sequential work efforts aimed at: Understanding key fiber and tow physical characteristics in conventional 2D and 3D woven architectures as revealed by microstructures in the literature. Developing an Excel program for down-selecting and predicting key geometric properties and resulting key fiber-controlled properties for various conventional 3D architectures. Developing a software tool for accurately visualizing all the key geometric details of conventional 3D architectures. Validating tools by visualizing and predicting the Internal geometry and key mechanical properties of a NASA SIC/SIC panel with a 3D orthogonal architecture. Applying the predictive and visualization tools toward advanced 3D orthogonal SiC/SIC composites, and combining them into a user-friendly software program.
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Benny N.
2000-01-01
There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years and this trend may also continue for some near future. However, it is a well known fact that there are major obstacles, i.e., physical limitation of feature size reduction and ever increasing cost of foundry, that would prevent the long term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum-dots based computing by using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general and QCA specifically, are intriguing to NASA due to their high packing density (10(exp 11) - 10(exp 12) per square cm ) and low power consumption (no transfer of current) and potentially higher radiation tolerant. Under Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for co-planner (i.e., single layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor. Building upon these circuits, we have developed novel algorithms and QCA-based architectures for highly parallel and systolic computation of signal/image processing applications, such as FFT and Wavelet and Wlash-Hadamard Transforms.
Techniques for the rapid display and manipulation of 3-D biomedical data.
Goldwasser, S M; Reynolds, R A; Talton, D A; Walsh, E S
1988-01-01
The use of fully interactive 3-D workstations with true real-time performance will become increasingly common as technology matures and economical commercial systems become available. This paper provides a comprehensive introduction to high speed approaches to the display and manipulation of 3-D medical objects obtained from tomographic data acquisition systems such as CT, MR, and PET. A variety of techniques are outlined including the use of software on conventional minicomputers, hardware assist devices such as array processors and programmable frame buffers, and special purpose computer architecture for dedicated high performance systems. While both algorithms and architectures are addressed, the major theme centers around the utilization of hardware-based approaches including parallel processors for the implementation of true real-time systems.
Lunar Applications in Reconfigurable Computing
NASA Technical Reports Server (NTRS)
Somervill, Kevin
2008-01-01
NASA s Constellation Program is developing a lunar surface outpost in which reconfigurable computing will play a significant role. Reconfigurable systems provide a number of benefits over conventional software-based implementations including performance and power efficiency, while the use of standardized reconfigurable hardware provides opportunities to reduce logistical overhead. The current vision for the lunar surface architecture includes habitation, mobility, and communications systems, each of which greatly benefit from reconfigurable hardware in applications including video processing, natural feature recognition, data formatting, IP offload processing, and embedded control systems. In deploying reprogrammable hardware, considerations similar to those of software systems must be managed. There needs to be a mechanism for discovery enabling applications to locate and utilize the available resources. Also, application interfaces are needed to provide for both configuring the resources as well as transferring data between the application and the reconfigurable hardware. Each of these topics are explored in the context of deploying reconfigurable resources as an integral aspect of the lunar exploration architecture.
NASA Astrophysics Data System (ADS)
Farid, V. L.; Wonorahardjo, S.
2018-05-01
The implementation of Green Building criteria is relatively new in architectural practice, especially in Indonesia. Consequently, the integration of these criteria into design process has the potential to change the design process itself. The implementation of the green building criteria into the conventional design process will be discussed in this paper. The concept of this project is to design a residential unit with a natural air-conditioning system. To achieve this purpose, the Green Building criteria has been implemented since the beginning of the design process until the detailing process on the end of the project. Several studies was performed throughout the design process, such as: (1) Conceptual review, where several professionally proved theories related to Tropical Architecture and passive design are used for a reference, and (2) Computer simulations, such as Computational Fluid Dynamics (CFD) and wind tunnel simulation, used to represent the dynamic response of the surrounding environment towards the building. Hopefully this paper may become a reference for designing a green residential building.
Universal Quantum Computing with Arbitrary Continuous-Variable Encoding.
Lau, Hoi-Kwan; Plenio, Martin B
2016-09-02
Implementing a qubit quantum computer in continuous-variable systems conventionally requires the engineering of specific interactions according to the encoding basis states. In this work, we present a unified formalism to conduct universal quantum computation with a fixed set of operations but arbitrary encoding. By storing a qubit in the parity of two or four qumodes, all computing processes can be implemented by basis state preparations, continuous-variable exponential-swap operations, and swap tests. Our formalism inherits the advantages that the quantum information is decoupled from collective noise, and logical qubits with different encodings can be brought to interact without decoding. We also propose a possible implementation of the required operations by using interactions that are available in a variety of continuous-variable systems. Our work separates the "hardware" problem of engineering quantum-computing-universal interactions, from the "software" problem of designing encodings for specific purposes. The development of quantum computer architecture could hence be simplified.
Universal Quantum Computing with Arbitrary Continuous-Variable Encoding
NASA Astrophysics Data System (ADS)
Lau, Hoi-Kwan; Plenio, Martin B.
2016-09-01
Implementing a qubit quantum computer in continuous-variable systems conventionally requires the engineering of specific interactions according to the encoding basis states. In this work, we present a unified formalism to conduct universal quantum computation with a fixed set of operations but arbitrary encoding. By storing a qubit in the parity of two or four qumodes, all computing processes can be implemented by basis state preparations, continuous-variable exponential-swap operations, and swap tests. Our formalism inherits the advantages that the quantum information is decoupled from collective noise, and logical qubits with different encodings can be brought to interact without decoding. We also propose a possible implementation of the required operations by using interactions that are available in a variety of continuous-variable systems. Our work separates the "hardware" problem of engineering quantum-computing-universal interactions, from the "software" problem of designing encodings for specific purposes. The development of quantum computer architecture could hence be simplified.
NASA Astrophysics Data System (ADS)
Xu, Shigang; Liu, Yang
2018-03-01
The conventional pseudo-acoustic wave equations (PWEs) in arbitrary orthorhombic anisotropic (OA) media usually have coupled P- and SV-wave modes. These coupled equations may introduce strong SV-wave artifacts and numerical instabilities in P-wave simulation results and reverse-time migration (RTM) profiles. However, pure acoustic wave equations (PAWEs) completely decouple the P-wave component from the full elastic wavefield and naturally solve all the aforementioned problems. In this article, we present a novel PAWE in arbitrary OA media and compare it with the conventional coupled PWEs. Through decomposing the solution of the corresponding eigenvalue equation for the original PWE into an ellipsoidal differential operator (EDO) and an ellipsoidal scalar operator (ESO), the new PAWE in time-space domain is constructed by applying the combination of these two solvable operators and can effectively describe P-wave features in arbitrary OA media. Furthermore, we adopt the optimal finite-difference method (FDM) to solve the newly derived PAWE. In addition, the three-dimensional (3D) hybrid absorbing boundary condition (HABC) with some reasonable modifications is developed for reducing artificial edge reflections in anisotropic media. To improve computational efficiency in 3D case, we adopt graphic processing unit (GPU) with Compute Unified Device Architecture (CUDA) instead of traditional central processing unit (CPU) architecture. Several numerical experiments for arbitrary OA models confirm that the proposed schemes can produce pure, stable and accurate P-wave modeling results and RTM images with higher computational efficiency. Moreover, the 3D numerical simulations can provide us with a comprehensive and real description of wave propagation.
Computers in Academic Architecture Libraries.
ERIC Educational Resources Information Center
Willis, Alfred; And Others
1992-01-01
Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…
A limit-cycle self-organizing map architecture for stable arm control.
Huang, Di-Wei; Gentili, Rodolphe J; Katz, Garrett E; Reggia, James A
2017-01-01
Inspired by the oscillatory nature of cerebral cortex activity, we recently proposed and studied self-organizing maps (SOMs) based on limit cycle neural activity in an attempt to improve the information efficiency and robustness of conventional single-node, single-pattern representations. Here we explore for the first time the use of limit cycle SOMs to build a neural architecture that controls a robotic arm by solving inverse kinematics in reach-and-hold tasks. This multi-map architecture integrates open-loop and closed-loop controls that learn to self-organize oscillatory neural representations and to harness non-fixed-point neural activity even for fixed-point arm reaching tasks. We show through computer simulations that our architecture generalizes well, achieves accurate, fast, and smooth arm movements, and is robust in the face of arm perturbations, map damage, and variations of internal timing parameters controlling the flow of activity. A robotic implementation is evaluated successfully without further training, demonstrating for the first time that limit cycle maps can control a physical robot arm. We conclude that architectures based on limit cycle maps can be organized to function effectively as neural controllers. Copyright © 2016 Elsevier Ltd. All rights reserved.
Memristor-based cellular nonlinear/neural network: design, analysis, and applications.
Duan, Shukai; Hu, Xiaofang; Dong, Zhekang; Wang, Lidan; Mazumder, Pinaki
2015-06-01
Cellular nonlinear/neural network (CNN) has been recognized as a powerful massively parallel architecture capable of solving complex engineering problems by performing trillions of analog operations per second. The memristor was theoretically predicted in the late seventies, but it garnered nascent research interest due to the recent much-acclaimed discovery of nanocrossbar memories by engineers at the Hewlett-Packard Laboratory. The memristor is expected to be co-integrated with nanoscale CMOS technology to revolutionize conventional von Neumann as well as neuromorphic computing. In this paper, a compact CNN model based on memristors is presented along with its performance analysis and applications. In the new CNN design, the memristor bridge circuit acts as the synaptic circuit element and substitutes the complex multiplication circuit used in traditional CNN architectures. In addition, the negative differential resistance and nonlinear current-voltage characteristics of the memristor have been leveraged to replace the linear resistor in conventional CNNs. The proposed CNN design has several merits, for example, high density, nonvolatility, and programmability of synaptic weights. The proposed memristor-based CNN design operations for implementing several image processing functions are illustrated through simulation and contrasted with conventional CNNs. Monte-Carlo simulation has been used to demonstrate the behavior of the proposed CNN due to the variations in memristor synaptic weights.
The circuit architecture of whole brains at the mesoscopic scale.
Mitra, Partha P
2014-09-17
Vertebrate brains of even moderate size are composed of astronomically large numbers of neurons and show a great degree of individual variability at the microscopic scale. This variation is presumably the result of phenotypic plasticity and individual experience. At a larger scale, however, relatively stable species-typical spatial patterns are observed in neuronal architecture, e.g., the spatial distributions of somata and axonal projection patterns, probably the result of a genetically encoded developmental program. The mesoscopic scale of analysis of brain architecture is the transitional point between a microscopic scale where individual variation is prominent and the macroscopic level where a stable, species-typical neural architecture is observed. The empirical existence of this scale, implicit in neuroanatomical atlases, combined with advances in computational resources, makes studying the circuit architecture of entire brains a practical task. A methodology has previously been proposed that employs a shotgun-like grid-based approach to systematically cover entire brain volumes with injections of neuronal tracers. This methodology is being employed to obtain mesoscale circuit maps in mouse and should be applicable to other vertebrate taxa. The resulting large data sets raise issues of data representation, analysis, and interpretation, which must be resolved. Even for data representation the challenges are nontrivial: the conventional approach using regional connectivity matrices fails to capture the collateral branching patterns of projection neurons. Future success of this promising research enterprise depends on the integration of previous neuroanatomical knowledge, partly through the development of suitable computational tools that encapsulate such expertise. Copyright © 2014 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyonnais, Marc; Smith, Matt; Mace, Kate P.
SCinet is the purpose-built network that operates during the International Conference for High Performance Computing,Networking, Storage and Analysis (Super Computing or SC). Created each year for the conference, SCinet brings to life a high-capacity network that supports applications and experiments that are a hallmark of the SC conference. The network links the convention center to research and commercial networks around the world. This resource serves as a platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of applications. Volunteers from academia, government and industry work together to design andmore » deliver the SCinet infrastructure. Industry vendors and carriers donate millions of dollars in equipment and services needed to build and support the local and wide area networks. Planning begins more than a year in advance of each SC conference and culminates in a high intensity installation in the days leading up to the conference. The SCinet architecture for SC16 illustrates a dramatic increase in participation from the vendor community, particularly those that focus on network equipment. Software-Defined Networking (SDN) and Data Center Networking (DCN) are present in nearly all aspects of the design.« less
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second. PMID:24891848
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System.
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second.
Demonstration of optical computing logics based on binary decision diagram.
Lin, Shiyun; Ishikawa, Yasuhiko; Wada, Kazumi
2012-01-16
Optical circuits are low power consumption and fast speed alternatives for the current information processing based on transistor circuits. However, because of no transistor function available in optics, the architecture for optical computing should be chosen that optics prefers. One of which is Binary Decision Diagram (BDD), where signal is processed by sending an optical signal from the root through a serial of switching nodes to the leaf (terminal). Speed of optical computing is limited by either transmission time of optical signals from the root to the leaf or switching time of a node. We have designed and experimentally demonstrated 1-bit and 2-bit adders based on the BDD architecture. The switching nodes are silicon ring resonators with a modulation depth of 10 dB and the states are changed by the plasma dispersion effect. The quality, Q of the rings designed is 1500, which allows fast transmission of signal, e.g., 1.3 ps calculated by a photon escaping time. A total processing time is thus analyzed to be ~9 ps for a 2-bit adder and would scales linearly with the number of bit. It is two orders of magnitude faster than the conventional CMOS circuitry, ~ns scale of delay. The presented results show the potential of fast speed optical computing circuits.
Correlation-based pattern recognition for implantable defibrillators.
Wilkins, J.
1996-01-01
An estimated 300,000 Americans die each year from cardiac arrhythmias. Historically, drug therapy or surgery were the only treatment options available for patients suffering from arrhythmias. Recently, implantable arrhythmia management devices have been developed. These devices allow abnormal cardiac rhythms to be sensed and corrected in vivo. Proper arrhythmia classification is critical to selecting the appropriate therapeutic intervention. The classification problem is made more challenging by the power/computation constraints imposed by the short battery life of implantable devices. Current devices utilize heart rate-based classification algorithms. Although easy to implement, rate-based approaches have unacceptably high error rates in distinguishing supraventricular tachycardia (SVT) from ventricular tachycardia (VT). Conventional morphology assessment techniques used in ECG analysis often require too much computation to be practical for implantable devices. In this paper, a computationally-efficient, arrhythmia classification architecture using correlation-based morphology assessment is presented. The architecture classifies individuals heart beats by assessing similarity between an incoming cardiac signal vector and a series of prestored class templates. A series of these beat classifications are used to make an overall rhythm assessment. The system makes use of several new results in the field of pattern recognition. The resulting system achieved excellent accuracy in discriminating SVT and VT. PMID:8947674
2010-06-01
DATES COVEREDAPR 2009 – JAN 2010 (From - To) APR 2009 – JAN 2010 4. TITLE AND SUBTITLE EMERGING NEUROMORPHIC COMPUTING ARCHITECTURES AND ENABLING...14. ABSTRACT The highly cross-disciplinary emerging field of neuromorphic computing architectures for cognitive information processing applications...belief systems, software, computer engineering, etc. In our effort to develop cognitive systems atop a neuromorphic computing architecture, we explored
Vectorized program architectures for supercomputer-aided circuit design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rizzoli, V.; Ferlito, M.; Neri, A.
1986-01-01
Vector processors (supercomputers) can be effectively employed in MIC or MMIC applications to solve problems of large numerical size such as broad-band nonlinear design or statistical design (yield optimization). In order to fully exploit the capabilities of a vector hardware, any program architecture must be structured accordingly. This paper presents a possible approach to the ''semantic'' vectorization of microwave circuit design software. Speed-up factors of the order of 50 can be obtained on a typical vector processor (Cray X-MP), with respect to the most powerful scaler computers (CDC 7600), with cost reductions of more than one order of magnitude. Thismore » could broaden the horizon of microwave CAD techniques to include problems that are practically out of the reach of conventional systems.« less
C4ISR Architecture Working Group (AWG), Architecture Framework Version 2.0.
1997-12-18
Vision Name Name/identifier of document that contains doctrine, goals, or vision Type Doctrine, goals, or vision Description Text summary description...e.g., organization, directive, order) Description Text summary of tasking •Rules, Criteria, or Conventions Name Name/identifier of document that...contains rules, criteria, or conventions Type One of: rules, criteria, or conventions Description Text summary description of contents or
Physical Realization of a Supervised Learning System Built with Organic Memristive Synapses
NASA Astrophysics Data System (ADS)
Lin, Yu-Pu; Bennett, Christopher H.; Cabaret, Théo; Vodenicarevic, Damir; Chabi, Djaafar; Querlioz, Damien; Jousselme, Bruno; Derycke, Vincent; Klein, Jacques-Olivier
2016-09-01
Multiple modern applications of electronics call for inexpensive chips that can perform complex operations on natural data with limited energy. A vision for accomplishing this is implementing hardware neural networks, which fuse computation and memory, with low cost organic electronics. A challenge, however, is the implementation of synapses (analog memories) composed of such materials. In this work, we introduce robust, fastly programmable, nonvolatile organic memristive nanodevices based on electrografted redox complexes that implement synapses thanks to a wide range of accessible intermediate conductivity states. We demonstrate experimentally an elementary neural network, capable of learning functions, which combines four pairs of organic memristors as synapses and conventional electronics as neurons. Our architecture is highly resilient to issues caused by imperfect devices. It tolerates inter-device variability and an adaptable learning rule offers immunity against asymmetries in device switching. Highly compliant with conventional fabrication processes, the system can be extended to larger computing systems capable of complex cognitive tasks, as demonstrated in complementary simulations.
Pei, Zongrui; Max-Planck-Inst. fur Eisenforschung, Duseldorf; Eisenbach, Markus
2017-02-06
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the other methods to optimize the dislocation core structures, we choose the algorithm of Particle Swarm Optimization, an algorithm that simulates the social behaviors of organisms. By employing more particles (bigger swarm) and more iterative steps (allowing them to explore for longer time), themore » local minima can be effectively avoided. But this would require more computational cost. The advantage of this algorithm is that it is readily parallelized in modern high computing architecture. We demonstrate the performance of our parallelized algorithm scales linearly with the number of employed cores.« less
Physical Realization of a Supervised Learning System Built with Organic Memristive Synapses.
Lin, Yu-Pu; Bennett, Christopher H; Cabaret, Théo; Vodenicarevic, Damir; Chabi, Djaafar; Querlioz, Damien; Jousselme, Bruno; Derycke, Vincent; Klein, Jacques-Olivier
2016-09-07
Multiple modern applications of electronics call for inexpensive chips that can perform complex operations on natural data with limited energy. A vision for accomplishing this is implementing hardware neural networks, which fuse computation and memory, with low cost organic electronics. A challenge, however, is the implementation of synapses (analog memories) composed of such materials. In this work, we introduce robust, fastly programmable, nonvolatile organic memristive nanodevices based on electrografted redox complexes that implement synapses thanks to a wide range of accessible intermediate conductivity states. We demonstrate experimentally an elementary neural network, capable of learning functions, which combines four pairs of organic memristors as synapses and conventional electronics as neurons. Our architecture is highly resilient to issues caused by imperfect devices. It tolerates inter-device variability and an adaptable learning rule offers immunity against asymmetries in device switching. Highly compliant with conventional fabrication processes, the system can be extended to larger computing systems capable of complex cognitive tasks, as demonstrated in complementary simulations.
Compact Interconnection Networks Based on Quantum Dots
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Nikzad; Modarress, Katayoon; Spotnitz, Matthew
2003-01-01
Architectures that would exploit the distinct characteristics of quantum-dot cellular automata (QCA) have been proposed for digital communication networks that connect advanced digital computing circuits. In comparison with networks of wires in conventional very-large-scale integrated (VLSI) circuitry, the networks according to the proposed architectures would be more compact. The proposed architectures would make it possible to implement complex interconnection schemes that are required for some advanced parallel-computing algorithms and that are difficult (and in many cases impractical) to implement in VLSI circuitry. The difficulty of implementation in VLSI and the major potential advantage afforded by QCA were described previously in Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), NASA Tech Briefs, Vol. 25, No. 10 (October 2001), page 42. To recapitulate: Wherever two wires in a conventional VLSI circuit cross each other and are required not to be in electrical contact with each other, there must be a layer of electrical insulation between them. This, in turn, makes it necessary to resort to a noncoplanar and possibly a multilayer design, which can be complex, expensive, and even impractical. As a result, much of the cost of designing VLSI circuits is associated with minimization of data routing and assignment of layers to minimize crossing of wires. Heretofore, these considerations have impeded the development of VLSI circuitry to implement complex, advanced interconnection schemes. On the other hand, with suitable design and under suitable operating conditions, QCA-based signal paths can be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes. The proposed architectures require two advances in QCA-based circuitry beyond basic QCA-based binary-signal wires described in the cited prior article. One of these advances would be the development of QCA-based wires capable of bidirectional transmission of signals. The other advance would be the development of QCA circuits capable of high-impedance state outputs. The high-impedance states would be utilized along with the 0- and 1-state outputs of QCA.
Developing a Distributed Computing Architecture at Arizona State University.
ERIC Educational Resources Information Center
Armann, Neil; And Others
1994-01-01
Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
CPU architecture for a fast and energy-saving calculation of convolution neural networks
NASA Astrophysics Data System (ADS)
Knoll, Florian J.; Grelcke, Michael; Czymmek, Vitali; Holtorf, Tim; Hussmann, Stephan
2017-06-01
One of the most difficult problem in the use of artificial neural networks is the computational capacity. Although large search engine companies own specially developed hardware to provide the necessary computing power, for the conventional user only remains the state of the art method, which is the use of a graphic processing unit (GPU) as a computational basis. Although these processors are well suited for large matrix computations, they need massive energy. Therefore a new processor on the basis of a field programmable gate array (FPGA) has been developed and is optimized for the application of deep learning. This processor is presented in this paper. The processor can be adapted for a particular application (in this paper to an organic farming application). The power consumption is only a fraction of a GPU application and should therefore be well suited for energy-saving applications.
Accelerating Climate Simulations Through Hybrid Computing
NASA Technical Reports Server (NTRS)
Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark
2009-01-01
Unconventional multi-core processors (e.g., IBM Cell B/E and NYIDIDA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and; (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx.10% network overhead.
NASA Astrophysics Data System (ADS)
Christou, Michalis; Christoudias, Theodoros; Morillo, Julián; Alvarez, Damian; Merx, Hendrik
2016-09-01
We examine an alternative approach to heterogeneous cluster-computing in the many-core era for Earth system models, using the European Centre for Medium-Range Weather Forecasts Hamburg (ECHAM)/Modular Earth Submodel System (MESSy) Atmospheric Chemistry (EMAC) model as a pilot application on the Dynamical Exascale Entry Platform (DEEP). A set of autonomous coprocessors interconnected together, called Booster, complements a conventional HPC Cluster and increases its computing performance, offering extra flexibility to expose multiple levels of parallelism and achieve better scalability. The EMAC model atmospheric chemistry code (Module Efficiently Calculating the Chemistry of the Atmosphere (MECCA)) was taskified with an offload mechanism implemented using OmpSs directives. The model was ported to the MareNostrum 3 supercomputer to allow testing with Intel Xeon Phi accelerators on a production-size machine. The changes proposed in this paper are expected to contribute to the eventual adoption of Cluster-Booster division and Many Integrated Core (MIC) accelerated architectures in presently available implementations of Earth system models, towards exploiting the potential of a fully Exascale-capable platform.
Enabling Future Robotic Missions with Multicore Processors
NASA Technical Reports Server (NTRS)
Powell, Wesley A.; Johnson, Michael A.; Wilmot, Jonathan; Some, Raphael; Gostelow, Kim P.; Reeves, Glenn; Doyle, Richard J.
2011-01-01
Recent commercial developments in multicore processors (e.g. Tilera, Clearspeed, HyperX) have provided an option for high performance embedded computing that rivals the performance attainable with FPGA-based reconfigurable computing architectures. Furthermore, these processors offer more straightforward and streamlined application development by allowing the use of conventional programming languages and software tools in lieu of hardware design languages such as VHDL and Verilog. With these advantages, multicore processors can significantly enhance the capabilities of future robotic space missions. This paper will discuss these benefits, along with onboard processing applications where multicore processing can offer advantages over existing or competing approaches. This paper will also discuss the key artchitecural features of current commercial multicore processors. In comparison to the current art, the features and advancements necessary for spaceflight multicore processors will be identified. These include power reduction, radiation hardening, inherent fault tolerance, and support for common spacecraft bus interfaces. Lastly, this paper will explore how multicore processors might evolve with advances in electronics technology and how avionics architectures might evolve once multicore processors are inserted into NASA robotic spacecraft.
Frances: A Tool for Understanding Computer Architecture and Assembly Language
ERIC Educational Resources Information Center
Sondag, Tyler; Pokorny, Kian L.; Rajan, Hridesh
2012-01-01
Students in all areas of computing require knowledge of the computing device including software implementation at the machine level. Several courses in computer science curricula address these low-level details such as computer architecture and assembly languages. For such courses, there are advantages to studying real architectures instead of…
Tutorial: Computer architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.D.; Milutinovic, V.M.; Siegel, H.J.
1986-01-01
This book presents the state-of-the-art in advanced computer architecture. It deals with the concepts underlying current architectures and covers approaches and techniques being used in the design of advanced computer systems.
Outline of a novel architecture for cortical computation.
Majumdar, Kaushik
2008-03-01
In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, keeping in conformity with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture have been proposed, some of which are without proof.
Chip-scale integrated optical interconnects: a key enabler for future high-performance computing
NASA Astrophysics Data System (ADS)
Haney, Michael; Nair, Rohit; Gu, Tian
2012-01-01
High Performance Computing (HPC) systems are putting ever-increasing demands on the throughput efficiency of their interconnection fabrics. In this paper, the limits of conventional metal trace-based inter-chip interconnect fabrics are examined in the context of state-of-the-art HPC systems, which currently operate near the 1 GFLOPS/W level. The analysis suggests that conventional metal trace interconnects will limit performance to approximately 6 GFLOPS/W in larger HPC systems that require many computer chips to be interconnected in parallel processing architectures. As the HPC communications bottlenecks push closer to the processing chips, integrated Optical Interconnect (OI) technology may provide the ultra-high bandwidths needed at the inter- and intra-chip levels. With inter-chip photonic link energies projected to be less than 1 pJ/bit, integrated OI is projected to enable HPC architecture scaling to the 50 GFLOPS/W level and beyond - providing a path to Peta-FLOPS-level HPC within a single rack, and potentially even Exa-FLOPSlevel HPC for large systems. A new hybrid integrated chip-scale OI approach is described and evaluated. The concept integrates a high-density polymer waveguide fabric directly on top of a multiple quantum well (MQW) modulator array that is area-bonded to the Silicon computing chip. Grayscale lithography is used to fabricate 5 μm x 5 μm polymer waveguides and associated novel small-footprint total internal reflection-based vertical input/output couplers directly onto a layer containing an array of GaAs MQW devices configured to be either absorption modulators or photodetectors. An external continuous wave optical "power supply" is coupled into the waveguide links. Contrast ratios were measured using a test rider chip in place of a Silicon processing chip. The results suggest that sub-pJ/b chip-scale communication is achievable with this concept. When integrated into high-density integrated optical interconnect fabrics, it could provide a seamless interconnect fabric spanning the intra-
Flat holographic stereograms synthesized from computer-generated images by using LiNbO3 crystal
NASA Astrophysics Data System (ADS)
Qu, Zhi-Min; Liu, Jinsheng; Xu, Liangying
1991-02-01
In this paper we used a novel method for synthesizing computer gene rated images in which by means of a series of intermediate holograms recorded on Fe--doped LiNbO crystals a high quality flat stereograni with wide view angle and much deep 3D image ha been obtained. 2. INTRODUCTITJN As we all know the conventional holography is very limited. With the help of a contineous wave laser only stationary objects can be re corded due tO its insufficient power. Although some moving objects could be recorded by a pulsed laser the dimensions and kinds of object are restricted. If we would like to see a imaginary object or a three dimensional image designed by computer it is very difficult by means of above conventional holography. Of course if we have a two-dimensional image on a comouter screen we can rotate it to give a three-dimensional perspective but we can never really see it as a solid. However flat holographic stereograrns synthesized from computer generated images will make one directly see the comoute results in the form of 3D image. Obviously it will have wide applications in design architecture medicine education and arts. 406 / SPIE Vol. 1238 Three-Dimensional Holography: Science Culture Education (1989)
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
Multi-processor including data flow accelerator module
Davidson, George S.; Pierce, Paul E.
1990-01-01
An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
Electronic neural networks for global optimization
NASA Technical Reports Server (NTRS)
Thakoor, A. P.; Moopenn, A. W.; Eberhardt, S.
1990-01-01
An electronic neural network with feedback architecture, implemented in analog custom VLSI is described. Its application to problems of global optimization for dynamic assignment is discussed. The convergence properties of the neural network hardware are compared with computer simulation results. The neural network's ability to provide optimal or near optimal solutions within only a few neuron time constants, a speed enhancement of several orders of magnitude over conventional search methods, is demonstrated. The effect of noise on the circuit dynamics and the convergence behavior of the neural network hardware is also examined.
Baseline Architecture of ITER Control System
NASA Astrophysics Data System (ADS)
Wallander, A.; Di Maio, F.; Journeaux, J.-Y.; Klotz, W.-D.; Makijarvi, P.; Yonekawa, I.
2011-08-01
The control system of ITER consists of thousands of computers processing hundreds of thousands of signals. The control system, being the primary tool for operating the machine, shall integrate, control and coordinate all these computers and signals and allow a limited number of staff to operate the machine from a central location with minimum human intervention. The primary functions of the ITER control system are plant control, supervision and coordination, both during experimental pulses and 24/7 continuous operation. The former can be split in three phases; preparation of the experiment by defining all parameters; executing the experiment including distributed feed-back control and finally collecting, archiving, analyzing and presenting all data produced by the experiment. We define the control system as a set of hardware and software components with well defined characteristics. The architecture addresses the organization of these components and their relationship to each other. We distinguish between physical and functional architecture, where the former defines the physical connections and the latter the data flow between components. In this paper, we identify the ITER control system based on the plant breakdown structure. Then, the control system is partitioned into a workable set of bounded subsystems. This partition considers at the same time the completeness and the integration of the subsystems. The components making up subsystems are identified and defined, a naming convention is introduced and the physical networks defined. Special attention is given to timing and real-time communication for distributed control. Finally we discuss baseline technologies for implementing the proposed architecture based on analysis, market surveys, prototyping and benchmarking carried out during the last year.
New architecture for dynamic frame-skipping transcoder.
Fung, Kai-Tat; Chan, Yui-Lam; Siu, Wan-Chi
2002-01-01
Transcoding is a key technique for reducing the bit rate of a previously compressed video signal. A high transcoding ratio may result in an unacceptable picture quality when the full frame rate of the incoming video bitstream is used. Frame skipping is often used as an efficient scheme to allocate more bits to the representative frames, so that an acceptable quality for each frame can be maintained. However, the skipped frame must be decompressed completely, which might act as a reference frame to nonskipped frames for reconstruction. The newly quantized discrete cosine transform (DCT) coefficients of the prediction errors need to be re-computed for the nonskipped frame with reference to the previous nonskipped frame; this can create undesirable complexity as well as introduce re-encoding errors. In this paper, we propose new algorithms and a novel architecture for frame-rate reduction to improve picture quality and to reduce complexity. The proposed architecture is mainly performed on the DCT domain to achieve a transcoder with low complexity. With the direct addition of DCT coefficients and an error compensation feedback loop, re-encoding errors are reduced significantly. Furthermore, we propose a frame-rate control scheme which can dynamically adjust the number of skipped frames according to the incoming motion vectors and re-encoding errors due to transcoding such that the decoded sequence can have a smooth motion as well as better transcoded pictures. Experimental results show that, as compared to the conventional transcoder, the new architecture for frame-skipping transcoder is more robust, produces fewer requantization errors, and has reduced computational complexity.
An Open Architecture to Support Social and Health Services in a Smart TV Environment.
Costa, Carlos Rivas; Anido-Rifon, Luis E; Fernandez-Iglesias, Manuel J
2017-03-01
To design, implement, and test a solution to provide social and health services for the elderly at home based on smart TV technologies and access to all services. The architecture proposed is based on an open software platform and standard personal computing hardware. This provides great flexibility to develop new applications over the underlying infrastructure or to integrate new devices, for instance to monitor a broad range of vital signs in those cases where home monitoring is required. An actual system as a proof-of-concept was designed, implemented, and deployed. Applications range from social network clients to vital signs monitoring; from interactive TV contests to conventional online care applications such as medication reminders or telemedicine. In both cases, the results have been very positive, confirming the initial perception of the TV as a convenient, easy-to-use technology to provide social and health care. The TV set is a much more familiar computing interface for most senior users, and as a consequence, smart TVs become a most convenient solution for the design and implementation of applications and services targeted to this user group. This proposal has been tested in real setting with 62 senior people at their homes. Users included both individuals with experience using computers and others reluctant to them.
NASA Astrophysics Data System (ADS)
Weigel, Martin
2011-09-01
Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of the GPU architectures as compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way to profit from the inherent parallelism and hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating spin models, such as the Ising model, on GPU as compared to conventional simulations on CPU.
NASA Astrophysics Data System (ADS)
Zhang, Zhenhai; Li, Kejie; Wu, Xiaobing; Zhang, Shujiang
2008-03-01
The unwrapped and correcting algorithm based on Coordinate Rotation Digital Computer (CORDIC) and bilinear interpolation algorithm was presented in this paper, with the purpose of processing dynamic panoramic annular image. An original annular panoramic image captured by panoramic annular lens (PAL) can be unwrapped and corrected to conventional rectangular image without distortion, which is much more coincident with people's vision. The algorithm for panoramic image processing is modeled by VHDL and implemented in FPGA. The experimental results show that the proposed panoramic image algorithm for unwrapped and distortion correction has the lower computation complexity and the architecture for dynamic panoramic image processing has lower hardware cost and power consumption. And the proposed algorithm is valid.
Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator
ERIC Educational Resources Information Center
Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.
2012-01-01
The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…
Memristor-Based Computing Architecture: Design Methodologies and Circuit Techniques
2013-03-01
MEMRISTOR-BASED COMPUTING ARCHITECTURE : DESIGN METHODOLOGIES AND CIRCUIT TECHNIQUES POLYTECHNIC INSTITUTE OF NEW YORK UNIVERSITY...TECHNICAL REPORT 3. DATES COVERED (From - To) OCT 2010 – OCT 2012 4. TITLE AND SUBTITLE MEMRISTOR-BASED COMPUTING ARCHITECTURE : DESIGN METHODOLOGIES...schemes for a memristor-based reconfigurable architecture design have not been fully explored yet. Therefore, in this project, we investigated
MAX - An advanced parallel computer for space applications
NASA Technical Reports Server (NTRS)
Lewis, Blair F.; Bunker, Robert L.
1991-01-01
MAX is a fault-tolerant multicomputer hardware and software architecture designed to meet the needs of NASA spacecraft systems. It consists of conventional computing modules (computers) connected via a dual network topology. One network is used to transfer data among the computers and between computers and I/O devices. This network's topology is arbitrary. The second network operates as a broadcast medium for operating system synchronization messages and supports the operating system's Byzantine resilience. A fully distributed operating system supports multitasking in an asynchronous event and data driven environment. A large grain dataflow paradigm is used to coordinate the multitasking and provide easy control of concurrency. It is the basis of the system's fault tolerance and allows both static and dynamical location of tasks. Redundant execution of tasks with software voting of results may be specified for critical tasks. The dataflow paradigm also supports simplified software design, test and maintenance. A unique feature is a method for reliably patching code in an executing dataflow application.
Brain architecture: a design for natural computation.
Kaiser, Marcus
2007-12-15
Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.
A Programmable Five Qubit Quantum Computer Using Trapped Atomic Ions
NASA Astrophysics Data System (ADS)
Debnath, Shantanu
Quantum computers can solve certain problems more efficiently compared to conventional classical methods. In the endeavor to build a quantum computer, several competing platforms have emerged that can implement certain quantum algorithms using a few qubits. However, the demonstrations so far have been done usually by tailoring the hardware to meet the requirements of a particular algorithm implemented for a limited number of instances. Although such proof of principal implementations are important to verify the working of algorithms on a physical system, they further need to have the potential to serve as a general purpose quantum computer allowing the flexibility required for running multiple algorithms and be scaled up to host more qubits. Here we demonstrate a small programmable quantum computer based on five trapped atomic ions each of which serves as a qubit. By optically resolving each ion we can individually address them in order to perform a complete set of single-qubit and fully connected two-qubit quantum gates and alsoperform efficient individual qubit measurements. We implement a computation architecture that accepts an algorithm from a user interface in the form of a standard logic gate sequence and decomposes it into fundamental quantum operations that are native to the hardware using a set of compilation instructions that are defined within the software. These operations are then effected through a pattern of laser pulses that perform coherent rotations on targeted qubits in the chain. The architecture implemented in the experiment therefore gives us unprecedented flexibility in the programming of any quantum algorithm while staying blind to the underlying hardware. As a demonstration we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms on the five-qubit processor and achieve average success rates of 95 and 90 percent, respectively. We also implement a five-qubit coherent quantum Fourier transform and examine its performance in the period finding and phase estimation protocol. We find fidelities of 84 and 62 percent, respectively. While maintaining the same computation architecture the system can be scaled to more ions using resources that scale favorably (O(N. 2)) with the numberof qubits N.
A new software-based architecture for quantum computer
NASA Astrophysics Data System (ADS)
Wu, Nan; Song, FangMin; Li, Xiangdong
2010-04-01
In this paper, we study a reliable architecture of a quantum computer and a new instruction set and machine language for the architecture, which can improve the performance and reduce the cost of the quantum computing. We also try to address some key issues in detail in the software-driven universal quantum computers.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
Engineering incremental resistive switching in TaOx based memristors for brain-inspired computing
NASA Astrophysics Data System (ADS)
Wang, Zongwei; Yin, Minghui; Zhang, Teng; Cai, Yimao; Wang, Yangyuan; Yang, Yuchao; Huang, Ru
2016-07-01
Brain-inspired neuromorphic computing is expected to revolutionize the architecture of conventional digital computers and lead to a new generation of powerful computing paradigms, where memristors with analog resistive switching are considered to be potential solutions for synapses. Here we propose and demonstrate a novel approach to engineering the analog switching linearity in TaOx based memristors, that is, by homogenizing the filament growth/dissolution rate via the introduction of an ion diffusion limiting layer (DLL) at the TiN/TaOx interface. This has effectively mitigated the commonly observed two-regime conductance modulation behavior and led to more uniform filament growth (dissolution) dynamics with time, therefore significantly improving the conductance modulation linearity that is desirable in neuromorphic systems. In addition, the introduction of the DLL also served to reduce the power consumption of the memristor, and important synaptic learning rules in biological brains such as spike timing dependent plasticity were successfully implemented using these optimized devices. This study could provide general implications for continued optimizations of memristor performance for neuromorphic applications, by carefully tuning the dynamics involved in filament growth and dissolution.Brain-inspired neuromorphic computing is expected to revolutionize the architecture of conventional digital computers and lead to a new generation of powerful computing paradigms, where memristors with analog resistive switching are considered to be potential solutions for synapses. Here we propose and demonstrate a novel approach to engineering the analog switching linearity in TaOx based memristors, that is, by homogenizing the filament growth/dissolution rate via the introduction of an ion diffusion limiting layer (DLL) at the TiN/TaOx interface. This has effectively mitigated the commonly observed two-regime conductance modulation behavior and led to more uniform filament growth (dissolution) dynamics with time, therefore significantly improving the conductance modulation linearity that is desirable in neuromorphic systems. In addition, the introduction of the DLL also served to reduce the power consumption of the memristor, and important synaptic learning rules in biological brains such as spike timing dependent plasticity were successfully implemented using these optimized devices. This study could provide general implications for continued optimizations of memristor performance for neuromorphic applications, by carefully tuning the dynamics involved in filament growth and dissolution. Electronic supplementary information (ESI) available. See DOI: 10.1039/c6nr00476h
Architectures for single-chip image computing
NASA Astrophysics Data System (ADS)
Gove, Robert J.
1992-04-01
This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
Transitioning ISR architecture into the cloud
NASA Astrophysics Data System (ADS)
Lash, Thomas D.
2012-06-01
Emerging cloud computing platforms offer an ideal opportunity for Intelligence, Surveillance, and Reconnaissance (ISR) intelligence analysis. Cloud computing platforms help overcome challenges and limitations of traditional ISR architectures. Modern ISR architectures can benefit from examining commercial cloud applications, especially as they relate to user experience, usage profiling, and transformational business models. This paper outlines legacy ISR architectures and their limitations, presents an overview of cloud technologies and their applications to the ISR intelligence mission, and presents an idealized ISR architecture implemented with cloud computing.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.
Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein
2015-12-01
Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies might include Smartphones and wearable devices, which have attracted the attention of medical applications. These products are used less in critical medical applications because of their resource constraint and failure sensitivity. This is due to the fact that without safety considerations, small-integrated hardware will endanger patients' lives. Therefore, proposing some principals is required to construct wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to cloud. The tiers of this architecture include wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is its high possible fault tolerance due to the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.
Advanced computer architecture specification for automated weld systems
NASA Technical Reports Server (NTRS)
Katsinis, Constantine
1994-01-01
This report describes the requirements for an advanced automated weld system and the associated computer architecture, and defines the overall system specification from a broad perspective. According to the requirements of welding procedures as they relate to an integrated multiaxis motion control and sensor architecture, the computer system requirements are developed based on a proven multiple-processor architecture with an expandable, distributed-memory, single global bus architecture, containing individual processors which are assigned to specific tasks that support sensor or control processes. The specified architecture is sufficiently flexible to integrate previously developed equipment, be upgradable and allow on-site modifications.
Quantum Computing Architectural Design
NASA Astrophysics Data System (ADS)
West, Jacob; Simms, Geoffrey; Gyure, Mark
2006-03-01
Large scale quantum computers will invariably require scalable architectures in addition to high fidelity gate operations. Quantum computing architectural design (QCAD) addresses the problems of actually implementing fault-tolerant algorithms given physical and architectural constraints beyond those of basic gate-level fidelity. Here we introduce a unified framework for QCAD that enables the scientist to study the impact of varying error correction schemes, architectural parameters including layout and scheduling, and physical operations native to a given architecture. Our software package, aptly named QCAD, provides compilation, manipulation/transformation, multi-paradigm simulation, and visualization tools. We demonstrate various features of the QCAD software package through several examples.
A hybrid joint based controller for an upper extremity exoskeleton
NASA Astrophysics Data System (ADS)
Mohd Khairuddin, Ismail; Taha, Zahari; Majeed, Anwar P. P. Abdul; Hakeem Deboucha, Abdel; Azraai Mohd Razman, Mohd; Aziz Jaafar, Abdul; Mohamed, Zulkifli
2016-02-01
This paper presents the modelling and control of a two degree of freedom upper extremity exoskeleton. The Euler-Lagrange formulation was used in deriving the dynamic modelling of both the human upper limb as well as the exoskeleton that consists of the upper arm and the forearm. The human model is based on anthropometrical measurements of the upper limb. The proportional-derivative (PD) computed torque control (CTC) architecture is employed in this study to investigate its efficacy performing joint-space control objectives specifically in rehabilitating the elbow and shoulder joints along the sagittal plane. An active force control (AFC) algorithm is also incorporated into the PD-CTC to investigate the effectiveness of this hybrid system in compensating disturbances. It was found that the AFC- PD-CTC performs well against the disturbances introduced into the system whilst achieving acceptable trajectory tracking as compared to the conventional PD-CTC control architecture.
Recursive computer architecture for VLSI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Treleaven, P.C.; Hopkins, R.P.
1982-01-01
A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems are termed fifth generation computers by the Japanese. 30 references.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Rapid indirect trajectory optimization on highly parallel computing architectures
NASA Astrophysics Data System (ADS)
Antony, Thomas
Trajectory optimization is a field which can benefit greatly from the advantages offered by parallel computing. The current state-of-the-art in trajectory optimization focuses on the use of direct optimization methods, such as the pseudo-spectral method. These methods are favored due to their ease of implementation and large convergence regions while indirect methods have largely been ignored in the literature in the past decade except for specific applications in astrodynamics. It has been shown that the shortcomings conventionally associated with indirect methods can be overcome by the use of a continuation method in which complex trajectory solutions are obtained by solving a sequence of progressively difficult optimization problems. High performance computing hardware is trending towards more parallel architectures as opposed to powerful single-core processors. Graphics Processing Units (GPU), which were originally developed for 3D graphics rendering have gained popularity in the past decade as high-performance, programmable parallel processors. The Compute Unified Device Architecture (CUDA) framework, a parallel computing architecture and programming model developed by NVIDIA, is one of the most widely used platforms in GPU computing. GPUs have been applied to a wide range of fields that require the solution of complex, computationally demanding problems. A GPU-accelerated indirect trajectory optimization methodology which uses the multiple shooting method and continuation is developed using the CUDA platform. The various algorithmic optimizations used to exploit the parallelism inherent in the indirect shooting method are described. The resulting rapid optimal control framework enables the construction of high quality optimal trajectories that satisfy problem-specific constraints and fully satisfy the necessary conditions of optimality. The benefits of the framework are highlighted by construction of maximum terminal velocity trajectories for a hypothetical long range weapon system. The techniques used to construct an initial guess from an analytic near-ballistic trajectory and the methods used to formulate the necessary conditions of optimality in a manner that is transparent to the designer are discussed. Various hypothetical mission scenarios that enforce different combinations of initial, terminal, interior point and path constraints demonstrate the rapid construction of complex trajectories without requiring any a-priori insight into the structure of the solutions. Trajectory problems of this kind were previously considered impractical to solve using indirect methods. The performance of the GPU-accelerated solver is found to be 2x--4x faster than MATLAB's bvp4c, even while running on GPU hardware that is five years behind the state-of-the-art.
Raster Scan Computer Image Generation (CIG) System Based On Refresh Memory
NASA Astrophysics Data System (ADS)
Dichter, W.; Doris, K.; Conkling, C.
1982-06-01
A full color, Computer Image Generation (CIG) raster visual system has been developed which provides a high level of training sophistication by utilizing advanced semiconductor technology and innovative hardware and firmware techniques. Double buffered refresh memory and efficient algorithms eliminate the problem of conventional raster line ordering by allowing the generated image to be stored in a random fashion. Modular design techniques and simplified architecture provide significant advantages in reduced system cost, standardization of parts, and high reliability. The major system components are a general purpose computer to perform interfacing and data base functions; a geometric processor to define the instantaneous scene image; a display generator to convert the image to a video signal; an illumination control unit which provides final image processing; and a CRT monitor for display of the completed image. Additional optional enhancements include texture generators, increased edge and occultation capability, curved surface shading, and data base extensions.
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
Advanced Launch System Multi-Path Redundant Avionics Architecture Analysis and Characterization
NASA Technical Reports Server (NTRS)
Baker, Robert L.
1993-01-01
The objective of the Multi-Path Redundant Avionics Suite (MPRAS) program is the development of a set of avionic architectural modules which will be applicable to the family of launch vehicles required to support the Advanced Launch System (ALS). To enable ALS cost/performance requirements to be met, the MPRAS must support autonomy, maintenance, and testability capabilities which exceed those present in conventional launch vehicles. The multi-path redundant or fault tolerance characteristics of the MPRAS are necessary to offset a reduction in avionics reliability due to the increased complexity needed to support these new cost reduction and performance capabilities and to meet avionics reliability requirements which will provide cost-effective reductions in overall ALS recurring costs. A complex, real-time distributed computing system is needed to meet the ALS avionics system requirements. General Dynamics, Boeing Aerospace, and C.S. Draper Laboratory have proposed system architectures as candidates for the ALS MPRAS. The purpose of this document is to report the results of independent performance and reliability characterization and assessment analyses of each proposed candidate architecture and qualitative assessments of testability, maintainability, and fault tolerance mechanisms. These independent analyses were conducted as part of the MPRAS Part 2 program and were carried under NASA Langley Research Contract NAS1-17964, Task Assignment 28.
Fully parallel write/read in resistive synaptic array for accelerating on-chip learning
NASA Astrophysics Data System (ADS)
Gao, Ligang; Wang, I.-Ting; Chen, Pai-Yu; Vrudhula, Sarma; Seo, Jae-sun; Cao, Yu; Hou, Tuo-Hung; Yu, Shimeng
2015-11-01
A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging and it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in the learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaO x /TiO2/Ti synaptic devices are fabricated, in which >200 levels of conductance states could be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy of MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS
NASA Astrophysics Data System (ADS)
Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.
2018-03-01
Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.
LINCS: Livermore's network architecture. [Octopus computing network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fletcher, J.G.
1982-01-01
Octopus, a local computing network that has been evolving at the Lawrence Livermore National Laboratory for over fifteen years, is currently undergoing a major revision. The primary purpose of the revision is to consolidate and redefine the variety of conventions and formats, which have grown up over the years, into a single standard family of protocols, the Livermore Interactive Network Communication Standard (LINCS). This standard treats the entire network as a single distributed operating system such that access to a computing resource is obtained in a single way, whether that resource is local (on the same computer as the accessingmore » process) or remote (on another computer). LINCS encompasses not only communication but also such issues as the relationship of customer to server processes and the structure, naming, and protection of resources. The discussion includes: an overview of the Livermore user community and computing hardware, the functions and structure of each of the seven layers of LINCS protocol, the reasons why we have designed our own protocols and why we are dissatisfied by the directions that current protocol standards are taking.« less
Programmable computing with a single magnetoresistive element
NASA Astrophysics Data System (ADS)
Ney, A.; Pampuch, C.; Koch, R.; Ploog, K. H.
2003-10-01
The development of transistor-based integrated circuits for modern computing is a story of great success. However, the proved concept for enhancing computational power by continuous miniaturization is approaching its fundamental limits. Alternative approaches consider logic elements that are reconfigurable at run-time to overcome the rigid architecture of the present hardware systems. Implementation of parallel algorithms on such `chameleon' processors has the potential to yield a dramatic increase of computational speed, competitive with that of supercomputers. Owing to their functional flexibility, `chameleon' processors can be readily optimized with respect to any computer application. In conventional microprocessors, information must be transferred to a memory to prevent it from getting lost, because electrically processed information is volatile. Therefore the computational performance can be improved if the logic gate is additionally capable of storing the output. Here we describe a simple hardware concept for a programmable logic element that is based on a single magnetic random access memory (MRAM) cell. It combines the inherent advantage of a non-volatile output with flexible functionality which can be selected at run-time to operate as an AND, OR, NAND or NOR gate.
A synchronized computational architecture for generalized bilateral control of robot arms
NASA Technical Reports Server (NTRS)
Bejczy, Antal K.; Szakaly, Zoltan
1987-01-01
This paper describes a computational architecture for an interconnected high speed distributed computing system for generalized bilateral control of robot arms. The key method of the architecture is the use of fully synchronized, interrupt driven software. Since an objective of the development is to utilize the processing resources efficiently, the synchronization is done in the hardware level to reduce system software overhead. The architecture also achieves a balaced load on the communication channel. The paper also describes some architectural relations to trading or sharing manual and automatic control.
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Imam, Neena
2007-01-01
Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimizedmore » for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel XeonTM processor.« less
Hardware Acceleration of Adaptive Neural Algorithms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
James, Conrad D.
As tradit ional numerical computing has faced challenges, researchers have turned towards alternative computing approaches to reduce power - per - computation metrics and improve algorithm performance. Here, we describe an approach towards non - conventional computing that strengthens the connection between machine learning and neuroscience concepts. The Hardware Acceleration of Adaptive Neural Algorithms (HAANA) project ha s develop ed neural machine learning algorithms and hardware for applications in image processing and cybersecurity. While machine learning methods are effective at extracting relevant features from many types of data, the effectiveness of these algorithms degrades when subjected to real - worldmore » conditions. Our team has generated novel neural - inspired approa ches to improve the resiliency and adaptability of machine learning algorithms. In addition, we have also designed and fabricated hardware architectures and microelectronic devices specifically tuned towards the training and inference operations of neural - inspired algorithms. Finally, our multi - scale simulation framework allows us to assess the impact of microelectronic device properties on algorithm performance.« less
Miyamoto, Kenji; Kuwano, Shigeru; Terada, Jun; Otaka, Akihiro
2016-01-25
We analyze the mobile fronthaul (MFH) bandwidth and the wireless transmission performance in the split-PHY processing (SPP) architecture, which redefines the functional split of centralized/cloud RAN (C-RAN) while preserving high wireless coordinated multi-point (CoMP) transmission/reception performance. The SPP architecture splits the base stations (BS) functions between wireless channel coding/decoding and wireless modulation/demodulation, and employs its own CoMP joint transmission and reception schemes. Simulation results show that the SPP architecture reduces the MFH bandwidth by up to 97% from conventional C-RAN while matching the wireless bit error rate (BER) performance of conventional C-RAN in uplink joint reception with only 2-dB signal to noise ratio (SNR) penalty.
Architectural Specialization for Inter-Iteration Loop Dependence Patterns
2015-10-01
Architectural Specialization for Inter-Iteration Loop Dependence Patterns Christopher Batten Computer Systems Laboratory School of Electrical and...Trends in Computer Architecture Transistors (Thousands) Frequency (MHz) Typical Power (W) MIPS R2K Intel P4 DEC Alpha 21264 Data collected by M...T as ks p er Jo ule ) Simple Processor Design Power Constraint High-Performance Architectures Embedded Architectures Design Performance
A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology
NASA Astrophysics Data System (ADS)
Rossi, Davide; Pullini, Antonio; Loi, Igor; Gautschi, Michael; Gürkaynak, Frank K.; Bartolini, Andrea; Flatresse, Philippe; Benini, Luca
2016-03-01
Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas, such as E-health, Internet of Things, and wearable Human-Computer Interfaces. A promising approach to achieve up to one order of magnitude of improvement in energy efficiency over current generation of integrated circuits is near-threshold computing. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all performance-constrained applications. Thread-level parallelism over multiple cores can be used to overcome the performance degradation at low voltage. Moreover, enabling the processors to operate on-demand and over a wide supply voltage and body bias ranges allows to achieve the best possible energy efficiency while satisfying a large spectrum of computational demands. In this work we present the first ever implementation of a 4-core cluster fabricated using conventional-well 28 nm UTBB FD-SOI technology. The multi-core architecture we present in this work is able to operate on a wide range of supply voltages starting from 0.44 V to 1.2 V. In addition, the architecture allows a wide range of body bias to be applied from -1.8 V to 0.9 V. The peak energy efficiency 60 GOPS/W is achieved at 0.5 V supply voltage and 0.5 V forward body bias. Thanks to the extended body bias range of conventional-well FD-SOI technology, high energy efficiency can be guaranteed for a wide range of process and environmental conditions. We demonstrate the ability to compensate for up to 99.7% of chips for process variation with only ±0.2 V of body biasing, and compensate temperature variation in the range -40 °C to 120 °C exploiting -1.1 V to 0.8 V body biasing. When compared to leading-edge near-threshold RISC processors optimized for extremely low power applications, the multi-core architecture we propose has 144× more performance at comparable energy efficiency levels. Even when compared to other low-power processors with comparable performance, including those implemented in 28 nm technology, our platform provides 1.4× to 3.7× better energy efficiency.
High Rate Digital Demodulator ASIC
NASA Technical Reports Server (NTRS)
Ghuman, Parminder; Sheikh, Salman; Koubek, Steve; Hoy, Scott; Gray, Andrew
1998-01-01
The architecture of High Rate (600 Mega-bits per second) Digital Demodulator (HRDD) ASIC capable of demodulating BPSK and QPSK modulated data is presented in this paper. The advantages of all-digital processing include increased flexibility and reliability with reduced reproduction costs. Conventional serial digital processing would require high processing rates necessitating a hardware implementation in other than CMOS technology such as Gallium Arsenide (GaAs) which has high cost and power requirements. It is more desirable to use CMOS technology with its lower power requirements and higher gate density. However, digital demodulation of high data rates in CMOS requires parallel algorithms to process the sampled data at a rate lower than the data rate. The parallel processing algorithms described here were developed jointly by NASA's Goddard Space Flight Center (GSFC) and the Jet Propulsion Laboratory (JPL). The resulting all-digital receiver has the capability to demodulate BPSK, QPSK, OQPSK, and DQPSK at data rates in excess of 300 Mega-bits per second (Mbps) per channel. This paper will provide an overview of the parallel architecture and features of the HRDR ASIC. In addition, this paper will provide an over-view of the implementation of the hardware architectures used to create flexibility over conventional high rate analog or hybrid receivers. This flexibility includes a wide range of data rates, modulation schemes, and operating environments. In conclusion it will be shown how this high rate digital demodulator can be used with an off-the-shelf A/D and a flexible analog front end, both of which are numerically computer controlled, to produce a very flexible, low cost high rate digital receiver.
NASA Astrophysics Data System (ADS)
Shen, Yanfeng; Cesnik, Carlos E. S.
2016-04-01
This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
Narasimhan, S; Chiel, H J; Bhunia, S
2011-04-01
Implantable microsystems for monitoring or manipulating brain activity typically require on-chip real-time processing of multichannel neural data using ultra low-power, miniaturized electronics. In this paper, we propose an integrated-circuit/architecture-level hardware design framework for neural signal processing that exploits the nature of the signal-processing algorithm. First, we consider different power reduction techniques and compare the energy efficiency between the ultra-low frequency subthreshold and conventional superthreshold design. We show that the superthreshold design operating at a much higher frequency can achieve comparable energy dissipation by taking advantage of extensive power gating. It also provides significantly higher robustness of operation and yield under large process variations. Next, we propose an architecture level preferential design approach for further energy reduction by isolating the critical computation blocks (with respect to the quality of the output signal) and assigning them higher delay margins compared to the noncritical ones. Possible delay failures under parameter variations are confined to the noncritical components, allowing graceful degradation in quality under voltage scaling. Simulation results using prerecorded neural data from the sea-slug (Aplysia californica) show that the application of the proposed design approach can lead to significant improvement in total energy, without compromising the output signal quality under process variations, compared to conventional design approaches.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
Cognitive Architectures and Human-Computer Interaction. Introduction to Special Issue.
ERIC Educational Resources Information Center
Gray, Wayne D.; Young, Richard M.; Kirschenbaum, Susan S.
1997-01-01
In this introduction to a special issue on cognitive architectures and human-computer interaction (HCI), editors and contributors provide a brief overview of cognitive architectures. The following four architectures represented by articles in this issue are: Soar; LICAI (linked model of comprehension-based action planning and instruction taking);…
Biomimetic design processes in architecture: morphogenetic and evolutionary computational design.
Menges, Achim
2012-03-01
Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies.
The flight telerobotic servicer: From functional architecture to computer architecture
NASA Technical Reports Server (NTRS)
Lumia, Ronald; Fiala, John
1989-01-01
After a brief tutorial on the NASA/National Bureau of Standards Standard Reference Model for Telerobot Control System Architecture (NASREM) functional architecture, the approach to its implementation is shown. First, interfaces must be defined which are capable of supporting the known algorithms. This is illustrated by considering the interfaces required for the SERVO level of the NASREM functional architecture. After interface definition, the specific computer architecture for the implementation must be determined. This choice is obviously technology dependent. An example illustrating one possible mapping of the NASREM functional architecture to a particular set of computers which implements it is shown. The result of choosing the NASREM functional architecture is that it provides a technology independent paradigm which can be mapped into a technology dependent implementation capable of evolving with technology in the laboratory and in space.
Wavy Architecture Thin-Film Transistor for Ultrahigh Resolution Flexible Displays.
Hanna, Amir Nabil; Kutbee, Arwa Talal; Subedi, Ram Chandra; Ooi, Boon; Hussain, Muhammad Mustafa
2018-01-01
A novel wavy-shaped thin-film-transistor (TFT) architecture, capable of achieving 70% higher drive current per unit chip area when compared with planar conventional TFT architectures, is reported for flexible display application. The transistor, due to its atypical architecture, does not alter the turn-on voltage or the OFF current values, leading to higher performance without compromising static power consumption. The concept behind this architecture is expanding the transistor's width vertically through grooved trenches in a structural layer deposited on a flexible substrate. Operation of zinc oxide (ZnO)-based TFTs is shown down to a bending radius of 5 mm with no degradation in the electrical performance or cracks in the gate stack. Finally, flexible low-power LEDs driven by the respective currents of the novel wavy, and conventional coplanar architectures are demonstrated, where the novel architecture is able to drive the LED at 2 × the output power, 3 versus 1.5 mW, which demonstrates the potential use for ultrahigh resolution displays in an area efficient manner. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Innovative architectures for dense multi-microprocessor computers
NASA Technical Reports Server (NTRS)
Donaldson, Thomas; Doty, Karl; Engle, Steven W.; Larson, Robert E.; O'Reilly, John G.
1988-01-01
The results of a Phase I Small Business Innovative Research (SBIR) project performed for the NASA Langley Computational Structural Mechanics Group are described. The project resulted in the identification of a family of chordal-ring interconnection architectures with excellent potential to serve as the basis for new multimicroprocessor (MMP) computers. The paper presents examples of how computational algorithms from structural mechanics can be efficiently implemented on the chordal-ring architecture.
NASA Technical Reports Server (NTRS)
Simon, Donald L.
2010-01-01
Aircraft engine performance trend monitoring and gas path fault diagnostics are closely related technologies that assist operators in managing the health of their gas turbine engine assets. Trend monitoring is the process of monitoring the gradual performance change that an aircraft engine will naturally incur over time due to turbomachinery deterioration, while gas path diagnostics is the process of detecting and isolating the occurrence of any faults impacting engine flow-path performance. Today, performance trend monitoring and gas path fault diagnostic functions are performed by a combination of on-board and off-board strategies. On-board engine control computers contain logic that monitors for anomalous engine operation in real-time. Off-board ground stations are used to conduct fleet-wide engine trend monitoring and fault diagnostics based on data collected from each engine each flight. Continuing advances in avionics are enabling the migration of portions of the ground-based functionality on-board, giving rise to more sophisticated on-board engine health management capabilities. This paper reviews the conventional engine performance trend monitoring and gas path fault diagnostic architecture commonly applied today, and presents a proposed enhanced on-board architecture for future applications. The enhanced architecture gains real-time access to an expanded quantity of engine parameters, and provides advanced on-board model-based estimation capabilities. The benefits of the enhanced architecture include the real-time continuous monitoring of engine health, the early diagnosis of fault conditions, and the estimation of unmeasured engine performance parameters. A future vision to advance the enhanced architecture is also presented and discussed
A computer architecture for intelligent machines
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, G. N.
1992-01-01
The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems
NASA Technical Reports Server (NTRS)
Follen, Gregory J.; Lytle, John K. (Technical Monitor)
2002-01-01
Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.
Distributed computing environments for future space control systems
NASA Technical Reports Server (NTRS)
Viallefont, Pierre
1993-01-01
The aim of this paper is to present the results of a CNES research project on distributed computing systems. The purpose of this research was to study the impact of the use of new computer technologies in the design and development of future space applications. The first part of this study was a state-of-the-art review of distributed computing systems. One of the interesting ideas arising from this review is the concept of a 'virtual computer' allowing the distributed hardware architecture to be hidden from a software application. The 'virtual computer' can improve system performance by adapting the best architecture (addition of computers) to the software application without having to modify its source code. This concept can also decrease the cost and obsolescence of the hardware architecture. In order to verify the feasibility of the 'virtual computer' concept, a prototype representative of a distributed space application is being developed independently of the hardware architecture.
Electro-Optic Computing Architectures. Volume I
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit (OW
The Reed-Solomon encoders: Conventional versus Berlekamp's architecture
NASA Technical Reports Server (NTRS)
Perlman, M.; Lee, J. J.
1982-01-01
Concatenated coding was adopted for interplanetary space missions. Concatenated coding was employed with a convolutional inner code and a Reed-Solomon (RS) outer code for spacecraft telemetry. Conventional RS encoders are compared with those that incorporate two architectural features which approximately halve the number of multiplications of a set of fixed arguments by any RS codeword symbol. The fixed arguments and the RS symbols are taken from a nonbinary finite field. Each set of multiplications is bit-serially performed and completed during one (bit-serial) symbol shift. All firmware employed by conventional RS encoders is eliminated.
Image-Processing Software For A Hypercube Computer
NASA Technical Reports Server (NTRS)
Lee, Meemong; Mazer, Alan S.; Groom, Steven L.; Williams, Winifred I.
1992-01-01
Concurrent Image Processing Executive (CIPE) is software system intended to develop and use image-processing application programs on concurrent computing environment. Designed to shield programmer from complexities of concurrent-system architecture, it provides interactive image-processing environment for end user. CIPE utilizes architectural characteristics of particular concurrent system to maximize efficiency while preserving architectural independence from user and programmer. CIPE runs on Mark-IIIfp 8-node hypercube computer and associated SUN-4 host computer.
Experimental Comparison of Two Quantum Computing Architectures
2017-03-28
IN A U G U RA L A RT IC LE CO M PU TE R SC IE N CE S Experimental comparison of two quantum computing architectures Norbert M. Linkea,b,1, Dmitri...the vast computing power a universal quantumcomputer could offer, several candidate systems are being explored. They have allowed experimental ...existing systems and the role of architecture in quantum computer design . These will be crucial for the realization of more advanced future incarna
A GPU-based calculation using the three-dimensional FDTD method for electromagnetic field analysis.
Nagaoka, Tomoaki; Watanabe, Soichi
2010-01-01
Numerical simulations with the numerical human model using the finite-difference time domain (FDTD) method have recently been performed frequently in a number of fields in biomedical engineering. However, the FDTD calculation runs too slowly. We focus, therefore, on general purpose programming on the graphics processing unit (GPGPU). The three-dimensional FDTD method was implemented on the GPU using Compute Unified Device Architecture (CUDA). In this study, we used the NVIDIA Tesla C1060 as a GPGPU board. The performance of the GPU is evaluated in comparison with the performance of a conventional CPU and a vector supercomputer. The results indicate that three-dimensional FDTD calculations using a GPU can significantly reduce run time in comparison with that using a conventional CPU, even a native GPU implementation of the three-dimensional FDTD method, while the GPU/CPU speed ratio varies with the calculation domain and thread block size.
Efficient Online Learning Algorithms Based on LSTM Neural Networks.
Ergen, Tolga; Kozat, Suleyman Serdar
2017-09-13
We investigate online nonlinear regression and introduce novel regression structures based on the long short term memory (LSTM) networks. For the introduced structures, we also provide highly efficient and effective online training methods. To train these novel LSTM-based structures, we put the underlying architecture in a state space form and introduce highly efficient and effective particle filtering (PF)-based updates. We also provide stochastic gradient descent and extended Kalman filter-based updates. Our PF-based training method guarantees convergence to the optimal parameter estimation in the mean square error sense provided that we have a sufficient number of particles and satisfy certain technical conditions. More importantly, we achieve this performance with a computational complexity in the order of the first-order gradient-based methods by controlling the number of particles. Since our approach is generic, we also introduce a gated recurrent unit (GRU)-based approach by directly replacing the LSTM architecture with the GRU architecture, where we demonstrate the superiority of our LSTM-based approach in the sequential prediction task via different real life data sets. In addition, the experimental results illustrate significant performance improvements achieved by the introduced algorithms with respect to the conventional methods over several different benchmark real life data sets.
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing.
Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J
2015-08-14
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Digital optical computers at the optoelectronic computing systems center
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The Digital Optical Computing Program within the National Science Foundation Engineering Research Center for Opto-electronic Computing Systems has as its specific goal research on optical computing architectures suitable for use at the highest possible speeds. The program can be targeted toward exploiting the time domain because other programs in the Center are pursuing research on parallel optical systems, exploiting optical interconnection and optical devices and materials. Using a general purpose computing architecture as the focus, we are developing design techniques, tools and architecture for operation at the speed of light limit. Experimental work is being done with the somewhat low speed components currently available but with architectures which will scale up in speed as faster devices are developed. The design algorithms and tools developed for a general purpose, stored program computer are being applied to other systems such as optimally controlled optical communication networks.
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires of k 2 − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
Novel wavelength diversity technique for high-speed atmospheric turbulence compensation
NASA Astrophysics Data System (ADS)
Arrasmith, William W.; Sullivan, Sean F.
2010-04-01
The defense, intelligence, and homeland security communities are driving a need for software dominant, real-time or near-real time atmospheric turbulence compensated imagery. The development of parallel processing capabilities are finding application in diverse areas including image processing, target tracking, pattern recognition, and image fusion to name a few. A novel approach to the computationally intensive case of software dominant optical and near infrared imaging through atmospheric turbulence is addressed in this paper. Previously, the somewhat conventional wavelength diversity method has been used to compensate for atmospheric turbulence with great success. We apply a new correlation based approach to the wavelength diversity methodology using a parallel processing architecture enabling high speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.
Autonomous control systems - Architecture and fundamental issues
NASA Technical Reports Server (NTRS)
Antsaklis, P. J.; Passino, K. M.; Wang, S. J.
1988-01-01
A hierarchical functional autonomous controller architecture is introduced. In particular, the architecture for the control of future space vehicles is described in detail; it is designed to ensure the autonomous operation of the control system and it allows interaction with the pilot and crew/ground station, and the systems on board the autonomous vehicle. The fundamental issues in autonomous control system modeling and analysis are discussed. It is proposed to utilize a hybrid approach to modeling and analysis of autonomous systems. This will incorporate conventional control methods based on differential equations and techniques for the analysis of systems described with a symbolic formalism. In this way, the theory of conventional control can be fully utilized. It is stressed that autonomy is the design requirement and intelligent control methods appear at present, to offer some of the necessary tools to achieve autonomy. A conventional approach may evolve and replace some or all of the `intelligent' functions. It is shown that in addition to conventional controllers, the autonomous control system incorporates planning, learning, and FDI (fault detection and identification).
A unified approach to the design of clinical reporting systems.
Gouveia-Oliveira, A; Salgado, N C; Azevedo, A P; Lopes, L; Raposo, V D; Almeida, I; de Melo, F G
1994-12-01
Computer-based Clinical Reporting Systems (CRS) for diagnostic departments that use structured data entry have a number of functional and structural affinities suggesting that a common software architecture for CRS may be defined. Such an architecture should allow easy expandability and reusability of a CRS. We report the development methodology and the architecture of SISCOPE, a CRS originally designed for gastrointestinal endoscopy that is expandable and reusable. Its main components are a patient database, a knowledge base, a reports base, and screen and reporting engines. The knowledge base contains the description of the controlled vocabulary and all the information necessary to control the menu system, and is easily accessed and modified with a conventional text editor. The structure of the controlled vocabulary is formally presented as an entity-relationship diagram. The screen engine drives a dynamic user interface and the reporting engine automatically creates a medical report; both engines operate by following a set of rules and the information contained in the knowledge base. Clinical experience has shown this architecture to be highly flexible and to allow frequent modifications of both the vocabulary and the menu system. This structure provided increased collaboration among development teams, insulating the domain expert from the details of the database, and enabling him to modify the system as necessary and to test the changes immediately. The system has also been reused in several different domains.
New device architecture of a thermoelectric energy conversion for recovering low-quality heat
NASA Astrophysics Data System (ADS)
Kim, Hoon; Park, Sung-Geun; Jung, Buyoung; Hwang, Junphil; Kim, Woochul
2014-03-01
Low-quality heat is generally discarded for economic reasons; a low-cost energy conversion device considering price per watt, /W, is required to recover this waste heat. Thin-film based thermoelectric devices could be a superior alternative for this purpose, based on their low material consumption; however, power generated in conventional thermoelectric device architecture is negligible due to the small temperature drop across the thin film. To overcome this challenge, we propose new device architecture, and demonstrate approximately 60 Kelvin temperature differences using a thick polymer nanocomposite. The temperature differences were achieved by separating the thermal path from the electrical path; whereas in conventional device architecture, both electrical charges and thermal energy share same path. We also applied this device to harvest body heat and confirmed its usability as an energy conversion device for recovering low-quality heat.
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit
Exploring the Feasibility of a DNA Computer: Design of an ALU Using Sticker-Based DNA Model.
Sarkar, Mayukh; Ghosal, Prasun; Mohanty, Saraju P
2017-09-01
Since its inception, DNA computing has advanced to offer an extremely powerful, energy-efficient emerging technology for solving hard computational problems with its inherent massive parallelism and extremely high data density. This would be much more powerful and general purpose when combined with other existing well-known algorithmic solutions that exist for conventional computing architectures using a suitable ALU. Thus, a specifically designed DNA Arithmetic and Logic Unit (ALU) that can address operations suitable for both domains can mitigate the gap between these two. An ALU must be able to perform all possible logic operations, including NOT, OR, AND, XOR, NOR, NAND, and XNOR; compare, shift etc., integer and floating point arithmetic operations (addition, subtraction, multiplication, and division). In this paper, design of an ALU has been proposed using sticker-based DNA model with experimental feasibility analysis. Novelties of this paper may be in manifold. First, the integer arithmetic operations performed here are 2s complement arithmetic, and the floating point operations follow the IEEE 754 floating point format, resembling closely to a conventional ALU. Also, the output of each operation can be reused for any next operation. So any algorithm or program logic that users can think of can be implemented directly on the DNA computer without any modification. Second, once the basic operations of sticker model can be automated, the implementations proposed in this paper become highly suitable to design a fully automated ALU. Third, proposed approaches are easy to implement. Finally, these approaches can work on sufficiently large binary numbers.
NASA Astrophysics Data System (ADS)
Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng
2018-04-01
In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture shows performances of better speed-device consumption efficiency compared with the previous kernel storage architecture. Further we improve the architecture for a high accuracy and low power computing by utilizing the binary storage and the series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performances including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) more than 67 times speed boost; 3) 71.4% energy saving.
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be inplemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Quantum Accelerators for High-performance Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Humble, Travis S.; Britt, Keith A.; Mohiyaddin, Fahd A.
We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, themore » prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.« less
Investigation of Dendrimer-Membrane Interactions
NASA Astrophysics Data System (ADS)
Mecke, Almut; Hessler, Jessica; Lee, Inhan; Banaszak Holl, Mark; Orr, Bradford; Patri, Anil K.; Baker, J. R.
2003-03-01
Modified Polyamidoamine (PAMAM) dendrimers show great promise as targeted drug transport agents. Current research efforts point to the possibility of dramatic improvements to conventional chemotherapy by selectively delivering a therapeutic to antigen bearing tumor cells. In order to better understand the uptake mechanism of such devices into cells we are investigating dendrimer-surface adsorption and dendrimer-membrane interactions using atomic force microscopy, light scattering and computer simulations. Model systems consisting of supported DMPC lipid bilayers have shown interesting results suggesting the shape and architecture of nano-devices play an important role for their biologic activity. We are also investigating the effect of targeted drug vehicles on cells in vitro.
Ultra Low Energy Binary Decision Diagram Circuits Using Few Electron Transistors
NASA Astrophysics Data System (ADS)
Saripalli, Vinay; Narayanan, Vijay; Datta, Suman
Novel medical applications involving embedded sensors, require ultra low energy dissipation with low-to-moderate performance (10kHz-100MHz) driving the conventional MOSFETs into sub-threshold operation regime. In this paper, we present an alternate ultra-low power computing architecture using Binary Decision Diagram based logic circuits implemented using Single Electron Transistors (SETs) operating in the Coulomb blockade regime with very low supply voltages. We evaluate the energy - performance tradeoff metrics of such BDD circuits using time domain Monte Carlo simulations and compare them with the energy-optimized CMOS logic circuits. Simulation results show that the proposed approach achieves better energy-delay characteristics than CMOS realizations.
Bio-Inspired Microsystem for Robust Genetic Assay Recognition
Lue, Jaw-Chyng; Fang, Wai-Chi
2008-01-01
A compact integrated system-on-chip (SoC) architecture solution for robust, real-time, and on-site genetic analysis has been proposed. This microsystem solution is noise-tolerable and suitable for analyzing the weak fluorescence patterns from a PCR prepared dual-labeled DNA microchip assay. In the architecture, a preceding VLSI differential logarithm microchip is designed for effectively computing the logarithm of the normalized input fluorescence signals. A posterior VLSI artificial neural network (ANN) processor chip is used for analyzing the processed signals from the differential logarithm stage. A single-channel logarithmic circuit was fabricated and characterized. A prototype ANN chip with unsupervised winner-take-all (WTA) function was designed, fabricated, and tested. An ANN learning algorithm using a novel sigmoid-logarithmic transfer function based on the supervised backpropagation (BP) algorithm is proposed for robustly recognizing low-intensity patterns. Our results show that the trained new ANN can recognize low-fluorescence patterns better than an ANN using the conventional sigmoid function. PMID:18566679
Parallel consensual neural networks.
Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H
1997-01-01
A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
Lee, S Seirin; Tashiro, S; Awazu, A; Kobayashi, R
2017-01-01
Specific features of nuclear architecture are important for the functional organization of the nucleus, and chromatin consists of two forms, heterochromatin and euchromatin. Conventional nuclear architecture is observed when heterochromatin is enriched at nuclear periphery, and it represents the primary structure in the majority of eukaryotic cells, including the rod cells of diurnal mammals. In contrast to this, inverted nuclear architecture is observed when the heterochromatin is distributed at the center of the nucleus, which occurs in the rod cells of nocturnal mammals. The inverted architecture found in the rod cells of the adult mouse is formed through the reorganization of conventional architecture during terminal differentiation. Although a previous experimental approach has demonstrated the relationship between these two nuclear architecture types at the molecular level, the mechanisms underlying long-range reorganization processes remain unknown. The details of nuclear structures and their spatial and temporal dynamics remain to be elucidated. Therefore, a comprehensive approach, using mathematical modeling, is required, in order to address these questions. Here, we propose a new mathematical approach to the understanding of nuclear architecture dynamics using the phase-field method. We successfully recreated the process of nuclear architecture reorganization, and showed that it is robustly induced by physical features, independent of a specific genotype. Our study demonstrates the potential of phase-field method application in the life science fields.
Pyramidal neurovision architecture for vision machines
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1993-08-01
The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.
Applying a cloud computing approach to storage architectures for spacecraft
NASA Astrophysics Data System (ADS)
Baldor, Sue A.; Quiroz, Carlos; Wood, Paul
As sensor technologies, processor speeds, and memory densities increase, spacecraft command, control, processing, and data storage systems have grown in complexity to take advantage of these improvements and expand the possible missions of spacecraft. Spacecraft systems engineers are increasingly looking for novel ways to address this growth in complexity and mitigate associated risks. Looking to conventional computing, many solutions have been executed to solve both the problem of complexity and heterogeneity in systems. In particular, the cloud-based paradigm provides a solution for distributing applications and storage capabilities across multiple platforms. In this paper, we propose utilizing a cloud-like architecture to provide a scalable mechanism for providing mass storage in spacecraft networks that can be reused on multiple spacecraft systems. By presenting a consistent interface to applications and devices that request data to be stored, complex systems designed by multiple organizations may be more readily integrated. Behind the abstraction, the cloud storage capability would manage wear-leveling, power consumption, and other attributes related to the physical memory devices, critical components in any mass storage solution for spacecraft. Our approach employs SpaceWire networks and SpaceWire-capable devices, although the concept could easily be extended to non-heterogeneous networks consisting of multiple spacecraft and potentially the ground segment.
Motion-sensor fusion-based gesture recognition and its VLSI architecture design for mobile devices
NASA Astrophysics Data System (ADS)
Zhu, Wenping; Liu, Leibo; Yin, Shouyi; Hu, Siqi; Tang, Eugene Y.; Wei, Shaojun
2014-05-01
With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human-computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations by now are all user-dependent and only rely on accelerometer. In order to achieve competitive accuracy, users are required to hold the devices in predefined manner during the operation. In this paper, a high-accuracy human gesture recognition system is proposed based on multiple motion sensor fusion. Furthermore, to reduce the energy overhead resulted from frequent sensor sampling and data processing, a high energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with the pure software implementation, approximately 45 times speed-up is achieved while operating at 20 MHz. The experiments show that the average accuracy for 10 gestures achieves 93.98% for user-independent case and 96.14% for user-dependent case when subjects hold the device randomly during completing the specified gestures. Although a few percent lower than the conventional best result, it still provides competitive accuracy acceptable for practical usage. Most importantly, the proposed system allows users to hold the device randomly during operating the predefined gestures, which substantially enhances the user experience.
Collaborative Working Architecture for IoT-Based Applications.
Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus
2018-05-23
The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications, however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
Optimizing Engineering Tools Using Modern Ground Architectures
2017-12-01
Considerations,” International Journal of Computer Science & Engineering Survey , vol. 5, no. 4, 2014. [10] R. Bell. (n.d). A beginner’s guide to big O notation...scientific community. Traditional computing architectures were not capable of processing the data efficiently, or in some cases, could not process the...thesis investigates how these modern computing architectures could be leveraged by industry and academia to improve the performance and capabilities of
Architecture independent environment for developing engineering software on MIMD computers
NASA Technical Reports Server (NTRS)
Valimohamed, Karim A.; Lopez, L. A.
1990-01-01
Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.
Exploiting current-generation graphics hardware for synthetic-scene generation
NASA Astrophysics Data System (ADS)
Tanner, Michael A.; Keen, Wayne A.
2010-04-01
Increasing seeker frame rate and pixel count, as well as the demand for higher levels of scene fidelity, have driven scene generation software for hardware-in-the-loop (HWIL) and software-in-the-loop (SWIL) testing to higher levels of parallelization. Because modern PC graphics cards provide multiple computational cores (240 shader cores for a current NVIDIA Corporation GeForce and Quadro cards), implementation of phenomenology codes on graphics processing units (GPUs) offers significant potential for simultaneous enhancement of simulation frame rate and fidelity. To take advantage of this potential requires algorithm implementation that is structured to minimize data transfers between the central processing unit (CPU) and the GPU. In this paper, preliminary methodologies developed at the Kinetic Hardware In-The-Loop Simulator (KHILS) will be presented. Included in this paper will be various language tradeoffs between conventional shader programming, Compute Unified Device Architecture (CUDA) and Open Computing Language (OpenCL), including performance trades and possible pathways for future tool development.
Computer vision camera with embedded FPGA processing
NASA Astrophysics Data System (ADS)
Lecerf, Antoine; Ouellet, Denis; Arias-Estrada, Miguel
2000-03-01
Traditional computer vision is based on a camera-computer system in which the image understanding algorithms are embedded in the computer. To circumvent the computational load of vision algorithms, low-level processing and imaging hardware can be integrated in a single compact module where a dedicated architecture is implemented. This paper presents a Computer Vision Camera based on an open architecture implemented in an FPGA. The system is targeted to real-time computer vision tasks where low level processing and feature extraction tasks can be implemented in the FPGA device. The camera integrates a CMOS image sensor, an FPGA device, two memory banks, and an embedded PC for communication and control tasks. The FPGA device is a medium size one equivalent to 25,000 logic gates. The device is connected to two high speed memory banks, an IS interface, and an imager interface. The camera can be accessed for architecture programming, data transfer, and control through an Ethernet link from a remote computer. A hardware architecture can be defined in a Hardware Description Language (like VHDL), simulated and synthesized into digital structures that can be programmed into the FPGA and tested on the camera. The architecture of a classical multi-scale edge detection algorithm based on a Laplacian of Gaussian convolution has been developed to show the capabilities of the system.
State-of-the-art in Heterogeneous Computing
Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...
2010-01-01
Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, availablemore » software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.« less
A computer architecture for intelligent machines
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, G. N.
1991-01-01
The Theory of Intelligent Machines proposes a hierarchical organization for the functions of an autonomous robot based on the Principle of Increasing Precision With Decreasing Intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed in recent years. A computer architecture that implements the lower two levels of the intelligent machine is presented. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Details of Execution Level controllers for motion and vision systems are addressed, as well as the Petri net transducer software used to implement Coordination Level functions. Extensions to UNIX and VxWorks operating systems which enable the development of a heterogeneous, distributed application are described. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
Advanced computer architecture for large-scale real-time applications.
DOT National Transportation Integrated Search
1973-04-01
Air traffic control automation is identified as a crucial problem which provides a complex, real-time computer application environment. A novel computer architecture in the form of a pipeline associative processor is conceived to achieve greater perf...
Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.
ERIC Educational Resources Information Center
Beltrametti, Monica; English, Will
1994-01-01
Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…
ERIC Educational Resources Information Center
Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.
2014-01-01
The great shift in sustainability and computer aided design in the field of architecture caused a remarkable change in the architecture philosophy, new aspects of beauty and aesthetic values are being introduced, and traditional definitions for beauty cannot fully cover this aspects, which causes a gap between; new architecture works criticism and…
Programmable hardware for reconfigurable computing systems
NASA Astrophysics Data System (ADS)
Smith, Stephen
1996-10-01
In 1945 the work of J. von Neumann and H. Goldstein created the principal architecture for electronic computation that has now lasted fifty years. Nevertheless alternative architectures have been created that have computational capability, for special tasks, far beyond that feasible with von Neumann machines. The emergence of high capacity programmable logic devices has made the realization of these architectures practical. The original ENIAC and EDVAC machines were conceived to solve special mathematical problems that were far from today's concept of 'killer applications.' In a similar vein programmable hardware computation is being used today to solve unique mathematical problems. Our programmable hardware activity is focused on the research and development of novel computational systems based upon the reconfigurability of our programmable logic devices. We explore our programmable logic architectures and their implications for programmable hardware. One programmable hardware board implementation is detailed.
Access-in-turn test architecture for low-power test application
NASA Astrophysics Data System (ADS)
Wang, Weizheng; Wang, JinCheng; Wang, Zengyun; Xiang, Lingyun
2017-03-01
This paper presents a novel access-in-turn test architecture (AIT-TA) for testing of very large scale integrated (VLSI) designs. In the proposed scheme, each scan cell in a chain receives test data from shift-in line in turn while pushing its test response to the shift-out line. It solves the power problem of conventional scan architecture to a great extent and suppresses significantly the switching activity during shift and capture operation with acceptable hardware overhead. Thus, it can help to implement the test at much higher operation frequencies resulting shorter test application time. The proposed test approach enhances the architecture of conventional scan flip-flops and backward compatible with existing test pattern generation and simulation techniques. Experimental results obtained for some larger ISCAS'89 and ITC'99 benchmark circuits illustrate effectiveness of the proposed low-power test application scheme.
Execution environment for intelligent real-time control systems
NASA Technical Reports Server (NTRS)
Sztipanovits, Janos
1987-01-01
Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computations, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems is described. The layered architecture includes specific computational models, integrated execution environment and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.
Efficient Phase Unwrapping Architecture for Digital Holographic Microscopy
Hwang, Wen-Jyi; Cheng, Shih-Chang; Cheng, Chau-Jern
2011-01-01
This paper presents a novel phase unwrapping architecture for accelerating the computational speed of digital holographic microscopy (DHM). A fast Fourier transform (FFT) based phase unwrapping algorithm providing a minimum squared error solution is adopted for hardware implementation because of its simplicity and robustness to noise. The proposed architecture is realized in a pipeline fashion to maximize throughput of the computation. Moreover, the number of hardware multipliers and dividers are minimized to reduce the hardware costs. The proposed architecture is used as a custom user logic in a system on programmable chip (SOPC) for physical performance measurement. Experimental results reveal that the proposed architecture is effective for expediting the computational speed while consuming low hardware resources for designing an embedded DHM system. PMID:22163688
Linking Humans to Data: Designing an Enterprise Architecture for EarthCube
NASA Astrophysics Data System (ADS)
Xu, C.; Yang, C.; Meyer, C. B.
2013-12-01
National Science Foundation (NSF)'s EarthCube is a strategic initiative towards a grand enterprise that holistically incorporates different geoscience research domains. The EarthCube as envisioned by NSF is a community-guided cyberinfrastructure (NSF 2011). The design of EarthCube enterprise architecture (EA) offers a vision to harmonize processes between the operations of EarthCube and its information technology foundation, the geospatial cyberinfrastructure. (Yang et al. 2010). We envision these processes as linking humans to data. We report here on fundamental ideas that would ultimately materialize as a conceptual design of EarthCube EA. EarthCube can be viewed as a meta-science that seeks to advance knowledge of the Earth through cross-disciplinary connections made using conventional domain-based earth science research. In order to build capacity that enables crossing disciplinary chasms, a key step would be to identify the cornerstones of the envisioned enterprise architecture. Human and data inputs are the two key factors to the success of EarthCube (NSF 2011), based upon which three hypotheses have been made: 1) cross disciplinary collaboration has to be achieved through data sharing; 2) disciplinary differences need to be articulated and captured in both computer and human understandable formats; 3) human intervention is crucial for crossing the disciplinary chasms. We have selected the Federal Enterprise Architecture Framework (FEAF, CIO Council 2013) as the baseline for the envisioned EarthCube EA, noting that the FEAF's deficiencies can be improved upon with inputs from three other popular EA frameworks. This presentation reports the latest on the conceptual design of an enterprise architecture in support of EarthCube.
ERIC Educational Resources Information Center
Betts, Janelle Lyon
2001-01-01
Describes a high school art assignment in which students utilize Appleworks or Claris Works to design their own house, after learning about architectural styles and how to use the computer program. States that the project develops student computer skills and increases student knowledge about architecture. (CMK)
Evaluation of Visual Computer Simulator for Computer Architecture Education
ERIC Educational Resources Information Center
Imai, Yoshiro; Imai, Masatoshi; Moritoh, Yoshio
2013-01-01
This paper presents trial evaluation of a visual computer simulator in 2009-2011, which has been developed to play some roles of both instruction facility and learning tool simultaneously. And it illustrates an example of Computer Architecture education for University students and usage of e-Learning tool for Assembly Programming in order to…
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
A heterogeneous hierarchical architecture for real-time computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skroch, D.A.; Fornaro, R.J.
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors
NASA Astrophysics Data System (ADS)
Sourouri, Mohammed; Birger Raknes, Espen
2017-04-01
In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but require the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrary to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution time for the following 3D model sizes in grid points: 1283, 2563 and 5123. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
Three Program Architecture for Design Optimization
NASA Technical Reports Server (NTRS)
Miura, Hirokazu; Olson, Lawrence E. (Technical Monitor)
1998-01-01
In this presentation, I would like to review historical perspective on the program architecture used to build design optimization capabilities based on mathematical programming and other numerical search techniques. It is rather straightforward to classify the program architecture in three categories as shown above. However, the relative importance of each of the three approaches has not been static, instead dynamically changing as the capabilities of available computational resource increases. For example, we considered that the direct coupling architecture would never be used for practical problems, but availability of such computer systems as multi-processor. In this presentation, I would like to review the roles of three architecture from historical as well as current and future perspective. There may also be some possibility for emergence of hybrid architecture. I hope to provide some seeds for active discussion where we are heading to in the very dynamic environment for high speed computing and communication.
ERIC Educational Resources Information Center
Arumi, Francisco N.
Computer programs capable of describing the thermal behavior of buildings are used to help architectural students understand environmental systems. The Numerical Simulation Laboratory at the Architectural School of the University of Texas at Austin was developed to provide the necessary software capable of simulating the energy transactions…
Shape-matching soft mechanical metamaterials.
Mirzaali, M J; Janbaz, S; Strano, M; Vergani, L; Zadpoor, A A
2018-01-17
Architectured materials with rationally designed geometries could be used to create mechanical metamaterials with unprecedented or rare properties and functionalities. Here, we introduce "shape-matching" metamaterials where the geometry of cellular structures comprising auxetic and conventional unit cells is designed so as to achieve a pre-defined shape upon deformation. We used computational models to forward-map the space of planar shapes to the space of geometrical designs. The validity of the underlying computational models was first demonstrated by comparing their predictions with experimental observations on specimens fabricated with indirect additive manufacturing. The forward-maps were then used to devise the geometry of cellular structures that approximate the arbitrary shapes described by random Fourier's series. Finally, we show that the presented metamaterials could match the contours of three real objects including a scapula model, a pumpkin, and a Delft Blue pottery piece. Shape-matching materials have potential applications in soft robotics and wearable (medical) devices.
Hu, Xiangen; Graesser, Arthur C
2004-05-01
The Human Use Regulatory Affairs Advisor (HURAA) is a Web-based facility that provides help and training on the ethical use of human subjects in research, based on documents and regulations in United States federal agencies. HURAA has a number of standard features of conventional Web facilities and computer-based training, such as hypertext, multimedia, help modules, glossaries, archives, links to other sites, and page-turning didactic instruction. HURAA also has these intelligent features: (1) an animated conversational agent that serves as a navigational guide for the Web facility, (2) lessons with case-based and explanation-based reasoning, (3) document retrieval through natural language queries, and (4) a context-sensitive Frequently Asked Questions segment, called Point & Query. This article describes the functional learning components of HURAA, specifies its computational architecture, and summarizes empirical tests of the facility on learners.
NASA Astrophysics Data System (ADS)
Russkova, Tatiana V.
2017-11-01
One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is the parallel technology. A new algorithm oriented to parallel execution on the CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both a vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions including continuous singlelayered and multilayered clouds, and selective molecular absorption are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
Sixth SIAM conference on applied linear algebra: Final program and abstracts. Final technical report
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-12-31
Linear algebra plays a central role in mathematics and applications. The analysis and solution of problems from an amazingly wide variety of disciplines depend on the theory and computational techniques of linear algebra. In turn, the diversity of disciplines depending on linear algebra also serves to focus and shape its development. Some problems have special properties (numerical, structural) that can be exploited. Some are simply so large that conventional approaches are impractical. New computer architectures motivate new algorithms, and fresh ways to look at old ones. The pervasive nature of linear algebra in analyzing and solving problems means that peoplemore » from a wide spectrum--universities, industrial and government laboratories, financial institutions, and many others--share an interest in current developments in linear algebra. This conference aims to bring them together for their mutual benefit. Abstracts of papers presented are included.« less
The fuzzy cube and causal efficacy: representation of concomitant mechanisms in stroke.
Jobe, Thomas H.; Helgason, Cathy M.
1998-04-01
Twentieth century medical science has embraced nineteenth century Boolean probability theory based upon two-valued Aristotelian logic. With the later addition of bit-based, von Neumann structured computational architectures, an epistemology based on randomness has led to a bivalent epidemiological methodology that dominates medical decision making. In contrast, fuzzy logic, based on twentieth century multi-valued logic, and computational structures that are content addressed and adaptively modified, has advanced a new scientific paradigm for the twenty-first century. Diseases such as stroke involve multiple concomitant causal factors that are difficult to represent using conventional statistical methods. We tested which paradigm best represented this complex multi-causal clinical phenomenon-stroke. We show that the fuzzy logic paradigm better represented clinical complexity in cerebrovascular disease than current probability theory based methodology. We believe this finding is generalizable to all of clinical science since multiple concomitant causal factors are involved in nearly all known pathological processes.
Nanoelectronics: Opportunities for future space applications
NASA Technical Reports Server (NTRS)
Frazier, Gary
1995-01-01
Further improvements in the performance of integrated electronics will eventually halt due to practical fundamental limits on our ability to downsize transistors and interconnect wiring. Avoiding these limits requires a revolutionary approach to switching device technology and computing architecture. Nanoelectronics, the technology of exploiting physics on the nanometer scale for computation and communication, attempts to avoid conventional limits by developing new approaches to switching, circuitry, and system integration. This presentation overviews the basic principles that operate on the nanometer scale that can be assembled into practical devices and circuits. Quantum resonant tunneling (RT) is used as the center-piece of the overview since RT devices already operate at high temperature (120 degrees C) and can be scaled, in principle, to a few nanometers in semiconductors. Near- and long-term applications of GaAs and silicon quantum devices are suggested for signal and information processing, memory, optoelectronics, and radio frequency (RF) communication.
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
NASA Technical Reports Server (NTRS)
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas; Schuman, Catherine; Patton, Robert
The White House and Department of Energy have been instrumental in driving the development of a neuromorphic computing program to help the United States continue its lead in basic research into (1) Beyond Exascale—high performance computing beyond Moore’s Law and von Neumann architectures, (2) Scientific Discovery—new paradigms for understanding increasingly large and complex scientific data, and (3) Emerging Architectures—assessing the potential of neuromorphic and quantum architectures. Neuromorphic computing spans a broad range of scientific disciplines from materials science to devices, to computer science, to neuroscience, all of which are required to solve the neuromorphic computing grand challenge. In our workshopmore » we focus on the computer science aspects, specifically from a neuromorphic device through an application. Neuromorphic devices present a very different paradigm to the computer science community from traditional von Neumann architectures, which raises six major questions about building a neuromorphic application from the device level. We used these fundamental questions to organize the workshop program and to direct the workshop panels and discussions. From the white papers, presentations, panels, and discussions, there emerged several recommendations on how to proceed.« less
A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas E; Schuman, Catherine D; Young, Steven R
Current Deep Learning models use highly optimized convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers with a fairly simple layered network topology, i.e., highly connected layers, without intra-layer connections. Complex topologies have been proposed, but are intractable to train on current systems. Building the topologies of the deep learning network requires hand tuning, and implementing the network in hardware is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determinemore » network topology, and neuromorphic computing for a low-power hardware implementation. Due to input size limitations of current quantum computers we use the MNIST dataset for our evaluation. The results show the possibility of using the three architectures in tandem to explore complex deep learning networks that are untrainable using a von Neumann architecture. We show that a quantum computer can find high quality values of intra-layer connections and weights, while yielding a tractable time result as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware. This represents a new capability that is not feasible with current von Neumann architecture. It potentially enables the ability to solve very complicated problems unsolvable with current computing technologies.« less
ERIC Educational Resources Information Center
Amenyo, John-Thones
2012-01-01
Carefully engineered playable games can serve as vehicles for students and practitioners to learn and explore the programming of advanced computer architectures to execute applications, such as high performance computing (HPC) and complex, inter-networked, distributed systems. The article presents families of playable games that are grounded in…
A design philosophy for multi-layer neural networks with applications to robot control
NASA Technical Reports Server (NTRS)
Vadiee, Nader; Jamshidi, MO
1989-01-01
A system is proposed which receives input information from many sensors that may have diverse scaling, dimension, and data representations. The proposed system tolerates sensory information with faults. The proposed self-adaptive processing technique has great promise in integrating the techniques of artificial intelligence and neural networks in an attempt to build a more intelligent computing environment. The proposed architecture can provide a detailed decision tree based on the input information, information stored in a long-term memory, and the adapted rule-based knowledge. A mathematical model for analysis will be obtained to validate the cited hypotheses. An extensive software program will be developed to simulate a typical example of pattern recognition problem. It is shown that the proposed model displays attention, expectation, spatio-temporal, and predictory behavior which are specific to the human brain. The anticipated results of this research project are: (1) creation of a new dynamic neural network structure, and (2) applications to and comparison with conventional multi-layer neural network structures. The anticipated benefits from this research are vast. The model can be used in a neuro-computer architecture as a building block which can perform complicated, nonlinear, time-varying mapping from a multitude of input excitory classes to an output or decision environment. It can be used for coordinating different sensory inputs and past experience of a dynamic system and actuating signals. The commercial applications of this project can be the creation of a special-purpose neuro-computer hardware which can be used in spatio-temporal pattern recognitions in such areas as air defense systems, e.g., target tracking, and recognition. Potential robotics-related applications are trajectory planning, inverse dynamics computations, hierarchical control, task-oriented control, and collision avoidance.
Gschwind, Michael K
2013-04-16
Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
Switching from computer to microcomputer architecture education
NASA Astrophysics Data System (ADS)
Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore
2010-03-01
In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to microcomputer architecture. The authors present their strategies towards a successful crossing of boundaries between engineering disciplines. This communication aims at providing a different aspect on professional courses that are, nowadays, addressed at the expense of traditional courses.
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells
2007-06-01
Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing
Switching from Computer to Microcomputer Architecture Education
ERIC Educational Resources Information Center
Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore
2010-01-01
In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to…
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in an homogenous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
Dunn, Katherine E; Trefzer, Martin A; Johnson, Steven; Tyrrell, Andy M
2016-08-01
Molecular computation with DNA has great potential for low power, highly parallel information processing in a biological or biochemical context. However, significant challenges remain for the field of DNA computation. New technology is needed to allow multiplexed label-free readout and to enable regulation of molecular state without addition of new DNA strands. These capabilities could be provided by hybrid bioelectronic systems in which biomolecular computing is integrated with conventional electronics through immobilization of DNA machines on the surface of electronic circuitry. Here we present a quantitative experimental analysis of a surface-immobilized OR gate made from DNA and driven by strand displacement. The purpose of our work is to examine the performance of a simple representative surface-immobilized DNA logic machine, to provide valuable information for future work on hybrid bioelectronic systems involving DNA devices. We used a quartz crystal microbalance to examine a DNA monolayer containing approximately 5×10(11)gatescm(-2), with an inter-gate separation of approximately 14nm, and we found that the ensemble of gates took approximately 6min to switch. The gates could be switched repeatedly, but the switching efficiency was significantly degraded on the second and subsequent cycles when the binding site for the input was near to the surface. Otherwise, the switching efficiency could be 80% or better, and the power dissipated by the ensemble of gates during switching was approximately 0.1nWcm(-2), which is orders of magnitude less than the power dissipated during switching of an equivalent array of transistors. We propose an architecture for hybrid DNA-electronic systems in which information can be stored and processed, either in series or in parallel, by a combination of molecular machines and conventional electronics. In this architecture, information can flow freely and in both directions between the solution-phase and the underlying electronics via surface-immobilized DNA machines that provide the interface between the molecular and electronic domains. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.
Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang
2016-12-07
The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.
Ultra-Compact Transputer-Based Controller for High-Level, Multi-Axis Coordination
NASA Technical Reports Server (NTRS)
Zenowich, Brian; Crowell, Adam; Townsend, William T.
2013-01-01
The design of machines that rely on arrays of servomotors such as robotic arms, orbital platforms, and combinations of both, imposes a heavy computational burden to coordinate their actions to perform coherent tasks. For example, the robotic equivalent of a person tracing a straight line in space requires enormously complex kinematics calculations, and complexity increases with the number of servo nodes. A new high-level architecture for coordinated servo-machine control enables a practical, distributed transputer alternative to conventional central processor electronics. The solution is inherently scalable, dramatically reduces bulkiness and number of conductor runs throughout the machine, requires only a fraction of the power, and is designed for cooling in a vacuum.
NASA Astrophysics Data System (ADS)
Sibiceanu, A. R.; Ivan, F.; Nicolae, V.; Iorga, A.; Cioroianu, C.
2017-08-01
Given the importance of reducing carbon emissions from road transport, price and security of oil supply, hybrid electric vehicle can provide a viable alternative solution to conventional vehicles, equipped with thermal engines, which use fossil fuels. Based on the growing trends of new vehicles sales, which include hybrid and electric vehicles closely associated with their use in terms of harmful emissions, strict regulations are established. In this paper were created models of thermal and hybrid electric powertrains groups, using computer simulation program AVL Cruise, making a comparative study using petroleum fuels for continuously variable transmission. The results obtained highlights both fuel consumption as well as pollutant emissions.
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Gaeke, Brian R.; Husbands, Parry; Li, Xiaoye S.; Oliker, Leonid; Yelick, Katherine A.; Biegel, Bryan (Technical Monitor)
2002-01-01
The increasing gap between processor and memory performance has lead to new architectural models for memory-intensive applications. In this paper, we explore the performance of a set of memory-intensive benchmarks and use them to compare the performance of conventional cache-based microprocessors to a mixed logic and DRAM processor called VIRAM. The benchmarks are based on problem statements, rather than specific implementations, and in each case we explore the fundamental hardware requirements of the problem, as well as alternative algorithms and data structures that can help expose fine-grained parallelism or simplify memory access patterns. The benchmarks are characterized by their memory access patterns, their basic control structures, and the ratio of computation to memory operation.
NMF-mGPU: non-negative matrix factorization on multi-GPU systems.
Mejía-Roa, Edgardo; Tabas-Madrid, Daniel; Setoain, Javier; García, Carlos; Tirado, Francisco; Pascual-Montano, Alberto
2015-02-13
In the last few years, the Non-negative Matrix Factorization ( NMF ) technique has gained a great interest among the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets. However, the computing time required to process large data matrices may become impractical, even for a parallel application running on a multiprocessors cluster. In this paper, we present NMF-mGPU, an efficient and easy-to-use implementation of the NMF algorithm that takes advantage of the high computing performance delivered by Graphics-Processing Units ( GPUs ). Driven by the ever-growing demands from the video-games industry, graphics cards usually provided in PCs and laptops have evolved from simple graphics-drawing platforms into high-performance programmable systems that can be used as coprocessors for linear-algebra operations. However, these devices may have a limited amount of on-board memory, which is not considered by other NMF implementations on GPU. NMF-mGPU is based on CUDA ( Compute Unified Device Architecture ), the NVIDIA's framework for GPU computing. On devices with low memory available, large input matrices are blockwise transferred from the system's main memory to the GPU's memory, and processed accordingly. In addition, NMF-mGPU has been explicitly optimized for the different CUDA architectures. Finally, platforms with multiple GPUs can be synchronized through MPI ( Message Passing Interface ). In a four-GPU system, this implementation is about 120 times faster than a single conventional processor, and more than four times faster than a single GPU device (i.e., a super-linear speedup). Applications of GPUs in Bioinformatics are getting more and more attention due to their outstanding performance when compared to traditional processors. In addition, their relatively low price represents a highly cost-effective alternative to conventional clusters. In life sciences, this results in an excellent opportunity to facilitate the daily work of bioinformaticians that are trying to extract biological meaning out of hundreds of gigabytes of experimental information. NMF-mGPU can be used "out of the box" by researchers with little or no expertise in GPU programming in a variety of platforms, such as PCs, laptops, or high-end GPU clusters. NMF-mGPU is freely available at https://github.com/bioinfo-cnb/bionmf-gpu .
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Radenski, Atanas; Follen, Gregory J. (Technical Monitor)
2001-01-01
The rapid growth of internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of new, internet-oriented software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high -performance computing applications community. The general goal of this research project is to contribute to better understanding of the transition to internet-based high -performance computing and to develop solutions for some of the difficulties of this transition. More specifically, our goal is to design an architecture for generic divide and conquer internet-based computing, to develop a portable implementation of this architecture, to create an example library of high-performance divide-and-conquer computing agents that run on top of this architecture, and to evaluate the performance of these agents. We have been designing an architecture that incorporates a master task-pool server and utilizes satellite computational servers that operate on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. Our designed architecture is intended to be complementary to and accessible from computational grids such as Globus, Legion, and Condor. Grids provide remote access to existing high-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end internet nodes. Our project is focused on a generic divide-and-conquer paradigm and its applications that operate on a loose and ever changing pool of lower-end internet nodes.
THE COMPUTER AND THE ARCHITECTURAL PROFESSION.
ERIC Educational Resources Information Center
HAVILAND, DAVID S.
THE ROLE OF ADVANCING TECHNOLOGY IN THE FIELD OF ARCHITECTURE IS DISCUSSED IN THIS REPORT. PROBLEMS IN COMMUNICATION AND THE DESIGN PROCESS ARE IDENTIFIED. ADVANTAGES AND DISADVANTAGES OF COMPUTERS ARE MENTIONED IN RELATION TO MAN AND MACHINE INTERACTION. PRESENT AND FUTURE IMPLICATIONS OF COMPUTER USAGE ARE IDENTIFIED AND DISCUSSED WITH RESPECT…
The Contribution of Visualization to Learning Computer Architecture
ERIC Educational Resources Information Center
Yehezkel, Cecile; Ben-Ari, Mordechai; Dreyfus, Tommy
2007-01-01
This paper describes a visualization environment and associated learning activities designed to improve learning of computer architecture. The environment, EasyCPU, displays a model of the components of a computer and the dynamic processes involved in program execution. We present the results of a research program that analysed the contribution of…
Technology advances and market forces: Their impact on high performance architectures
NASA Technical Reports Server (NTRS)
Best, D. R.
1978-01-01
Reasonable projections into future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture and application with greater attention to testability, maintainability, reliability, and usability than supercomputer development programs of the past.
A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vineyard, Craig Michael; Verzi, Stephen Joseph
As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired re- source allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilizemore » memory.« less
Computing architecture for autonomous microgrids
Goldsmith, Steven Y.
2015-09-29
A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the .
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
SU (2) lattice gauge theory simulations on Fermi GPUs
NASA Astrophysics Data System (ADS)
Cardoso, Nuno; Bicudo, Pedro
2011-05-01
In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations.
NASA Astrophysics Data System (ADS)
Meng, Hui; Hui, Hui; Hu, Chaoen; Yang, Xin; Tian, Jie
2017-03-01
The ability of fast and single-neuron resolution imaging of neural activities enables light-sheet fluorescence microscopy (LSFM) as a powerful imaging technique in functional neural connection applications. The state-of-art LSFM imaging system can record the neuronal activities of entire brain for small animal, such as zebrafish or C. elegans at single-neuron resolution. However, the stimulated and spontaneous movements in animal brain result in inconsistent neuron positions during recording process. It is time consuming to register the acquired large-scale images with conventional method. In this work, we address the problem of fast registration of neural positions in stacks of LSFM images. This is necessary to register brain structures and activities. To achieve fast registration of neural activities, we present a rigid registration architecture by implementation of Graphics Processing Unit (GPU). In this approach, the image stacks were preprocessed on GPU by mean stretching to reduce the computation effort. The present image was registered to the previous image stack that considered as reference. A fast Fourier transform (FFT) algorithm was used for calculating the shift of the image stack. The calculations for image registration were performed in different threads while the preparation functionality was refactored and called only once by the master thread. We implemented our registration algorithm on NVIDIA Quadro K4200 GPU under Compute Unified Device Architecture (CUDA) programming environment. The experimental results showed that the registration computation can speed-up to 550ms for a full high-resolution brain image. Our approach also has potential to be used for other dynamic image registrations in biomedical applications.
Hybrid VLSI/QCA Architecture for Computing FFTs
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon; Spotnitz, Matthew
2003-01-01
A data-processor architecture that would incorporate elements of both conventional very-large-scale integrated (VLSI) circuitry and quantum-dot cellular automata (QCA) has been proposed to enable the highly parallel and systolic computation of fast Fourier transforms (FFTs). The proposed circuit would complement the QCA-based circuits described in several prior NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; and Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35. The cited prior articles described the limitations of very-large-scale integrated (VLSI) circuitry and the major potential advantage afforded by QCA. To recapitulate: In a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCAbased signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes.
Developing an Intelligent Computer-Aided Trainer
NASA Technical Reports Server (NTRS)
Hua, Grace
1990-01-01
The Payload-assist module Deploys/Intelligent Computer-Aided Training (PD/ICAT) system was developed as a prototype for intelligent tutoring systems with the intention of seeing PD/ICAT evolve and produce a general ICAT architecture and development environment that can be adapted by a wide variety of training tasks. The proposed architecture is composed of a user interface, a domain expert, a training session manager, a trainee model and a training scenario generator. The PD/ICAT prototype was developed in the LISP environment. Although it has been well received by its peers and users, it could not be delivered toe its end users for practical use because of specific hardware and software constraints. To facilitate delivery of PD/ICAT to its users and to prepare for a more widely accepted development and delivery environment for future ICAT applications, we have ported this training system to a UNIX workstation and adopted use of a conventional language, C, and a C-based rule-based language, CLIPS. A rapid conversion of the PD/ICAT expert system to CLIPS was possible because the knowledge was basically represented as a forward chaining rule base. The resulting CLIPS rule base has been tested successfully in other ICATs as well. Therefore, the porting effort has proven to be a positive step toward our ultimate goal of building a general purpose ICAT development environment.
Multiprocessor architecture: Synthesis and evaluation
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1990-01-01
Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Kim, Jihun; Kim, Jonghong; Jang, Gil-Jin; Lee, Minho
2017-03-01
Deep learning has received significant attention recently as a promising solution to many problems in the area of artificial intelligence. Among several deep learning architectures, convolutional neural networks (CNNs) demonstrate superior performance when compared to other machine learning methods in the applications of object detection and recognition. We use a CNN for image enhancement and the detection of driving lanes on motorways. In general, the process of lane detection consists of edge extraction and line detection. A CNN can be used to enhance the input images before lane detection by excluding noise and obstacles that are irrelevant to the edge detection result. However, training conventional CNNs requires considerable computation and a big dataset. Therefore, we suggest a new learning algorithm for CNNs using an extreme learning machine (ELM). The ELM is a fast learning method used to calculate network weights between output and hidden layers in a single iteration and thus, can dramatically reduce learning time while producing accurate results with minimal training data. A conventional ELM can be applied to networks with a single hidden layer; as such, we propose a stacked ELM architecture in the CNN framework. Further, we modify the backpropagation algorithm to find the targets of hidden layers and effectively learn network weights while maintaining performance. Experimental results confirm that the proposed method is effective in reducing learning time and improving performance. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Technical Reports Server (NTRS)
Boriakoff, Valentin
1994-01-01
The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
An S N Algorithm for Modern Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Randal Scott
2016-08-29
LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
Hierarchial parallel computer architecture defined by computational multidisciplinary mechanics
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug; Johnson, Keith
1989-01-01
The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosphical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, the impact on solvers are shown in viewgraph form.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uhr, L.
1987-01-01
This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
Computer Architecture's Changing Role in Rebooting Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
DeBenedictis, Erik P.
In this paper, Windows 95 started the Wintel era, in which Microsoft Windows running on Intel x86 microprocessors dominated the computer industry and changed the world. Retaining the x86 instruction set across many generations let users buy new and more capable microprocessors without having to buy software to work with new architectures.
Computer Architecture's Changing Role in Rebooting Computing
DeBenedictis, Erik P.
2017-04-26
In this paper, Windows 95 started the Wintel era, in which Microsoft Windows running on Intel x86 microprocessors dominated the computer industry and changed the world. Retaining the x86 instruction set across many generations let users buy new and more capable microprocessors without having to buy software to work with new architectures.
Design of a fault tolerant airborne digital computer. Volume 1: Architecture
NASA Technical Reports Server (NTRS)
Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; Neumann, P. G.
1973-01-01
This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable. contractible, and (4) The design is to appropriate to post 1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive.
Using a software-defined computer in teaching the basics of computer architecture and operation
NASA Astrophysics Data System (ADS)
Kosowska, Julia; Mazur, Grzegorz
2017-08-01
The paper describes the concept and implementation of SDC_One software-defined computer designed for experimental and didactic purposes. Equipped with extensive hardware monitoring mechanisms, the device enables the students to monitor the computer's operation on bus transfer cycle or instruction cycle basis, providing the practical illustration of basic aspects of computer's operation. In the paper, we describe the hardware monitoring capabilities of SDC_One and some scenarios of using it in teaching the basics of computer architecture and microprocessor operation.
A Serial Bus Architecture for Parallel Processing Systems
1986-09-01
pins are needed to effect the data transfer. As Integrated Circuits grow in computational power, more communication capacity is needed, pushing...chip. The wider the communication path the more pins are needed to effect the data transfer. As Integrated Circuits grow in computational power, more...13 2. A Suitable Architecture Sought 14 II. OPTIMUM ARCHITECTURE OF LARGE INTEGRATED A. PARTIONING SILICON FOR MAXIMUM 1? 1. Transistor
Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance
NASA Technical Reports Server (NTRS)
Rushby, John
1999-01-01
Automated aircraft control has traditionally been divided into distinct "functions" that are implemented separately (e.g., autopilot, autothrottle, flight management); each function has its own fault-tolerant computer system, and dependencies among different functions are generally limited to the exchange of sensor and control data. A by-product of this "federated" architecture is that faults are strongly contained within the computer system of the function where they occur and cannot readily propagate to affect the operation of other functions. More modern avionics architectures contemplate supporting multiple functions on a single, shared, fault-tolerant computer system where natural fault containment boundaries are less sharply defined. Partitioning uses appropriate hardware and software mechanisms to restore strong fault containment to such integrated architectures. This report examines the requirements for partitioning, mechanisms for their realization, and issues in providing assurance for partitioning. Because partitioning shares some concerns with computer security, security models are reviewed and compared with the concerns of partitioning.
NASA Technical Reports Server (NTRS)
Farhat, Nabil H.
1987-01-01
Self-organization and learning is a distinctive feature of neural nets and processors that sets them apart from conventional approaches to signal processing. It leads to self-programmability which alleviates the problem of programming complexity in artificial neural nets. In this paper architectures for partitioning an optoelectronic analog of a neural net into distinct layers with prescribed interconnectivity pattern to enable stochastic learning by simulated annealing in the context of a Boltzmann machine are presented. Stochastic learning is of interest because of its relevance to the role of noise in biological neural nets. Practical considerations and methodologies for appreciably accelerating stochastic learning in such a multilayered net are described. These include the use of parallel optical computing of the global energy of the net, the use of fast nonvolatile programmable spatial light modulators to realize fast plasticity, optical generation of random number arrays, and an adaptive noisy thresholding scheme that also makes stochastic learning more biologically plausible. The findings reported predict optoelectronic chips that can be used in the realization of optical learning machines.
Li, Xuanxuan; Spence, John C. H.; Hogue, Brenda G.; ...
2017-09-22
X-ray free-electron lasers (XFELs) provide new opportunities for structure determination of biomolecules, viruses and nanomaterials. With unprecedented peak brilliance and ultra-short pulse duration, XFELs can tolerate higher X-ray doses by exploiting the femtosecond-scale exposure time, and can thus go beyond the resolution limits achieved with conventional X-ray diffraction imaging techniques. Using XFELs, it is possible to collect scattering information from single particles at high resolution, however particle heterogeneity and unknown orientations complicate data merging in three-dimensional space. Using the Linac Coherent Light Source (LCLS), synthetic inorganic nanocrystals with a core–shell architecture were used as a model system for proof-of-principle coherentmore » diffractive single-particle imaging experiments. To deal with the heterogeneity of the core–shell particles, new computational methods have been developed to extract the particle size and orientation from the scattering data to assist data merging. The size distribution agrees with that obtained by electron microscopy and the merged data support a model with a core–shell architecture.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xuanxuan; Spence, John C. H.; Hogue, Brenda G.
X-ray free-electron lasers (XFELs) provide new opportunities for structure determination of biomolecules, viruses and nanomaterials. With unprecedented peak brilliance and ultra-short pulse duration, XFELs can tolerate higher X-ray doses by exploiting the femtosecond-scale exposure time, and can thus go beyond the resolution limits achieved with conventional X-ray diffraction imaging techniques. Using XFELs, it is possible to collect scattering information from single particles at high resolution, however particle heterogeneity and unknown orientations complicate data merging in three-dimensional space. Using the Linac Coherent Light Source (LCLS), synthetic inorganic nanocrystals with a core–shell architecture were used as a model system for proof-of-principle coherentmore » diffractive single-particle imaging experiments. To deal with the heterogeneity of the core–shell particles, new computational methods have been developed to extract the particle size and orientation from the scattering data to assist data merging. The size distribution agrees with that obtained by electron microscopy and the merged data support a model with a core–shell architecture.« less
Fault-Tolerant Software-Defined Radio on Manycore
NASA Technical Reports Server (NTRS)
Ricketts, Scott
2015-01-01
Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiationhardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload-reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Ghafoorian, Mohsen; Karssemeijer, Nico; Heskes, Tom; van Uden, Inge W M; Sanchez, Clara I; Litjens, Geert; de Leeuw, Frank-Erik; van Ginneken, Bram; Marchiori, Elena; Platel, Bram
2017-07-11
The anatomical location of imaging features is of crucial importance for accurate diagnosis in many medical tasks. Convolutional neural networks (CNN) have had huge successes in computer vision, but they lack the natural ability to incorporate the anatomical location in their decision making process, hindering success in some medical image analysis tasks. In this paper, to integrate the anatomical location information into the network, we propose several deep CNN architectures that consider multi-scale patches or take explicit location features while training. We apply and compare the proposed architectures for segmentation of white matter hyperintensities in brain MR images on a large dataset. As a result, we observe that the CNNs that incorporate location information substantially outperform a conventional segmentation method with handcrafted features as well as CNNs that do not integrate location information. On a test set of 50 scans, the best configuration of our networks obtained a Dice score of 0.792, compared to 0.805 for an independent human observer. Performance levels of the machine and the independent human observer were not statistically significantly different (p-value = 0.06).
Modeling driver behavior in a cognitive architecture.
Salvucci, Dario D
2006-01-01
This paper explores the development of a rigorous computational model of driver behavior in a cognitive architecture--a computational framework with underlying psychological theories that incorporate basic properties and limitations of the human system. Computational modeling has emerged as a powerful tool for studying the complex task of driving, allowing researchers to simulate driver behavior and explore the parameters and constraints of this behavior. An integrated driver model developed in the ACT-R (Adaptive Control of Thought-Rational) cognitive architecture is described that focuses on the component processes of control, monitoring, and decision making in a multilane highway environment. This model accounts for the steering profiles, lateral position profiles, and gaze distributions of human drivers during lane keeping, curve negotiation, and lane changing. The model demonstrates how cognitive architectures facilitate understanding of driver behavior in the context of general human abilities and constraints and how the driving domain benefits cognitive architectures by pushing model development toward more complex, realistic tasks. The model can also serve as a core computational engine for practical applications that predict and recognize driver behavior and distraction.
Image-based metrology of porous tissue engineering scaffolds
NASA Astrophysics Data System (ADS)
Rajagopalan, Srinivasan; Robb, Richard A.
2006-03-01
Tissue engineering is an interdisciplinary effort aimed at the repair and regeneration of biological tissues through the application and control of cells, porous scaffolds and growth factors. The regeneration of specific tissues guided by tissue analogous substrates is dependent on diverse scaffold architectural indices that can be derived quantitatively from the microCT and microMR images of the scaffolds. However, the randomness of pore-solid distributions in conventional stochastic scaffolds presents unique computational challenges. As a result, image-based characterization of scaffolds has been predominantly qualitative. In this paper, we discuss quantitative image-based techniques that can be used to compute the metrological indices of porous tissue engineering scaffolds. While bulk averaged quantities such as porosity and surface are derived directly from the optimal pore-solid delineations, the spatially distributed geometric indices are derived from the medial axis representations of the pore network. The computational framework proposed (to the best of our knowledge for the first time in tissue engineering) in this paper might have profound implications towards unraveling the symbiotic structure-function relationship of porous tissue engineering scaffolds.
Advanced cloud fault tolerance system
NASA Astrophysics Data System (ADS)
Sumangali, K.; Benny, Niketa
2017-11-01
Cloud computing has become a prevalent on-demand service on the internet to store, manage and process data. A pitfall that accompanies cloud computing is the failures that can be encountered in the cloud. To overcome these failures, we require a fault tolerance mechanism to abstract faults from users. We have proposed a fault tolerant architecture, which is a combination of proactive and reactive fault tolerance. This architecture essentially increases the reliability and the availability of the cloud. In the future, we would like to compare evaluations of our proposed architecture with existing architectures and further improve it.
Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study
NASA Technical Reports Server (NTRS)
Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.
2011-01-01
A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicles avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.
Extending Moore's Law via Computationally Error Tolerant Computing.
Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.; ...
2018-03-01
Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less
Extending Moore's Law via Computationally Error Tolerant Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deng, Bobin; Srikanth, Sriseshan; Hein, Eric R.
Dennard scaling has ended. Lowering the voltage supply (V dd) to sub-volt levels causes intermittent losses in signal integrity, rendering further scaling (down) no longer acceptable as a means to lower the power required by a processor core. However, it is possible to correct the occasional errors caused due to lower V dd in an efficient manner and effectively lower power. By deploying the right amount and kind of redundancy, we can strike a balance between overhead incurred in achieving reliability and energy savings realized by permitting lower V dd. One promising approach is the Redundant Residue Number System (RRNS)more » representation. Unlike other error correcting codes, RRNS has the important property of being closed under addition, subtraction and multiplication, thus enabling computational error correction at a fraction of an overhead compared to conventional approaches. We use the RRNS scheme to design a Computationally-Redundant, Energy-Efficient core, including the microarchitecture, Instruction Set Architecture (ISA) and RRNS centered algorithms. Finally, from the simulation results, this RRNS system can reduce the energy-delay-product by about 3× for multiplication intensive workloads and by about 2× in general, when compared to a non-error-correcting binary core.« less
NASA Astrophysics Data System (ADS)
Cunningham, Sally Jo
The current crop of digital libraries for the computing community are strongly grounded in the conventional library paradigm: they provide indexes to support searching of collections of research papers. As such, these digital libraries are relatively impoverished; the present computing digital libraries omit many of the documents and resources that are currently available to computing researchers, and offer few browsing structures. These computing digital libraries were built 'top down': the resources and collection contents are forced to fit an existing digital library architecture. A 'bottom up' approach to digital library development would begin with an investigation of a community's information needs and available documents, and then design a library to organize those documents in such a way as to fulfill the community's needs. The 'home grown', informal information resources developed by and for the machine learning community are examined as a case study, to determine the types of information and document organizations 'native' to this group of researchers. The insights gained in this type of case study can be used to inform construction of a digital library tailored to this community.
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
Increasing complexity of industrial products and manufacturing processes have challenged conventional statistics based quality management approaches in the circumstances of dynamic production. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on Hadoop distributed architecture and MapReduce parallel computing model, big volume and variety quality related data generated during the manufacturing process could be dealt with. Artificial intelligent algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network in dealing with dynamic and uncertain problem and the parallel computing power of MapReduce, Bayesian network of impact factors on quality are built based on prior probability distribution and modified with posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost directly proportionally to the increase of computing nodes. It is also proved that the proposed model is feasible for locating and reasoning of root causes, forecasting of manufacturing outcome, and intelligent decision for precision problem solving. The integration of bigdata analytics and BN method offers a whole new perspective in manufacturing quality control.
Linear and passive silicon diodes, isolators, and logic gates
NASA Astrophysics Data System (ADS)
Li, Zhi-Yuan
2013-12-01
Silicon photonic integrated devices and circuits have offered a promising means to revolutionalize information processing and computing technologies. One important reason is that these devices are compatible with conventional complementary metal oxide semiconductor (CMOS) processing technology that overwhelms current microelectronics industry. Yet, the dream to build optical computers has yet to come without the breakthrough of several key elements including optical diodes, isolators, and logic gates with low power, high signal contrast, and large bandwidth. Photonic crystal has a great power to mold the flow of light in micrometer/nanometer scale and is a promising platform for optical integration. In this paper we present our recent efforts of design, fabrication, and characterization of ultracompact, linear, passive on-chip optical diodes, isolators and logic gates based on silicon two-dimensional photonic crystal slabs. Both simulation and experiment results show high performance of these novel designed devices. These linear and passive silicon devices have the unique properties of small fingerprint, low power request, large bandwidth, fast response speed, easy for fabrication, and being compatible with COMS technology. Further improving their performance would open up a road towards photonic logics and optical computing and help to construct nanophotonic on-chip processor architectures for future optical computers.
NASA Astrophysics Data System (ADS)
Yang, J. L.; Sullivan, P.; Schumann, S.; Hancox, I.; Jones, T. S.
2012-01-01
We demonstrate organic discrete heterojunction photovoltaic cells based on fullerene (C60) and copper hexadecafluorophthalocyanine (F16CuPc), in which the C60 and F16CuPc act as the electron donor and the electron acceptor, respectively. The C60/F16CuPc cells fabricated with conventional and inverted architectures both exhibit comparable power conversion efficiencies. Furthermore, we show that the photocurrent in both cells is generated by a conventional exciton dissociation mechanism rather than the exciton recombination mechanism recently proposed for a similar C60/F16ZnPc system [Song et al., J. Am. Chem. Soc. 132, 4554 (2010)]. These results demonstrate that new unconventional material systems are a potential way to fabricate organic photovoltaic cells with inverted as well as conventional architectures.
REUSABLE PROPULSION ARCHITECTURE FOR SUSTAINABLE LOW-COST ACCESS TO SPACE
NASA Technical Reports Server (NTRS)
Bonometti, Joseph; Frame, Kyle L.; Dankanich, John W.
2005-01-01
Two transportation architecture changes are presented at either end of a conventional two-stage rocket flight: 1) Air launch using a large, conventional, pod hauler design (i.e., Crossbow)ans 2) Momentum exchange tether (i.e., an in-space asset like MXER). Air launch has ana analytically justified cost reduction of approx. 10%, but its intangible benefits suggest real-world operations cost reductions much higher: 1) Inherent launch safety; 2) Mission Risk Reduction; 3) Favorable payload/rocket limitations; and 4) Leveraging the aircraft for other uses (military transport, commercial cargo, public outreach activities, etc.)
NASA Astrophysics Data System (ADS)
Wang, Pin; Bista, Rajan K.; Khalbuss, Walid E.; Qiu, Wei; Staton, Kevin D.; Zhang, Lin; Brentnall, Teresa A.; Brand, Randall E.; Liu, Yang
2011-03-01
Alterations in nuclear architecture are the hallmark diagnostic characteristic of cancer cells. In this work, we show that the nuclear architectural characteristics quantified by spatial-domain low-coherence quantitative phase microscopy (SL-QPM), is more sensitive for the identification of cancer cells than conventional cytopathology. We demonstrated the importance of nuclear architectural characteristics in both an animal model of intestinal carcinogenesis - APC/Min mouse model and human cytology specimens with colorectal cancer by identifying cancer from cytologically noncancerous appearing cells. The determination of nanoscale nuclear architecture using this simple and practical optical instrument is a significant advance towards cancer diagnosis.
Architecutres, Models, Algorithms, and Software Tools for Configurable Computing
2000-03-06
and J.G. Nash. The gated interconnection network for dynamic programming. Plenum, 1988 . [18] Ju wook Jang, Heonchul Park, and Viktor K. Prasanna. A ...Sep. 1997. [2] C. Ebeling, D. C. Cronquist , P. Franklin and C. Fisher, "RaPiD - A configurable computing architecture for compute-intensive...ABSTRACT (Maximum 200 words) The Models, Algorithms, and Architectures for Reconfigurable Computing (MAARC) project developed a sound framework for
Blueprint for a microwave trapped ion quantum computer
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G.; Mølmer, Klaus; Devitt, Simon J.; Wunderlich, Christof; Hensinger, Winfried K.
2017-01-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion–based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation–based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error–threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects. PMID:28164154
Analysis of Introducing Active Learning Methodologies in a Basic Computer Architecture Course
ERIC Educational Resources Information Center
Arbelaitz, Olatz; José I. Martín; Muguerza, Javier
2015-01-01
This paper presents an analysis of introducing active methodologies in the Computer Architecture course taught in the second year of the Computer Engineering Bachelor's degree program at the University of the Basque Country (UPV/EHU), Spain. The paper reports the experience from three academic years, 2011-2012, 2012-2013, and 2013-2014, in which…
ERIC Educational Resources Information Center
Nikolic, B.; Radivojevic, Z.; Djordjevic, J.; Milutinovic, V.
2009-01-01
Courses in Computer Architecture and Organization are regularly included in Computer Engineering curricula. These courses are usually organized in such a way that students obtain not only a purely theoretical experience, but also a practical understanding of the topics lectured. This practical work is usually done in a laboratory using simulators…
A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture
ERIC Educational Resources Information Center
Kellett, C. M.
2012-01-01
This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…
ERIC Educational Resources Information Center
Stanley, Timothy D.; Wong, Lap Kei; Prigmore, Daniel; Benson, Justin; Fishler, Nathan; Fife, Leslie; Colton, Don
2007-01-01
Students learn better when they both hear and do. In computer architecture courses "doing" can be difficult in small schools without hardware laboratories hosted by computer engineering, electrical engineering, or similar departments. Software solutions exist. Our success with George Mills' Multimedia Logic (MML) is the focus of this paper. MML…
The Role of Sketch in Architecture Design
NASA Astrophysics Data System (ADS)
Li, Yanjin; Ning, Wen
2017-06-01
With the continuous development of computer technology, we rely more and more on the computer and pay more and more attention to the final design results, so that we ignore the importance of the sketch. However, the sketch is the most basic and effective way of architecture design. Based on the study of the sketch of Tjibao Cultural Center of sketch, the paper explores the role of sketch in architecture design .
SU (2) lattice gauge theory simulations on Fermi GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cardoso, Nuno, E-mail: nunocardoso@cftp.ist.utl.p; Bicudo, Pedro, E-mail: bicudo@ist.utl.p
2011-05-10
In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes formore » the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200x the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2x slower) than single precision computations.« less
The semantics of Chemical Markup Language (CML): dictionaries and conventions.
Murray-Rust, Peter; Townsend, Joe A; Adams, Sam E; Phadungsukanan, Weerapong; Thomas, Jens
2011-10-14
The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs.
The semantics of Chemical Markup Language (CML): dictionaries and conventions
2011-01-01
The semantic architecture of CML consists of conventions, dictionaries and units. The conventions conform to a top-level specification and each convention can constrain compliant documents through machine-processing (validation). Dictionaries conform to a dictionary specification which also imposes machine validation on the dictionaries. Each dictionary can also be used to validate data in a CML document, and provide human-readable descriptions. An additional set of conventions and dictionaries are used to support scientific units. All conventions, dictionaries and dictionary elements are identifiable and addressable through unique URIs. PMID:21999509
Layered Architectures for Quantum Computers and Quantum Repeaters
NASA Astrophysics Data System (ADS)
Jones, Nathan C.
This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
Advanced flight computer. Special study
NASA Technical Reports Server (NTRS)
Coo, Dennis
1995-01-01
This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.
Advanced information processing system for advanced launch system: Avionics architecture synthesis
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan H.; Harper, Richard E.; Jaskowiak, Kenneth R.; Rosch, Gene; Alger, Linda S.; Schor, Andrei L.
1991-01-01
The Advanced Information Processing System (AIPS) is a fault-tolerant distributed computer system architecture that was developed to meet the real time computational needs of advanced aerospace vehicles. One such vehicle is the Advanced Launch System (ALS) being developed jointly by NASA and the Department of Defense to launch heavy payloads into low earth orbit at one tenth the cost (per pound of payload) of the current launch vehicles. An avionics architecture that utilizes the AIPS hardware and software building blocks was synthesized for ALS. The AIPS for ALS architecture synthesis process starting with the ALS mission requirements and ending with an analysis of the candidate ALS avionics architecture is described.
GASP-PL/I Simulation of Integrated Avionic System Processor Architectures. M.S. Thesis
NASA Technical Reports Server (NTRS)
Brent, G. A.
1978-01-01
A development study sponsored by NASA was completed in July 1977 which proposed a complete integration of all aircraft instrumentation into a single modular system. Instead of using the current single-function aircraft instruments, computers compiled and displayed inflight information for the pilot. A processor architecture called the Team Architecture was proposed. This is a hardware/software approach to high-reliability computer systems. A follow-up study of the proposed Team Architecture is reported. GASP-PL/1 simulation models are used to evaluate the operating characteristics of the Team Architecture. The problem, model development, simulation programs and results at length are presented. Also included are program input formats, outputs and listings.
Real-Time Cognitive Computing Architecture for Data Fusion in a Dynamic Environment
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Duong, Vu A.
2012-01-01
A novel cognitive computing architecture is conceptualized for processing multiple channels of multi-modal sensory data streams simultaneously, and fusing the information in real time to generate intelligent reaction sequences. This unique architecture is capable of assimilating parallel data streams that could be analog, digital, synchronous/asynchronous, and could be programmed to act as a knowledge synthesizer and/or an "intelligent perception" processor. In this architecture, the bio-inspired models of visual pathway and olfactory receptor processing are combined as processing components, to achieve the composite function of "searching for a source of food while avoiding the predator." The architecture is particularly suited for scene analysis from visual data and odorant.
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
An MPI + $X$ implementation of contact global search using Kokkos
Hansen, Glen A.; Xavier, Patrick G.; Mish, Sam P.; ...
2015-10-05
This paper describes an approach that seeks to parallelize the spatial search associated with computational contact mechanics. In contact mechanics, the purpose of the spatial search is to find “nearest neighbors,” which is the prelude to an imprinting search that resolves the interactions between the external surfaces of contacting bodies. In particular, we are interested in the contact global search portion of the spatial search associated with this operation on domain-decomposition-based meshes. Specifically, we describe an implementation that combines standard domain-decomposition-based MPI-parallel spatial search with thread-level parallelism (MPI-X) available on advanced computer architectures (those with GPU coprocessors). Our goal ismore » to demonstrate the efficacy of the MPI-X paradigm in the overall contact search. Standard MPI-parallel implementations typically use a domain decomposition of the external surfaces of bodies within the domain in an attempt to efficiently distribute computational work. This decomposition may or may not be the same as the volume decomposition associated with the host physics. The parallel contact global search phase is then employed to find and distribute surface entities (nodes and faces) that are needed to compute contact constraints between entities owned by different MPI ranks without further inter-rank communication. Key steps of the contact global search include computing bounding boxes, building surface entity (node and face) search trees and finding and distributing entities required to complete on-rank (local) spatial searches. To enable source-code portability and performance across a variety of different computer architectures, we implemented the algorithm using the Kokkos hardware abstraction library. While we targeted development towards machines with a GPU accelerator per MPI rank, we also report performance results for OpenMP with a conventional multi-core compute node per rank. Results here demonstrate a 47 % decrease in the time spent within the global search algorithm, comparing the reference ACME algorithm with the GPU implementation, on an 18M face problem using four MPI ranks. As a result, while further work remains to maximize performance on the GPU, this result illustrates the potential of the proposed implementation.« less
Computational structures for robotic computations
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chang, P. R.
1987-01-01
The computational problem of inverse kinematics and inverse dynamics of robot manipulators by taking advantage of parallelism and pipelining architectures is discussed. For the computation of inverse kinematic position solution, a maximum pipelined CORDIC architecture has been designed based on a functional decomposition of the closed-form joint equations. For the inverse dynamics computation, an efficient p-fold parallel algorithm to overcome the recurrence problem of the Newton-Euler equations of motion to achieve the time lower bound of O(log sub 2 n) has also been developed.
Yokohama, Noriya
2013-07-01
This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas
In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas; ...
2016-01-06
In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems
NASA Technical Reports Server (NTRS)
Follen, Gregory J.
2003-01-01
Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT).
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The performance of 3D rendering of Graphics Processing Unit that convents 3D vector stream into 2D frame with 3D image effects significantly impact users’ gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUmore » for efficient 3D rendering.« less
Application of Tessellation in Architectural Geometry Design
NASA Astrophysics Data System (ADS)
Chang, Wei
2018-06-01
Tessellation plays a significant role in architectural geometry design, which is widely used both through history of architecture and in modern architectural design with the help of computer technology. Tessellation has been found since the birth of civilization. In terms of dimensions, there are two- dimensional tessellations and three-dimensional tessellations; in terms of symmetry, there are periodic tessellations and aperiodic tessellations. Besides, some special types of tessellations such as Voronoi Tessellation and Delaunay Triangles are also included. Both Geometry and Crystallography, the latter of which is the basic theory of three-dimensional tessellations, need to be studied. In history, tessellation was applied into skins or decorations in architecture. The development of Computer technology enables tessellation to be more powerful, as seen in surface control, surface display and structure design, etc. Therefore, research on the application of tessellation in architectural geometry design is of great necessity in architecture studies.
NASA Technical Reports Server (NTRS)
Schmalzel, John L.; Morris, Jon; Turowski, Mark; Figueroa, Fernando; Oostdyk, Rebecca
2008-01-01
There are a number of architecture models for implementing Integrated Systems Health Management (ISHM) capabilities. For example, approaches based on the OSA-CBM and OSA-EAI models, or specific architectures developed in response to local needs. NASA s John C. Stennis Space Center (SSC) has developed one such version of an extensible architecture in support of rocket engine testing that integrates a palette of functions in order to achieve an ISHM capability. Among the functional capabilities that are supported by the framework are: prognostic models, anomaly detection, a data base of supporting health information, root cause analysis, intelligent elements, and integrated awareness. This paper focuses on the role that intelligent elements can play in ISHM architectures. We define an intelligent element as a smart element with sufficient computing capacity to support anomaly detection or other algorithms in support of ISHM functions. A smart element has the capabilities of supporting networked implementations of IEEE 1451.x smart sensor and actuator protocols. The ISHM group at SSC has been actively developing intelligent elements in conjunction with several partners at other Centers, universities, and companies as part of our ISHM approach for better supporting rocket engine testing. We have developed several implementations. Among the key features for these intelligent sensors is support for IEEE 1451.1 and incorporation of a suite of algorithms for determination of sensor health. Regardless of the potential advantages that can be achieved using intelligent sensors, existing large-scale systems are still based on conventional sensors and data acquisition systems. In order to bring the benefits of intelligent sensors to these environments, we have also developed virtual implementations of intelligent sensors.
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor, based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey?s de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns to large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86more » servers. Convey?s highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models.JGI is comparing the performance of Convey?s graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets with different sizes, from small microbial and fungal genomes to very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.« less
Optimum Guidance Law and Information Management for a Large Number of Formation Flying Spacecrafts
NASA Astrophysics Data System (ADS)
Tsuda, Yuichi; Nakasuka, Shinichi
In recent years, formation flying technique is recognized as one of the most important technologies for deep space and orbital missions that involve multiple spacecraft operations. Formation flying mission improves simultaneous observability over a wide area, redundancy and reconfigurability of the system with relatively small and low cost spacecrafts compared with the conventional single spacecraft mission. From the viewpoint of guidance and control, realizing formation flying mission usually requires tight maintenance and control of the relative distances, speeds and orientations between the member satellites. This paper studies a practical architecture for formation flight missions focusing mainly on guidance and control, and describes a new guidance algorithm for changing and keeping the relative positions and speeds of the satellites in formation. The resulting algorithm is suitable for onboard processing and gives the optimum impulsive trajectory for satellites flying closely around a certain reference orbit, that can be elliptic, parabolic or hyperbolic. Based on this guidance algorithm, this study introduces an information management methodology between the member spacecrafts which is suitable for a large formation flight architecture. Routing and multicast communication based on the wireless local area network technology are introduced. Some mathematical analyses and computer simulations will be shown in the presentation to reveal the feasibility of the proposed formation flight architecture, especially when a very large number of satellites join the formation.
Heeger, David J.
2017-01-01
Most models of sensory processing in the brain have a feedforward architecture in which each stage comprises simple linear filtering operations and nonlinearities. Models of this form have been used to explain a wide range of neurophysiological and psychophysical data, and many recent successes in artificial intelligence (with deep convolutional neural nets) are based on this architecture. However, neocortex is not a feedforward architecture. This paper proposes a first step toward an alternative computational framework in which neural activity in each brain area depends on a combination of feedforward drive (bottom-up from the previous processing stage), feedback drive (top-down context from the next stage), and prior drive (expectation). The relative contributions of feedforward drive, feedback drive, and prior drive are controlled by a handful of state parameters, which I hypothesize correspond to neuromodulators and oscillatory activity. In some states, neural responses are dominated by the feedforward drive and the theory is identical to a conventional feedforward model, thereby preserving all of the desirable features of those models. In other states, the theory is a generative model that constructs a sensory representation from an abstract representation, like memory recall. In still other states, the theory combines prior expectation with sensory input, explores different possible perceptual interpretations of ambiguous sensory inputs, and predicts forward in time. The theory, therefore, offers an empirically testable framework for understanding how the cortex accomplishes inference, exploration, and prediction. PMID:28167793
A single VLSI chip for computing syndromes in the (225, 223) Reed-Solomon decoder
NASA Technical Reports Server (NTRS)
Hsu, I. S.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.
1986-01-01
A description of a single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder is presented. The architecture that leads to this single VLSI chip design makes use of the dual basis multiplication algorithm. The same architecture can be applied to design VLSI chips to compute various kinds of number theoretic transforms.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
A neural network construction method for surrogate modeling of physics-based analysis
NASA Astrophysics Data System (ADS)
Sung, Woong Je
In this thesis existing methodologies related to the developmental methods of neural networks have been surveyed and their approaches to network sizing and structuring are carefully observed. This literature review covers the constructive methods, the pruning methods, and the evolutionary methods and questions about the basic assumption intrinsic to the conventional neural network learning paradigm, which is primarily devoted to optimization of connection weights (or synaptic strengths) for the pre-determined connection structure of the network. The main research hypothesis governing this thesis is that, without breaking a prevailing dichotomy between weights and connectivity of the network during learning phase, the efficient design of a task-specific neural network is hard to achieve because, as long as connectivity and weights are searched by separate means, a structural optimization of the neural network requires either repetitive re-training procedures or computationally expensive topological meta-search cycles. The main contribution of this thesis is designing and testing a novel learning mechanism which efficiently learns not only weight parameters but also connection structure from a given training data set, and positioning this learning mechanism within the surrogate modeling practice. In this work, a simple and straightforward extension to the conventional error Back-Propagation (BP) algorithm has been formulated to enable a simultaneous learning for both connectivity and weights of the Generalized Multilayer Perceptron (GMLP) in supervised learning tasks. A particular objective is to achieve a task-specific network having reasonable generalization performance with a minimal training time. The dichotomy between architectural design and weight optimization is reconciled by a mechanism establishing a new connection for a neuron pair which has potentially higher error-gradient than one of the existing connections. Interpreting an instance of the absence of connection as a zero-weight connection, the potential contribution to training error reduction of any present or absent connection can readily be evaluated using the BP algorithm. Instead of being broken, the connections that contribute less remain frozen with constant weight values optimized to that point but they are excluded from further weight optimization until reselected. In this way, a selective weight optimization is executed only for the dynamically maintained pool of high gradient connections. By searching the rapidly changing weights and concentrating optimization resources on them, the learning process is accelerated without either a significant increase in computational cost or a need for re-training. This results in a more task-adapted network connection structure. Combined with another important criterion for the division of a neuron which adds a new computational unit to a network, a highly fitted network can be grown out of the minimal random structure. This particular learning strategy can belong to a more broad class of the variable connectivity learning scheme and the devised algorithm has been named Optimal Brain Growth (OBG). The OBG algorithm has been tested on two canonical problems; a regression analysis using the Complicated Interaction Regression Function and a classification of the Two-Spiral Problem. A comparative study with conventional Multilayer Perceptrons (MLPs) consisting of single- and double-hidden layers shows that OBG is less sensitive to random initial conditions and generalizes better with only a minimal increase in computational time. This partially proves that a variable connectivity learning scheme has great potential to enhance computational efficiency and reduce efforts to select proper network architecture. To investigate the applicability of the OBG to more practical surrogate modeling tasks, the geometry-to-pressure mapping of a particular class of airfoils in the transonic flow regime has been sought using both the conventional MLP networks with pre-defined architecture and the OBG-developed networks started from the same initial MLP networks. Considering wide variety in airfoil geometry and diversity of flow conditions distributed over a range of flow Mach numbers and angles of attack, the new method shows a great potential to capture fundamentally nonlinear flow phenomena especially related to the occurrence of shock waves on airfoil surfaces in transonic flow regime. (Abstract shortened by UMI.).
NASA Technical Reports Server (NTRS)
Lindley, Craig A.
1995-01-01
This paper presents an architecture for satellites regarded as intercommunicating agents. The architecture is based upon a postmodern paradigm of artificial intelligence in which represented knowledge is regarded as text, inference procedures are regarded as social discourse and decision making conventions and the semantics of representations are grounded in the situated behaviour and activity of agents. A particular protocol is described for agent participation in distributed search and retrieval operations conducted as joint activities.
Evidence of common and separate eye and hand accumulators underlying flexible eye-hand coordination
Jana, Sumitash; Gopal, Atul
2016-01-01
Eye and hand movements are initiated by anatomically separate regions in the brain, and yet these movements can be flexibly coupled and decoupled, depending on the need. The computational architecture that enables this flexible coupling of independent effectors is not understood. Here, we studied the computational architecture that enables flexible eye-hand coordination using a drift diffusion framework, which predicts that the variability of the reaction time (RT) distribution scales with its mean. We show that a common stochastic accumulator to threshold, followed by a noisy effector-dependent delay, explains eye-hand RT distributions and their correlation in a visual search task that required decision-making, while an interactive eye and hand accumulator model did not. In contrast, in an eye-hand dual task, an interactive model better predicted the observed correlations and RT distributions than a common accumulator model. Notably, these two models could only be distinguished on the basis of the variability and not the means of the predicted RT distributions. Additionally, signatures of separate initiation signals were also observed in a small fraction of trials in the visual search task, implying that these distinct computational architectures were not a manifestation of the task design per se. Taken together, our results suggest two unique computational architectures for eye-hand coordination, with task context biasing the brain toward instantiating one of the two architectures. NEW & NOTEWORTHY Previous studies on eye-hand coordination have considered mainly the means of eye and hand reaction time (RT) distributions. Here, we leverage the approximately linear relationship between the mean and standard deviation of RT distributions, as predicted by the drift-diffusion model, to propose the existence of two distinct computational architectures underlying coordinated eye-hand movements. These architectures, for the first time, provide a computational basis for the flexible coupling between eye and hand movements. PMID:27784809
A System Architecture for Efficient Transmission of Massive DNA Sequencing Data.
Sağiroğlu, Mahmut Şamİl; Külekcİ, M Oğuzhan
2017-11-01
The DNA sequencing data analysis pipelines require significant computational resources. In that sense, cloud computing infrastructures appear as a natural choice for this processing. However, the first practical difficulty in reaching the cloud computing services is the transmission of the massive DNA sequencing data from where they are produced to where they will be processed. The daily practice here begins with compressing the data in FASTQ file format, and then sending these data via fast data transmission protocols. In this study, we address the weaknesses in that daily practice and present a new system architecture that incorporates the computational resources available on the client side while dynamically adapting itself to the available bandwidth. Our proposal considers the real-life scenarios, where the bandwidth of the connection between the parties may fluctuate, and also the computing power on the client side may be of any size ranging from moderate personal computers to powerful workstations. The proposed architecture aims at utilizing both the communication bandwidth and the computing resources for satisfying the ultimate goal of reaching the results as early as possible. We present a prototype implementation of the proposed architecture, and analyze several real-life cases, which provide useful insights for the sequencing centers, especially on deciding when to use a cloud service and in what conditions.
Computer graphics in architecture and engineering
NASA Technical Reports Server (NTRS)
Greenberg, D. P.
1975-01-01
The present status of the application of computer graphics to the building profession or architecture and its relationship to other scientific and technical areas were discussed. It was explained that, due to the fragmented nature of architecture and building activities (in contrast to the aerospace industry), a comprehensive, economic utilization of computer graphics in this area is not practical and its true potential cannot now be realized due to the present inability of architects and structural, mechanical, and site engineers to rely on a common data base. Future emphasis will therefore have to be placed on a vertical integration of the construction process and effective use of a three-dimensional data base, rather than on waiting for any technological breakthrough in interactive computing.
Innovative architectures for dense multi-microprocessor computers
NASA Technical Reports Server (NTRS)
Larson, Robert E.
1989-01-01
The purpose is to summarize a Phase 1 SBIR project performed for the NASA/Langley Computational Structural Mechanics Group. The project was performed from February to August 1987. The main objectives of the project were to: (1) expand upon previous research into the application of chordal ring architectures to the general problem of designing multi-microcomputer architectures, (2) attempt to identify a family of chordal rings such that each chordal ring can be simply expanded to produce the next member of the family, (3) perform a preliminary, high-level design of an expandable multi-microprocessor computer based upon chordal rings, (4) analyze the potential use of chordal ring based multi-microprocessors for sparse matrix problems and other applications arising in computational structural mechanics.
Fault tolerant architectures for integrated aircraft electronics systems
NASA Technical Reports Server (NTRS)
Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.
1983-01-01
Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.
ERIC Educational Resources Information Center
Lintao, Rachelle B.; Erfe, Jonathan P.
2012-01-01
This study purports to foster the understanding of profession-based academic writing in two different cultural conventions by examining the rhetorical moves employed by American and Philippine thesis introductions in Architecture using Swales' 2004 Revised CARS move-analytic model as framework. Twenty (20) Master's thesis introductions in…
A Survey of Some Approaches to Distributed Data Base & Distributed File System Architecture.
1980-01-01
BUS POD A DD A 12 12 A = A Cell D = D Cell Figure 7-1: MUFFIN logical architecture - 45 - MUFI January 1980 ".-.Bus Interface V Conventional Processor...and Applied Mathematics (14), * December, 1966. [Kimbleton 791 Kimbleton, Stephen; Wang, Pearl; and Fong, Elizabeth. XNDM: An Experimental Network
Architecture of a mixed-mode electrophysiological signal acquisition interface.
Shen, Ding-Lan; Chen, Jyun-Min
2012-01-01
This paper proposes mixed-mode architecture for the acquisition interface of electrophysiological signals. The architecture advances the analog-to-digital converter (ADC) from the second chopper signal in the conventional approach and performs the second chopper operation in the digital domain. The demanded low-pass filter (LPF) is realized with a digital type. The analog LPF in feedback path is substituted with a digital one accompanying with a digital-to-analog converter (DAC). The analog variation is decreased due to the digitization of these operations. The entire architecture is simulated with the ECG input in a behavior model of Simulink.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.
1997-04-01
Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less
Motion camera based on a custom vision sensor and an FPGA architecture
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel
1998-09-01
A digital camera for custom focal plane arrays was developed. The camera allows the test and development of analog or mixed-mode arrays for focal plane processing. The camera is used with a custom sensor for motion detection to implement a motion computation system. The custom focal plane sensor detects moving edges at the pixel level using analog VLSI techniques. The sensor communicates motion events using the event-address protocol associated to a temporal reference. In a second stage, a coprocessing architecture based on a field programmable gate array (FPGA) computes the time-of-travel between adjacent pixels. The FPGA allows rapid prototyping and flexible architecture development. Furthermore, the FPGA interfaces the sensor to a compact PC computer which is used for high level control and data communication to the local network. The camera could be used in applications such as self-guided vehicles, mobile robotics and smart surveillance systems. The programmability of the FPGA allows the exploration of further signal processing like spatial edge detection or image segmentation tasks. The article details the motion algorithm, the sensor architecture, the use of the event- address protocol for velocity vector computation and the FPGA architecture used in the motion camera system.
NASA Technical Reports Server (NTRS)
Welstead, Jason R.; Felder, James L.
2016-01-01
A single-aisle commercial transport concept with a turboelectric propulsion system architecture was developed assuming entry into service in 2035 and compared to a similar technology conventional configuration. The turboelectric architecture consisted of two underwing turbofans with generators extracting power from the fan shaft and sending it to a rear fuselage, axisymmetric, boundary layer ingesting fan. Results indicate that the turbo- electric concept has an economic mission fuel burn reduction of 7%, and a design mission fuel burn reduction of 12% compared to the conventional configuration. An exploration of the design space was performed to better understand how the turboelectric architecture changes the design space, and system sensitivities were run to determine the sensitivity of thrust specific fuel consumption at top of climb and propulsion system weight to the motor power, fan pressure ratio, and electrical transmission efficiency of the aft boundary layer ingesting fan.
NASA Astrophysics Data System (ADS)
Hartmann, Alfred; Redfield, Steve
1989-04-01
This paper discusses design of large-scale (1000x 1000) optical crossbar switching networks for use in parallel processing supercom-puters. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performances of alternative multiple access channel communications protocol-unslotted and slotted ALOHA and carrier sense multiple access (CSMA)-are compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
Environmental models are products of the computer architecture and software tools available at the time of development. Scientifically sound algorithms may persist in their original state even as system architectures and software development approaches evolve and progress. Dating...
Modelling parallel programs and multiprocessor architectures with AXE
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.
1991-01-01
AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
A high performance parallel computing architecture for robust image features
NASA Astrophysics Data System (ADS)
Zhou, Renyan; Liu, Leibo; Wei, Shaojun
2014-03-01
A design of parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian Blob Detector and Haar Response Calculation, which are the most computing-intensive stage of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network and dedicated hardware for SURF descriptors, this architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS general pixel processing @ 100 MHz clock and achieves up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable so it is easy to be migrated into VLSI implementation.
A Low-Complexity Euclidean Orthogonal LDPC Architecture for Low Power Applications.
Revathy, M; Saravanan, R
2015-01-01
Low-density parity-check (LDPC) codes have been implemented in latest digital video broadcasting, broadband wireless access (WiMax), and fourth generation of wireless standards. In this paper, we have proposed a high efficient low-density parity-check code (LDPC) decoder architecture for low power applications. This study also considers the design and analysis of check node and variable node units and Euclidean orthogonal generator in LDPC decoder architecture. The Euclidean orthogonal generator is used to reduce the error rate of the proposed LDPC architecture, which can be incorporated between check and variable node architecture. This proposed decoder design is synthesized on Xilinx 9.2i platform and simulated using Modelsim, which is targeted to 45 nm devices. Synthesis report proves that the proposed architecture greatly reduces the power consumption and hardware utilizations on comparing with different conventional architectures.
The RISC (Reduced Instruction Set Computer) Architecture and Computer Performance Evaluation.
1986-03-01
time where the main emphasis of the evaluation process is put on the software . The model is intended to provide a tool for computer architects to use...program, or 3) Was to be implemented in random logic more effec- tively than the equivalent sequence of software instructions. Both data and address...definition is the IEEE standard 729-1983 stating Computer Architecture as: " The process of defining a collection of hardware and software components and
First 3 years of operation of RIACS (Research Institute for Advanced Computer Science) (1983-1985)
NASA Technical Reports Server (NTRS)
Denning, P. J.
1986-01-01
The focus of the Research Institute for Advanced Computer Science (RIACS) is to explore matches between advanced computing architectures and the processes of scientific research. An architecture evaluation of the MIT static dataflow machine, specification of a graphical language for expressing distributed computations, and specification of an expert system for aiding in grid generation for two-dimensional flow problems was initiated. Research projects for 1984 and 1985 are summarized.
Song, Tianqi; Garg, Sudhanshu; Mokhtar, Reem; Bui, Hieu; Reif, John
2018-01-19
A main goal in DNA computing is to build DNA circuits to compute designated functions using a minimal number of DNA strands. Here, we propose a novel architecture to build compact DNA strand displacement circuits to compute a broad scope of functions in an analog fashion. A circuit by this architecture is composed of three autocatalytic amplifiers, and the amplifiers interact to perform computation. We show DNA circuits to compute functions sqrt(x), ln(x) and exp(x) for x in tunable ranges with simulation results. A key innovation in our architecture, inspired by Napier's use of logarithm transforms to compute square roots on a slide rule, is to make use of autocatalytic amplifiers to do logarithmic and exponential transforms in concentration and time. In particular, we convert from the input that is encoded by the initial concentration of the input DNA strand, to time, and then back again to the output encoded by the concentration of the output DNA strand at equilibrium. This combined use of strand-concentration and time encoding of computational values may have impact on other forms of molecular computation.
Performances of multiprocessor multidisk architectures for continuous media storage
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.
1996-03-01
Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; ...
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part ofmore » the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.« less
High-power, fixed, and tunable wavelength, grating-free cascaded Raman fiber lasers
NASA Astrophysics Data System (ADS)
Balaswamy, V.; Arun, S.; Aparanji, Santosh; Choudhury, Vishal; Supradeepa, V. R.
2018-04-01
Cascaded Raman lasers enable high powers at various wavelength bands inaccessible with conventional rare-earth doped lasers. The input and output wavelengths of conventional implementations are fixed by the constituent fiber gratings necessary for cascaded Raman conversion. We demonstrate here, a simple architecture for high power, fixed and wavelength tunable, grating-free, cascaded Raman conversion between different wavelength bands. The architecture is based on the recently proposed distributed feedback Raman lasers. Here, we implement a module which converts the Ytterbium band to the eye-safe 1.5micron region. We demonstrate pump-limited output powers of over 30W in fixed and continuously wavelength tunable configurations.
Architectural Implications of Cloud Computing
2011-10-24
Public Cloud Infrastructure-as-a- Service (IaaS) Software -as-a- Service ( SaaS ) Cloud Computing Types Platform-as-a- Service (PaaS) Based on Type of...Twitter #SEIVirtualForum © 2011 Carnegie Mellon University Software -as-a- Service ( SaaS ) Model of software deployment in which a third-party...and System Solutions (RTSS) Program. Her current interests and projects are in service -oriented architecture (SOA), cloud computing, and context
Generic Software for Emulating Multiprocessor Architectures.
1985-05-01
RD-A157 662 GENERIC SOFTWARE FOR EMULATING MULTIPROCESSOR 1/2 AlRCHITECTURES(J) MASSACHUSETTS INST OF TECH CAMBRIDGE U LRS LAB FOR COMPUTER SCIENCE R...AREA & WORK UNIT NUMBERS MIT Laboratory for Computer Science 545 Technology Square Cambridge, MA 02139 ____________ I I. CONTROLLING OFFICE NAME AND...aide If neceeasy end Identify by block number) Computer architecture, emulation, simulation, dataf low 20. ABSTRACT (Continue an reverse slde It
Sigint Application for Polymorphous Computing Architecture (PCA): Wideband DF
2006-08-01
Polymorphous Computing Architecture (PCA) program as stated by Robert Graybill is to Develop the computing foundation for agile systems by establishing...ubiquitous MUSIC algorithm rely upon an underlying narrowband signal model [8]. In this case, narrowband means that the signal bandwidth is less than...a wideband DF algorithm is needed to compensate for this model inadequacy. Among the various wideband DF techniques available, the coherent signal
Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design
NASA Technical Reports Server (NTRS)
Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)
1998-01-01
Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. Brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. talk discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
Hardware architecture design of image restoration based on time-frequency domain computation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Jing; Jiao, Zipeng
2013-10-01
The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Takeda, Shuntaro; Furusawa, Akira
2017-09-22
We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
NASA Astrophysics Data System (ADS)
Takeda, Shuntaro; Furusawa, Akira
2017-09-01
We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
FUX-Sim: Implementation of a fast universal simulation/reconstruction framework for X-ray systems.
Abella, Monica; Serrano, Estefania; Garcia-Blas, Javier; García, Ines; de Molina, Claudia; Carretero, Jesus; Desco, Manuel
2017-01-01
The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity for bringing 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that involve a high computational burden. The development of new flexible X-ray systems could benefit from computer simulations, which may enable performance to be checked before expensive real systems are implemented. The development of simulation/reconstruction algorithms in this context poses three main difficulties. First, the algorithms deal with large data volumes and are computationally expensive, thus leading to the need for hardware and software optimizations. Second, these optimizations are limited by the high flexibility required to explore new scanning geometries, including fully configurable positioning of source and detector elements. And third, the evolution of the various hardware setups increases the effort required for maintaining and adapting the implementations to current and future programming models. Previous works lack support for completely flexible geometries and/or compatibility with multiple programming models and platforms. In this paper, we present FUX-Sim, a novel X-ray simulation/reconstruction framework that was designed to be flexible and fast. Optimized implementation for different families of GPUs (CUDA and OpenCL) and multi-core CPUs was achieved thanks to a modularized approach based on a layered architecture and parallel implementation of the algorithms for both architectures. A detailed performance evaluation demonstrates that for different system configurations and hardware platforms, FUX-Sim maximizes performance with the CUDA programming model (5 times faster than other state-of-the-art implementations). Furthermore, the CPU and OpenCL programming models allow FUX-Sim to be executed over a wide range of hardware platforms.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.
Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros
2014-06-25
The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions
2014-01-01
Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems. PMID:24964802
Application of computational physics within Northrop
NASA Technical Reports Server (NTRS)
George, M. W.; Ling, R. T.; Mangus, J. F.; Thompkins, W. T.
1987-01-01
An overview of Northrop programs in computational physics is presented. These programs depend on access to today's supercomputers, such as the Numerical Aerodynamical Simulator (NAS), and future growth on the continuing evolution of computational engines. Descriptions here are concentrated on the following areas: computational fluid dynamics (CFD), computational electromagnetics (CEM), computer architectures, and expert systems. Current efforts and future directions in these areas are presented. The impact of advances in the CFD area is described, and parallels are drawn to analagous developments in CEM. The relationship between advances in these areas and the development of advances (parallel) architectures and expert systems is also presented.
All-memristive neuromorphic computing with level-tuned neurons
NASA Astrophysics Data System (ADS)
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-01
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amount of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogenous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represent a significant step towards the development of ultrahigh-density neuromorphic co-processors.
All-memristive neuromorphic computing with level-tuned neurons.
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-02
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amount of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogenous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represent a significant step towards the development of ultrahigh-density neuromorphic co-processors.
Specialized computer architectures for computational aerodynamics
NASA Technical Reports Server (NTRS)
Stevenson, D. K.
1978-01-01
In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relative high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.
NASA Astrophysics Data System (ADS)
Pereira, Carina; Dighe, Manjiri; Alessio, Adam M.
2018-02-01
Various Computer Aided Diagnosis (CAD) systems have been developed that characterize thyroid nodules using the features extracted from the B-mode ultrasound images and Shear Wave Elastography images (SWE). These features, however, are not perfect predictors of malignancy. In other domains, deep learning techniques such as Convolutional Neural Networks (CNNs) have outperformed conventional feature extraction based machine learning approaches. In general, fully trained CNNs require substantial volumes of data, motivating several efforts to use transfer learning with pre-trained CNNs. In this context, we sought to compare the performance of conventional feature extraction, fully trained CNNs, and transfer learning based, pre-trained CNNs for the detection of thyroid malignancy from ultrasound images. We compared these approaches applied to a data set of 964 B-mode and SWE images from 165 patients. The data were divided into 80% training/validation and 20% testing data. The highest accuracies achieved on the testing data for the conventional feature extraction, fully trained CNN, and pre-trained CNN were 0.80, 0.75, and 0.83 respectively. In this application, classification using a pre-trained network yielded the best performance, potentially due to the relatively limited sample size and sub-optimal architecture for the fully trained CNN.
Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services
NASA Astrophysics Data System (ADS)
Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.
Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves to near real-time assessment and analysis demanding the need for higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that will track satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. Migrating SpaceNav tools and algorithms into the Google Cloud Platform will be discussed and the trials and tribulations involved. Information will be shared on how and why certain cloud products were used as well as integration techniques that were implemented. Key items to be presented are: 1.Scientific algorithms and SpaceNav tools integrated into a scalable architecture a) Maneuver Planning b) Parallel Processing c) Monte Carlo Simulations d) Optimization Algorithms e) SW Application Development/Integration into the Google Cloud Platform 2. Compute Engine Processing a) Application Engine Automated Processing b) Performance testing and Performance Scalability c) Cloud MySQL databases and Database Scalability d) Cloud Data Storage e) Redundancy and Availability
Efficient architecture for spike sorting in reconfigurable hardware.
Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying
2013-11-01
This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of different weight vectors share the same circuit for lowering the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented by field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining high classification correct rate and high speed computation.
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
Analog Computation by DNA Strand Displacement Circuits.
Song, Tianqi; Garg, Sudhanshu; Mokhtar, Reem; Bui, Hieu; Reif, John
2016-08-19
DNA circuits have been widely used to develop biological computing devices because of their high programmability and versatility. Here, we propose an architecture for the systematic construction of DNA circuits for analog computation based on DNA strand displacement. The elementary gates in our architecture include addition, subtraction, and multiplication gates. The input and output of these gates are analog, which means that they are directly represented by the concentrations of the input and output DNA strands, respectively, without requiring a threshold for converting to Boolean signals. We provide detailed domain designs and kinetic simulations of the gates to demonstrate their expected performance. On the basis of these gates, we describe how DNA circuits to compute polynomial functions of inputs can be built. Using Taylor Series and Newton Iteration methods, functions beyond the scope of polynomials can also be computed by DNA circuits built upon our architecture.
Advanced Architectures for Astrophysical Supercomputing
NASA Astrophysics Data System (ADS)
Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.
2010-12-01
Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lala, J.H.; Nagle, G.A.; Harper, R.E.
1993-05-01
The Maglev control computer system should be designed to verifiably possess high reliability and safety as well as high availability to make Maglev a dependable and attractive transportation alternative to the public. A Maglev control computer system has been designed using a design-for-validation methodology developed earlier under NASA and SDIO sponsorship for real-time aerospace applications. The present study starts by defining the maglev mission scenario and ends with the definition of a maglev control computer architecture. Key intermediate steps included definitions of functional and dependability requirements, synthesis of two candidate architectures, development of qualitative and quantitative evaluation criteria, and analyticalmore » modeling of the dependability characteristics of the two architectures. Finally, the applicability of the design-for-validation methodology was also illustrated by applying it to the German Transrapid TR07 maglev control system.« less
Computer Security Primer: Systems Architecture, Special Ontology and Cloud Virtual Machines
ERIC Educational Resources Information Center
Waguespack, Leslie J.
2014-01-01
With the increasing proliferation of multitasking and Internet-connected devices, security has reemerged as a fundamental design concern in information systems. The shift of IS curricula toward a largely organizational perspective of security leaves little room for focus on its foundation in systems architecture, the computational underpinnings of…
Narasimhan, Seetharam; Chiel, Hillel J; Bhunia, Swarup
2009-01-01
For implantable neural interface applications, it is important to compress data and analyze spike patterns across multiple channels in real time. Such a computational task for online neural data processing requires an innovative circuit-architecture level design approach for low-power, robust and area-efficient hardware implementation. Conventional microprocessor or Digital Signal Processing (DSP) chips would dissipate too much power and are too large in size for an implantable system. In this paper, we propose a novel hardware design approach, referred to as "Preferential Design" that exploits the nature of the neural signal processing algorithm to achieve a low-voltage, robust and area-efficient implementation using nanoscale process technology. The basic idea is to isolate the critical components with respect to system performance and design them more conservatively compared to the noncritical ones. This allows aggressive voltage scaling for low power operation while ensuring robustness and area efficiency. We have applied the proposed approach to a neural signal processing algorithm using the Discrete Wavelet Transform (DWT) and observed significant improvement in power and robustness over conventional design.
Drewes, A M; Nielsen, K D; Taagholt, S J; Svendsen, L; Bjerregård, K; Nielsson, L; Kristensen, L
1996-05-01
A new system for polysomnographic recording at home is presented. It consists of a 12 to 24-channel amplifier system with direct digitization of the polygraph signals using a portable computer. Sampling frequency, amplification and filter settings can be defined by the user, and the signals are evaluated at bedside. Technical testing proved a high signal/noise ratio, linear amplification and a good signal quality. Clinical testing of the first 100 recordings showed that they were acceptable for conventional sleep scoring in 98 cases. A comparison of two consecutive recordings was done in 9 healthy subjects and 11 patients with rheumatic disorders. Using conventional sleep staging, only a slight "first night effect" (FNE) was demonstrated in the sleep architecture. Power spectral analysis using autoregressive modeling demonstrated only a difference of power between the 2 nights in the beta (14.5-25 Hz) band. In conclusion, the usability and technical advantages make the system very suitable for ambulatory recordings and only a minimal FNE should be considered when results are evaluated.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
ATCA for Machines-- Advanced Telecommunications Computing Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Larsen, R.S.; /SLAC
2008-04-22
The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.
NASA Astrophysics Data System (ADS)
Kelley, Troy D.; McGhee, S.
2013-05-01
This paper describes the ongoing development of a robotic control architecture that inspired by computational cognitive architectures from the discipline of cognitive psychology. The Symbolic and Sub-Symbolic Robotics Intelligence Control System (SS-RICS) combines symbolic and sub-symbolic representations of knowledge into a unified control architecture. The new architecture leverages previous work in cognitive architectures, specifically the development of the Adaptive Character of Thought-Rational (ACT-R) and Soar. This paper details current work on learning from episodes or events. The use of episodic memory as a learning mechanism has, until recently, been largely ignored by computational cognitive architectures. This paper details work on metric level episodic memory streams and methods for translating episodes into abstract schemas. The presentation will include research on learning through novelty and self generated feedback mechanisms for autonomous systems.
Sensing and perception: Connectionist approaches to subcognitive computing
NASA Technical Reports Server (NTRS)
Skrrypek, J.
1987-01-01
New approaches to machine sensing and perception are presented. The motivation for crossdisciplinary studies of perception in terms of AI and neurosciences is suggested. The question of computing architecture granularity as related to global/local computation underlying perceptual function is considered and examples of two environments are given. Finally, the examples of using one of the environments, UCLA PUNNS, to study neural architectures for visual function are presented.
Computer architecture evaluation for structural dynamics computations: Project summary
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1989-01-01
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
2009-03-01
SENSOR NETWORKS THESIS Presented to the Faculty Department of Electrical and Computer Engineering Graduate School of Engineering and...hierarchical, and Secure Lock within a wireless sensor network (WSN) under the Hubenko architecture. Using a Matlab computer simulation, the impact of the...rekeying protocol should be applied given particular network parameters, such as WSN size. 10 1.3 Experimental Approach A computer simulation in
Kazakis, Georgios; Kanellopoulos, Ioannis; Sotiropoulos, Stefanos; Lagaros, Nikos D
2017-10-01
Construction industry has a major impact on the environment that we spend most of our life. Therefore, it is important that the outcome of architectural intuition performs well and complies with the design requirements. Architects usually describe as "optimal design" their choice among a rather limited set of design alternatives, dictated by their experience and intuition. However, modern design of structures requires accounting for a great number of criteria derived from multiple disciplines, often of conflicting nature. Such criteria derived from structural engineering, eco-design, bioclimatic and acoustic performance. The resulting vast number of alternatives enhances the need for computer-aided architecture in order to increase the possibility of arriving at a more preferable solution. Therefore, the incorporation of smart, automatic tools in the design process, able to further guide designer's intuition becomes even more indispensable. The principal aim of this study is to present possibilities to integrate automatic computational techniques related to topology optimization in the phase of intuition of civil structures as part of computer aided architectural design. In this direction, different aspects of a new computer aided architectural era related to the interpretation of the optimized designs, difficulties resulted from the increased computational effort and 3D printing capabilities are covered here in.
NASA Technical Reports Server (NTRS)
Smith, Paul H.
1988-01-01
The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.
A Low-Complexity Euclidean Orthogonal LDPC Architecture for Low Power Applications
Revathy, M.; Saravanan, R.
2015-01-01
Low-density parity-check (LDPC) codes have been implemented in latest digital video broadcasting, broadband wireless access (WiMax), and fourth generation of wireless standards. In this paper, we have proposed a high efficient low-density parity-check code (LDPC) decoder architecture for low power applications. This study also considers the design and analysis of check node and variable node units and Euclidean orthogonal generator in LDPC decoder architecture. The Euclidean orthogonal generator is used to reduce the error rate of the proposed LDPC architecture, which can be incorporated between check and variable node architecture. This proposed decoder design is synthesized on Xilinx 9.2i platform and simulated using Modelsim, which is targeted to 45 nm devices. Synthesis report proves that the proposed architecture greatly reduces the power consumption and hardware utilizations on comparing with different conventional architectures. PMID:26065017
Architecture for hospital information integration
NASA Astrophysics Data System (ADS)
Chimiak, William J.; Janariz, Daniel L.; Martinez, Ralph
1999-07-01
The ongoing integration of hospital information systems (HIS) continues. Data storage systems, data networks and computers improve, data bases grow and health-care applications increase. Some computer operating systems continue to evolve and some fade. Health care delivery now depends on this computer-assisted environment. The result is the critical harmonization of the various hospital information systems becomes increasingly difficult. The purpose of this paper is to present an architecture for HIS integration that is computer-language-neutral and computer- hardware-neutral for the informatics applications. The proposed architecture builds upon the work done at the University of Arizona on middleware, the work of the National Electrical Manufacturers Association, and the American College of Radiology. It is a fresh approach to allowing applications engineers to access medical data easily and thus concentrates on the application techniques in which they are expert without struggling with medical information syntaxes. The HIS can be modeled using a hierarchy of information sub-systems thus facilitating its understanding. The architecture includes the resulting information model along with a strict but intuitive application programming interface, managed by CORBA. The CORBA requirement facilitates interoperability. It should also reduce software and hardware development times.
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
NASA Technical Reports Server (NTRS)
Hale, Mark A.; Craig, James I.; Mistree, Farrokh; Schrage, Daniel P.
1995-01-01
Computing architectures are being assembled that extend concurrent engineering practices by providing more efficient execution and collaboration on distributed, heterogeneous computing networks. Built on the successes of initial architectures, requirements for a next-generation design computing infrastructure can be developed. These requirements concentrate on those needed by a designer in decision-making processes from product conception to recycling and can be categorized in two areas: design process and design information management. A designer both designs and executes design processes throughout design time to achieve better product and process capabilities while expanding fewer resources. In order to accomplish this, information, or more appropriately design knowledge, needs to be adequately managed during product and process decomposition as well as recomposition. A foundation has been laid that captures these requirements in a design architecture called DREAMS (Developing Robust Engineering Analysis Models and Specifications). In addition, a computing infrastructure, called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment), is being developed that satisfies design requirements defined in DREAMS and incorporates enabling computational technologies.
Gigaflop architecture, a hardware perspective
NASA Technical Reports Server (NTRS)
Feierbach, G. F.
1978-01-01
Any super computer built in the early 1980s will use components that are available by fall 1978. The architecture of such a system cannot depart radically from current super computers if the software experience painfully acquired from these computers in the 70's is to apply. Given the above constraints, 10 billion floating point operations per second (BFLOPS) are attainable and a problem memory of 512 million (64 bit) words could be supported by the technology of the time. In contrast to this, industry is likely to respond with commercially available machines with a performance of less than 150 MFLOPS. This is due to self-imposed constraints on the manufacturers to provide upward compatible architectures (same instruction set) and systems which can be sold in significant volumes. Since this computing speed is inadequate to meet the demands of computational fluid dynamics, a special processor is required. Issues which are felt to be significant in the pursuit of maximum compute capability in this special processor are discussed.
Using Multimedia for Teaching Analysis in History of Modern Architecture.
ERIC Educational Resources Information Center
Perryman, Garry
This paper presents a case for the development and support of a computer-based interactive multimedia program for teaching analysis in community college architecture design programs. Analysis in architecture design is an extremely important strategy for the teaching of higher-order thinking skills, which senior schools of architecture look for in…
Progress in a novel architecture for high performance processing
NASA Astrophysics Data System (ADS)
Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin
2018-04-01
The high performance processing (HPP) is an innovative architecture which targets on high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores which is the first generation of HPP cores has been taped out successfully under Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvement in architecture. The chip with 32 HPP cores is being developed under TSMC 16 nm field effect transistor (FFC) technology process and is planed to use commercially. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
Multiple Embedded Processors for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
A static data flow simulation study at Ames Research Center
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Howard, Lauri S.
1987-01-01
Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory, and which communicates with a common global data memory. A new graph theoretic model called ATAMM which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
Muller, George; Perkins, Casey J.; Lancaster, Mary J.; MacDonald, Douglas G.; Clements, Samuel L.; Hutton, William J.; Patrick, Scott W.; Key, Bradley Robert
2015-07-28
Computer-implemented security evaluation methods, security evaluation systems, and articles of manufacture are described. According to one aspect, a computer-implemented security evaluation method includes accessing information regarding a physical architecture and a cyber architecture of a facility, building a model of the facility comprising a plurality of physical areas of the physical architecture, a plurality of cyber areas of the cyber architecture, and a plurality of pathways between the physical areas and the cyber areas, identifying a target within the facility, executing the model a plurality of times to simulate a plurality of attacks against the target by an adversary traversing at least one of the areas in the physical domain and at least one of the areas in the cyber domain, and using results of the executing, providing information regarding a security risk of the facility with respect to the target.
Enterprise application architecture development based on DoDAF and TOGAF
NASA Astrophysics Data System (ADS)
Tao, Zhi-Gang; Luo, Yun-Feng; Chen, Chang-Xin; Wang, Ming-Zhe; Ni, Feng
2017-05-01
For the purpose of supporting the design and analysis of enterprise application architecture, here, we report a tailored enterprise application architecture description framework and its corresponding design method. The presented framework can effectively support service-oriented architecting and cloud computing by creating the metadata model based on architecture content framework (ACF), DoDAF metamodel (DM2) and Cloud Computing Modelling Notation (CCMN). The framework also makes an effort to extend and improve the mapping between The Open Group Architecture Framework (TOGAF) application architectural inputs/outputs, deliverables and Department of Defence Architecture Framework (DoDAF)-described models. The roadmap of 52 DoDAF-described models is constructed by creating the metamodels of these described models and analysing the constraint relationship among metamodels. By combining the tailored framework and the roadmap, this article proposes a service-oriented enterprise application architecture development process. Finally, a case study is presented to illustrate the results of implementing the tailored framework in the Southern Base Management Support and Information Platform construction project using the development process proposed by the paper.
Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines
2014-11-01
architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core ( MIC ) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market . Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs , and CPUs, adds complexity and technical challenges
Intelligent holographic databases
NASA Astrophysics Data System (ADS)
Barbastathis, George
Memory is a key component of intelligence. In the human brain, physical structure and functionality jointly provide diverse memory modalities at multiple time scales. How could we engineer artificial memories with similar faculties? In this thesis, we attack both hardware and algorithmic aspects of this problem. A good part is devoted to holographic memory architectures, because they meet high capacity and parallelism requirements. We develop and fully characterize shift multiplexing, a novel storage method that simplifies disk head design for holographic disks. We develop and optimize the design of compact refreshable holographic random access memories, showing several ways that 1 Tbit can be stored holographically in volume less than 1 m3, with surface density more than 20 times higher than conventional silicon DRAM integrated circuits. To address the issue of photorefractive volatility, we further develop the two-lambda (dual wavelength) method for shift multiplexing, and combine electrical fixing with angle multiplexing to demonstrate 1,000 multiplexed fixed holograms. Finally, we propose a noise model and an information theoretic metric to optimize the imaging system of a holographic memory, in terms of storage density and error rate. Motivated by the problem of interfacing sensors and memories to a complex system with limited computational resources, we construct a computer game of Desert Survival, built as a high-dimensional non-stationary virtual environment in a competitive setting. The efficacy of episodic learning, implemented as a reinforced Nearest Neighbor scheme, and the probability of winning against a control opponent improve significantly by concentrating the algorithmic effort to the virtual desert neighborhood that emerges as most significant at any time. The generalized computational model combines the autonomous neural network and von Neumann paradigms through a compact, dynamic central representation, which contains the most salient features of the sensory inputs, fused with relevant recollections, reminiscent of the hypothesized cognitive function of awareness. The Declarative Memory is searched both by content and address, suggesting a holographic implementation. The proposed computer architecture may lead to a novel paradigm that solves 'hard' cognitive problems at low cost.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
NASA Astrophysics Data System (ADS)
Mehta, Neville; Kompalli, Suryaprakash; Chaudhary, Vipin
Teleradiology is the electronic transmission of radiological patient images, such as x-rays, CT, or MR across multiple locations. The goal could be interpretation, consultation, or medical records keeping. Information technology solutions have enabled electronic records and their associated benefits are evident in health care today. However, salient aspects of collaborative interfaces, and computer assisted diagnostic (CAD) tools are yet to be integrated into workflow designs. The Computer Assisted Diagnostics and Interventions (CADI) group at the University at Buffalo has developed an architecture that facilitates web-enabled use of CAD tools, along with the novel concept of synchronized collaboration. The architecture can support multiple teleradiology applications and case studies are presented here.
Design of a modular digital computer system, CDRL no. D001, final design plan
NASA Technical Reports Server (NTRS)
Easton, R. A.
1975-01-01
The engineering breadboard implementation for the CDRL no. D001 modular digital computer system developed during design of the logic system was documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed 32 bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Examining the architecture of cellular computing through a comparative study with a computer
Wang, Degeng; Gribskov, Michael
2005-01-01
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179
Examining the architecture of cellular computing through a comparative study with a computer.
Wang, Degeng; Gribskov, Michael
2005-06-22
The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.
Communication and re-use of chemical information in bioscience
Murray-Rust, Peter; Mitchell, John BO; Rzepa, Henry S
2005-01-01
The current methods of publishing chemical information in bioscience articles are analysed. Using 3 papers as use-cases, it is shown that conventional methods using human procedures, including cut-and-paste are time-consuming and introduce errors. The meaning of chemical terms and the identity of compounds is often ambiguous. valuable experimental data such as spectra and computational results are almost always omitted. We describe an Open XML architecture at proof-of-concept which addresses these concerns. Compounds are identified through explicit connection tables or links to persistent Open resources such as PubChem. It is argued that if publishers adopt these tools and protocols, then the quality and quantity of chemical information available to bioscientists will increase and the authors, publishers and readers will find the process cost-effective. PMID:16026614
Robotic inspection of fiber reinforced composites using phased array UT
NASA Astrophysics Data System (ADS)
Stetson, Jeffrey T.; De Odorico, Walter
2014-02-01
Ultrasound is the current NDE method of choice to inspect large fiber reinforced airframe structures. Over the last 15 years Cartesian based scanning machines using conventional ultrasound techniques have been employed by all airframe OEMs and their top tier suppliers to perform these inspections. Technical advances in both computing power and commercially available, multi-axis robots now facilitate a new generation of scanning machines. These machines use multiple end effector tools taking full advantage of phased array ultrasound technologies yielding substantial improvements in inspection quality and productivity. This paper outlines the general architecture for these new robotic scanning systems as well as details the variety of ultrasonic techniques available for use with them including advances such as wide area phased array scanning and sound field adaptation for non-flat, non-parallel surfaces.
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly tar- get compute intensive kernels typical of digital signal pro- cessing applications. We are developing techniques and ar- chitectural templates to enable HLS of data analytics appli- cations. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dy- namic task parallelism. We discuss an architectural tem- plate based around a distributed controller to efficiently ex- ploit thread level parallelism. We present a memory in- terface that supports parallel memory subsystems and en- ables implementing atomic memory operations. We intro- duce a dynamic task scheduling approach to efficiently ex- ecute heavilymore » unbalanced workload. The templates are val- idated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well know SPARQL benchmark.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Subekti, M.; Center for Development of Reactor Safety Technology, National Nuclear Energy Agency of Indonesia, Puspiptek Complex BO.80, Serpong-Tangerang, 15340; Ohno, T.
2006-07-01
The neuro-expert has been utilized in previous monitoring-system research of Pressure Water Reactor (PWR). The research improved the monitoring system by utilizing neuro-expert, conventional noise analysis and modified neural networks for capability extension. The parallel method applications required distributed architecture of computer-network for performing real-time tasks. The research aimed to improve the previous monitoring system, which could detect sensor degradation, and to perform the monitoring demonstration in High Temperature Engineering Tested Reactor (HTTR). The developing monitoring system based on some methods that have been tested using the data from online PWR simulator, as well as RSG-GAS (30 MW research reactormore » in Indonesia), will be applied in HTTR for more complex monitoring. (authors)« less
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.
Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi
2018-01-01
Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Arrays (FPGAs) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using a special-purpose Processing Elements (PEs) for computing SNNs, and analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor SNN's activity. Our contribution intends to provide a tool that allows to prototype SNNs faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing
Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant
2016-01-01
Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital(reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90 and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
Yoo, Dongjin
2012-07-01
Advanced additive manufacture (AM) techniques are now being developed to fabricate scaffolds with controlled internal pore architectures in the field of tissue engineering. In general, these techniques use a hybrid method which combines computer-aided design (CAD) with computer-aided manufacturing (CAM) tools to design and fabricate complicated three-dimensional (3D) scaffold models. The mathematical descriptions of micro-architectures along with the macro-structures of the 3D scaffold models are limited by current CAD technologies as well as by the difficulty of transferring the designed digital models to standard formats for fabrication. To overcome these difficulties, we have developed an efficient internal pore architecture design system based on triply periodic minimal surface (TPMS) unit cell libraries and associated computational methods to assemble TPMS unit cells into an entire scaffold model. In addition, we have developed a process planning technique based on TPMS internal architecture pattern of unit cells to generate tool paths for freeform fabrication of tissue engineering porous scaffolds. Copyright © 2012 IPEM. Published by Elsevier Ltd. All rights reserved.
Embedded Data Processor and Portable Computer Technology testbeds
NASA Technical Reports Server (NTRS)
Alena, Richard; Liu, Yuan-Kwei; Goforth, Andre; Fernquist, Alan R.
1993-01-01
Attention is given to current activities in the Embedded Data Processor and Portable Computer Technology testbed configurations that are part of the Advanced Data Systems Architectures Testbed at the Information Sciences Division at NASA Ames Research Center. The Embedded Data Processor Testbed evaluates advanced microprocessors for potential use in mission and payload applications within the Space Station Freedom Program. The Portable Computer Technology (PCT) Testbed integrates and demonstrates advanced portable computing devices and data system architectures. The PCT Testbed uses both commercial and custom-developed devices to demonstrate the feasibility of functional expansion and networking for portable computers in flight missions.
A supportive architecture for CFD-based design optimisation
NASA Astrophysics Data System (ADS)
Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong
2014-03-01
Multi-disciplinary design optimisation (MDO) is one of critical methodologies to the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamic (CFD) technique has caused a rise of its applications in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating with CFD analysis in an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which is usually with high dimensions, and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review on existing works has found that very few researchers have studied on the assistive tools to facilitate CFD-based design optimisation. In the paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing technique and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, the case studies on aerodynamic shape design of a hypersonic cruising vehicle are provided, and the result has shown that the proposed architecture and developed algorithms have performed successfully and efficiently in dealing with the design optimisation with over 200 design variables.
Prospective Architectures for Onboard vs Cloud-Based Decision Making for Unmanned Aerial Systems
NASA Technical Reports Server (NTRS)
Sankararaman, Shankar; Teubert, Christopher
2017-01-01
This paper investigates propsective architectures for decision-making in unmanned aerial systems. When these unmanned vehicles operate in urban environments, there are several sources of uncertainty that affect their behavior, and decision-making algorithms need to be robust to account for these different sources of uncertainty. It is important to account for several risk-factors that affect the flight of these unmanned systems, and facilitate decision-making by taking into consideration these various risk-factors. In addition, there are several technical challenges related to autonomous flight of unmanned aerial systems; these challenges include sensing, obstacle detection, path planning and navigation, trajectory generation and selection, etc. Many of these activities require significant computational power and in many situations, all of these activities need to be performed in real-time. In order to efficiently integrate these activities, it is important to develop a systematic architecture that can facilitate real-time decision-making. Four prospective architectures are discussed in this paper; on one end of the spectrum, the first architecture considers all activities/computations being performed onboard the vehicle whereas on the other end of the spectrum, the fourth and final architecture considers all activities/computations being performed in the cloud, using a new service known as Prognostics as a Service that is being developed at NASA Ames Research Center. The four different architectures are compared, their advantages and disadvantages are explained and conclusions are presented.
NASA Astrophysics Data System (ADS)
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Optimizing transformations of stencil operations for parallel cache-based architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bassetti, F.; Davis, K.
This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicity in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation andmore » applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has bee n performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.« less
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
NASA Astrophysics Data System (ADS)
Zaveri, Mazad Shaheriar
The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered as an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired from the neuro/cognitive sciences. Consequently in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves: analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology - CMOL, and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm2 (TPM) obtained for CMOL based architectures is 32 to 40 times better than the TPM for a CMOS based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC implementation. We later use this methodology to investigate the hardware implementations of cortex-scale spiking neural system, which is an approximate neural equivalent of BICM based cortex-scale system. The results of this investigation also suggest that CMOL is a promising candidate to implement such large-scale neuromorphic systems. In general, the assessment of such hypothetical baseline hardware architectures provides the prospects for building large-scale (mammalian cortex-scale) implementations of neuromorphic/Bayesian/intelligent systems using state-of-the-art and beyond state-of-the-art silicon structures.
NASA Astrophysics Data System (ADS)
Lhamon, Michael Earl
A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible and they have the advantage of time averaging the entire filter bank at real time rates. Time-averaged optical filtering is computational comparable to billions of digital operations-per-second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
p88110: A Graphical Simulator for Computer Architecture and Organization Courses
ERIC Educational Resources Information Center
Garcia, M. I.; Rodriguez, S.; Perez, A.; Garcia, A.
2009-01-01
Studying fundamental Computer Architecture and Organization topics requires a significant amount of practical work if students are to acquire a good grasp of the theoretical concepts presented in classroom lectures or textbooks. The use of simulators is commonly adopted in order to reach this objective. However, as most of the available…
Component architecture in drug discovery informatics.
Smith, Peter M
2002-05-01
This paper reviews the characteristics of a new model of computing that has been spurred on by the Internet, known as Netcentric computing. Developments in this model led to distributed component architectures, which, although not new ideas, are now realizable with modern tools such as Enterprise Java. The application of this approach to scientific computing, particularly in pharmaceutical discovery research, is discussed and highlighted by a particular case involving the management of biological assay data.
FPGA-based architecture for motion recovering in real-time
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar
2002-03-01
A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher level goals in different applications. Motion estimation algorithms pose a significant computational load for the sequential processors limiting its use in practical applications. In this work we propose a hardware architecture for motion estimation in real time based on FPGA technology. The technique used for motion estimation is Optical Flow due to its accuracy, and the density of velocity estimation, however other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results will be presented and the real-time performance will be discussed and analyzed. The architecture is prototyped in an FPGA board with a Virtex device interfaced to a digital imager.
High-power, fixed, and tunable wavelength, grating-free cascaded Raman fiber lasers.
Balaswamy, V; Arun, S; Aparanji, Santosh; Choudhury, Vishal; Supradeepa, V R
2018-04-01
Cascaded Raman lasers enable high powers at various wavelength bands inaccessible with conventional rare-earth-doped lasers. The input and output wavelengths of conventional implementations are fixed by the constituent fiber gratings necessary for cascaded Raman conversion. We demonstrate here a simple architecture for high-power, fixed, and wavelength tunable, grating-free, cascaded Raman conversion between different wavelength bands. The architecture is based on the recently proposed distributed feedback Raman lasers. Here, we implement a module which converts the ytterbium band to the eye-safe 1.5 μm region. We demonstrate pump-limited output powers of over 30 W in fixed and continuously wavelength tunable configurations.
Parallel, stochastic measurement of molecular surface area.
Juba, Derek; Varshney, Amitabh
2008-08-01
Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2014-01-01
This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In this example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
Fault tolerant architectures for integrated aircraft electronics systems, task 2
NASA Technical Reports Server (NTRS)
Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.
1984-01-01
The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.
NASA Astrophysics Data System (ADS)
Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong
As Grid computing continues to gain popularity in the industry and research community, it also attracts more attention from the customer level. The large number of users and high frequency of job requests in the consumer market make it challenging. Clearly, all the current Client/Server(C/S)-based architecture will become unfeasible for supporting large-scale Grid applications due to its poor scalability and poor fault-tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
A cognitive computational model inspired by the immune system response.
Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim
2014-01-01
The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that is oscillating between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have been always standing out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases: setting a computational architectural diagram for each phase, proceeding from functional perspectives (input, process, and output), and their consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of main theoretical immunological perspectives (classic, cognitive, and danger theory), as related to computer science terminologies. The paper presents a descriptive model of immune system, to figure out the nature of response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by the natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, meanwhile illustrating our proposed architecture perspective.
A Cognitive Computational Model Inspired by the Immune System Response
Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim
2014-01-01
The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that is oscillating between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have been always standing out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases: setting a computational architectural diagram for each phase, proceeding from functional perspectives (input, process, and output), and their consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of main theoretical immunological perspectives (classic, cognitive, and danger theory), as related to computer science terminologies. The paper presents a descriptive model of immune system, to figure out the nature of response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by the natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, meanwhile illustrating our proposed architecture perspective. PMID:25003131
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Radenski, Atanas
2003-01-01
The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer to Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing dual -service architecture for P2P high-performance divide-and conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide and conquer computations. The service engine is intended to provide free useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide and conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever changing pool of lower-end Internet nodes.
Data Compression for Maskless Lithography Systems: Architecture, Algorithms and Implementation
2008-05-19
Data Compression for Maskless Lithography Systems: Architecture, Algorithms and Implementation Vito Dai Electrical Engineering and Computer Sciences...servers or to redistribute to lists, requires prior specific permission. Data Compression for Maskless Lithography Systems: Architecture, Algorithms and...for Maskless Lithography Systems: Architecture, Algorithms and Implementation Copyright 2008 by Vito Dai 1 Abstract Data Compression for Maskless
Computer aided system engineering for space construction
NASA Technical Reports Server (NTRS)
Racheli, Ugo
1989-01-01
This viewgraph presentation covers the following topics. Construction activities envisioned for the assembly of large platforms in space (as well as interplanetary spacecraft and bases on extraterrestrial surfaces) require computational tools that exceed the capability of conventional construction management programs. The Center for Space Construction is investigating the requirements for new computational tools and, at the same time, suggesting the expansion of graduate and undergraduate curricula to include proficiency in Computer Aided Engineering (CAE) though design courses and individual or team projects in advanced space systems design. In the center's research, special emphasis is placed on problems of constructability and of the interruptability of planned activity sequences to be carried out by crews operating under hostile environmental conditions. The departure point for the planned work is the acquisition of the MCAE I-DEAS software, developed by the Structural Dynamics Research Corporation (SDRC), and its expansion to the level of capability denoted by the acronym IDEAS**2 currently used for configuration maintenance on Space Station Freedom. In addition to improving proficiency in the use of I-DEAS and IDEAS**2, it is contemplated that new software modules will be developed to expand the architecture of IDEAS**2. Such modules will deal with those analyses that require the integration of a space platform's configuration with a breakdown of planned construction activities and with a failure modes analysis to support computer aided system engineering (CASE) applied to space construction.
NASA Technical Reports Server (NTRS)
Koppenhoefer, Kyle C.; Gullerud, Arne S.; Ruggieri, Claudio; Dodds, Robert H., Jr.; Healy, Brian E.
1998-01-01
This report describes theoretical background material and commands necessary to use the WARP3D finite element code. WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of ductile fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, face loading), an element extinction facility to model crack growth, nonlinear material models including viscoplastic effects, and the Gurson-Tver-gaard dilatant plasticity model for void growth. The nonlinear, dynamic equilibrium equations are solved using an incremental-iterative, implicit formulation with full Newton iterations to eliminate residual nodal forces. The history integration of the nonlinear equations of motion is accomplished with Newmarks Beta method. A central feature of WARP3D involves the use of a linear-preconditioned conjugate gradient (LPCG) solver implemented in an element-by-element format to replace a conventional direct linear equation solver. This software architecture dramatically reduces both the memory requirements and CPU time for very large, nonlinear solid models since formation of the assembled (dynamic) stiffness matrix is avoided. Analyses thus exhibit the numerical stability for large time (load) steps provided by the implicit formulation coupled with the low memory requirements characteristic of an explicit code. In addition to the much lower memory requirements of the LPCG solver, the CPU time required for solution of the linear equations during each Newton iteration is generally one-half or less of the CPU time required for a traditional direct solver. All other computational aspects of the code (element stiffnesses, element strains, stress updating, element internal forces) are implemented in the element-by- element, blocked architecture. This greatly improves vectorization of the code on uni-processor hardware and enables straightforward parallel-vector processing of element blocks on multi-processor hardware.
A performance model for GPUs with caches
Dao, Thanh Tuan; Kim, Jungwon; Seo, Sangmin; ...
2014-06-24
To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, accurate performance estimation models for modern GPUs do not exist. This paper presents two accurate models for modern GPUs: a sampling-based linear model, and a model based on machine-learning (ML) techniques which improves the accuracy of the linear model and is applicable to modern GPUs with and without caches. We first construct the sampling-based linear model to predict the runtime of an arbitrary OpenCL kernel. Based on anmore » analysis of NVIDIA GPUs' scheduling policies we determine the earliest sampling points that allow an accurate estimation. The linear model cannot capture well the significant effects that memory coalescing or caching as implemented in modern GPUs have on performance. We therefore propose a model based on ML techniques that takes several compiler-generated statistics about the kernel as well as the GPU's hardware performance counters as additional inputs to obtain a more accurate runtime performance estimation for modern GPUs. We demonstrate the effectiveness and broad applicability of the model by applying it to three different NVIDIA GPU architectures and one AMD GPU architecture. On an extensive set of OpenCL benchmarks, on average, the proposed model estimates the runtime performance with less than 7 percent error for a second-generation GTX 280 with no on-chip caches and less than 5 percent for the Fermi-based GTX 580 with hardware caches. On the Kepler-based GTX 680, the linear model has an error of less than 10 percent. On an AMD GPU architecture, Radeon HD 6970, the model estimates with 8 percent of error rates. As a result, the proposed technique outperforms existing models by a factor of 5 to 6 in terms of accuracy.« less
ERIC Educational Resources Information Center
Rodriguez, Carolina; Hudson, Roland; Niblock, Chantelle
2018-01-01
Combinations of Conventional Studio and Virtual Design Studio (VDS) have created valuable learning environments that take advantage of different instruments of communication and interaction. However, past experiences have reported limitations in regards to student engagement and motivation, especially when the studio projects encourage abstraction…
Autonomic Computing for Spacecraft Ground Systems
NASA Technical Reports Server (NTRS)
Li, Zhenping; Savkli, Cetin; Jones, Lori
2007-01-01
Autonomic computing for spacecraft ground systems increases the system reliability and reduces the cost of spacecraft operations and software maintenance. In this paper, we present an autonomic computing solution for spacecraft ground systems at NASA Goddard Space Flight Center (GSFC), which consists of an open standard for a message oriented architecture referred to as the GMSEC architecture (Goddard Mission Services Evolution Center), and an autonomic computing tool, the Criteria Action Table (CAT). This solution has been used in many upgraded ground systems for NASA 's missions, and provides a framework for developing solutions with higher autonomic maturity.
Traffic information computing platform for big data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duan, Zongtao, E-mail: ztduan@chd.edu.cn; Li, Ying, E-mail: ztduan@chd.edu.cn; Zheng, Xibin, E-mail: ztduan@chd.edu.cn
Big data environment create data conditions for improving the quality of traffic information service. The target of this article is to construct a traffic information computing platform for big data environment. Through in-depth analysis the connotation and technology characteristics of big data and traffic information service, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee the traffic safety and efficient operation, more intelligent and personalized traffic information service can be used for the traffic information users.
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Dynamic array processing for computationally intensive expert systems in CLIPS
NASA Technical Reports Server (NTRS)
Athavale, N. N.; Ragade, R. K.; Fenske, T. E.; Cassaro, M. A.
1990-01-01
This paper puts forth an architecture for implementing a loop for advanced data structure of arrays in CLIPS. An attempt is made to use multi-field variables in such an architecture to process a set of data during the decision making cycle. Also, current limitations on the expert system shells are discussed in brief in this paper. The resulting architecture is designed to circumvent the current limitations set by the expert system shell and also by the operating environment. Such advanced data structures are needed for tightly coupling symbolic and numeric computation modules.
A model for architectural comparison
NASA Astrophysics Data System (ADS)
Ho, Sam; Snyder, Larry
1988-04-01
Recently, architectures for sequential computers became a topic of much discussion and controversy. At the center of this storm is the Reduced Instruction Set Computer, or RISC, first described at Berkeley in 1980. While the merits of the RISC architecture cannot be ignored, its opponents have tried to do just that, while its proponents have expanded and frequently exaggerated them. This state of affairs has persisted to this day. No attempt is made to settle this controversy, since indeed there is likely no one answer. A qualitative framework is provided for a rational discussion of the issues.
Dual-scale topology optoelectronic processor.
Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H
1991-12-15
The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
An Object-Oriented Network-Centric Software Architecture for Physical Computing
NASA Astrophysics Data System (ADS)
Palmer, Richard
1997-08-01
Recent developments in object-oriented computer languages and infrastructure such as the Internet, Web browsers, and the like provide an opportunity to define a more productive computational environment for scientific programming that is based more closely on the underlying mathematics describing physics than traditional programming languages such as FORTRAN or C++. In this talk I describe an object-oriented software architecture for representing physical problems that includes classes for such common mathematical objects as geometry, boundary conditions, partial differential and integral equations, discretization and numerical solution methods, etc. In practice, a scientific program written using this architecture looks remarkably like the mathematics used to understand the problem, is typically an order of magnitude smaller than traditional FORTRAN or C++ codes, and hence easier to understand, debug, describe, etc. All objects in this architecture are ``network-enabled,'' which means that components of a software solution to a physical problem can be transparently loaded from anywhere on the Internet or other global network. The architecture is expressed as an ``API,'' or application programmers interface specification, with reference embeddings in Java, Python, and C++. A C++ class library for an early version of this API has been implemented for machines ranging from PC's to the IBM SP2, meaning that phidentical codes run on all architectures.
Six-Degree-of-Freedom Trajectory Optimization Utilizing a Two-Timescale Collocation Architecture
NASA Technical Reports Server (NTRS)
Desai, Prasun N.; Conway, Bruce A.
2005-01-01
Six-degree-of-freedom (6DOF) trajectory optimization of a reentry vehicle is solved using a two-timescale collocation methodology. This class of 6DOF trajectory problems are characterized by two distinct timescales in their governing equations, where a subset of the states have high-frequency dynamics (the rotational equations of motion) while the remaining states (the translational equations of motion) vary comparatively slowly. With conventional collocation methods, the 6DOF problem size becomes extraordinarily large and difficult to solve. Utilizing the two-timescale collocation architecture, the problem size is reduced significantly. The converged solution shows a realistic landing profile and captures the appropriate high-frequency rotational dynamics. A large reduction in the overall problem size (by 55%) is attained with the two-timescale architecture as compared to the conventional single-timescale collocation method. Consequently, optimum 6DOF trajectory problems can now be solved efficiently using collocation, which was not previously possible for a system with two distinct timescales in the governing states.
ERIC Educational Resources Information Center
Hung, Y.-C.
2012-01-01
This paper investigates the impact of combining self explaining (SE) with computer architecture diagrams to help novice students learn assembly language programming. Pre- and post-test scores for the experimental and control groups were compared and subjected to covariance (ANCOVA) statistical analysis. Results indicate that the SE-plus-diagram…
MIT CSAIL and Lincoln Laboratory Task Force Report
2016-08-01
projects have been very diverse, spanning several areas of CSAIL concentration, including robotics, big data analytics , wireless communications...spanning several areas of CSAIL concentration, including robotics, big data analytics , wireless communications, computing architectures and...to machine learning systems and algorithms, such as recommender systems, and “Big Data ” analytics . Advanced computing architectures broadly refer to
A Model for Minimizing Numeric Function Generator Complexity and Delay
2007-12-01
allow computation of difficult mathematical functions in less time and with less hardware than commonly employed methods. They compute piecewise...Programmable Gate Arrays (FPGAs). The algorithms and estimation techniques apply to various NFG architectures and mathematical functions. This...thesis compares hardware utilization and propagation delay for various NFG architectures, mathematical functions, word widths, and segmentation methods
Usage of Thin-Client/Server Architecture in Computer Aided Education
ERIC Educational Resources Information Center
Cimen, Caghan; Kavurucu, Yusuf; Aydin, Halit
2014-01-01
With the advances of technology, thin-client/server architecture has become popular in multi-user/single network environments. Thin-client is a user terminal in which the user can login to a domain and run programs by connecting to a remote server. Recent developments in network and hardware technologies (cloud computing, virtualization, etc.)…
Optical Computing Based on Neuronal Models
1988-05-01
walking, and cognition are far too complex for existing sequential digital computers. Therefore new architectures, hardware, and algorithms modeled...collective behavior, and iterative processing into optical processing and artificial neurodynamical systems. Another intriguing promise of neural nets is...with architectures, implementations, and programming; and material research s -7- called for. Our future research in neurodynamics will continue to
Using SPEEDES to simulate the blue gene interconnect network
NASA Technical Reports Server (NTRS)
Springer, P.; Upchurch, E.
2003-01-01
JPL and the Center for Advanced Computer Architecture (CACR) is conducting application and simulation analyses of BG/L in order to establish a range of effectiveness for the Blue Gene/L MPP architecture in performing important classes of computations and to determine the design sensitivity of the global interconnect network in support of real world ASCI application execution.
The Use of Metaphors as a Parametric Design Teaching Model: A Case Study
ERIC Educational Resources Information Center
Agirbas, Asli
2018-01-01
Teaching methodologies for parametric design are being researched all over the world, since there is a growing demand for computer programming logic and its fabrication process in architectural education. The computer programming courses in architectural education are usually done in a very short period of time, and so students have no chance to…
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.
1992-01-01
An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.
Service-Oriented Architecture for NVO and TeraGrid Computing
NASA Technical Reports Server (NTRS)
Jacob, Joseph; Miller, Craig; Williams, Roy; Steenberg, Conrad; Graham, Matthew
2008-01-01
The National Virtual Observatory (NVO) Extensible Secure Scalable Service Infrastructure (NESSSI) is a Web service architecture and software framework that enables Web-based astronomical data publishing and processing on grid computers such as the National Science Foundation's TeraGrid. Characteristics of this architecture include the following: (1) Services are created, managed, and upgraded by their developers, who are trusted users of computing platforms on which the services are deployed. (2) Service jobs can be initiated by means of Java or Python client programs run on a command line or with Web portals. (3) Access is granted within a graduated security scheme in which the size of a job that can be initiated depends on the level of authentication of the user.
Network architecture test-beds as platforms for ubiquitous computing.
Roscoe, Timothy
2008-10-28
Distributed systems research, and in particular ubiquitous computing, has traditionally assumed the Internet as a basic underlying communications substrate. Recently, however, the networking research community has come to question the fundamental design or 'architecture' of the Internet. This has been led by two observations: first, that the Internet as it stands is now almost impossible to evolve to support new functionality; and second, that modern applications of all kinds now use the Internet rather differently, and frequently implement their own 'overlay' networks above it to work around its perceived deficiencies. In this paper, I discuss recent academic projects to allow disruptive change to the Internet architecture, and also outline a radically different view of networking for ubiquitous computing that such proposals might facilitate.
FPGA-based real-time phase measuring profilometry algorithm design and implementation
NASA Astrophysics Data System (ADS)
Zhan, Guomin; Tang, Hongwei; Zhong, Kai; Li, Zhongwei; Shi, Yusheng
2016-11-01
Phase measuring profilometry (PMP) has been widely used in many fields, like Computer Aided Verification (CAV), Flexible Manufacturing System (FMS) et al. High frame-rate (HFR) real-time vision-based feedback control will be a common demands in near future. However, the instruction time delay in the computer caused by numerous repetitive operations greatly limit the efficiency of data processing. FPGA has the advantages of pipeline architecture and parallel execution, and it fit for handling PMP algorithm. In this paper, we design a fully pipelined hardware architecture for PMP. The functions of hardware architecture includes rectification, phase calculation, phase shifting, and stereo matching. The experiment verified the performance of this method, and the factors that may influence the computation accuracy was analyzed.
NASA Astrophysics Data System (ADS)
Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.
2014-12-01
IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, that supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.
Internet Architecture: Lessons Learned and Looking Forward
2006-12-01
Internet Architecture: Lessons Learned and Looking Forward Geoffrey G. Xie Department of Computer Science Naval Postgraduate School April 2006... Internet architecture. Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is...readers are referred there for more information about a specific protocol or concept. 2. Origin of Internet Architecture The Internet is easily
ERIC Educational Resources Information Center
Uwakonye, Obioha; Alagbe, Oluwole; Oluwatayo, Adedapo; Alagbe, Taiye; Alalade, Gbenga
2015-01-01
As a result of globalization of digital technology, intellectual discourse on what constitutes the basic body of architectural knowledge to be imparted to future professionals has been on the increase. This digital revolution has brought to the fore the need to review the already overloaded architectural education curriculum of Nigerian schools of…
Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...
2015-07-13
Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained bymore » OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.« less
Scalable service architecture for providing strong service guarantees
NASA Astrophysics Data System (ADS)
Christin, Nicolas; Liebeherr, Joerg
2002-07-01
For the past decade, a lot of Internet research has been devoted to providing different levels of service to applications. Initial proposals for service differentiation provided strong service guarantees, with strict bounds on delays, loss rates, and throughput, but required high overhead in terms of computational complexity and memory, both of which raise scalability concerns. Recently, the interest has shifted to service architectures with low overhead. However, these newer service architectures only provide weak service guarantees, which do not always address the needs of applications. In this paper, we describe a service architecture that supports strong service guarantees, can be implemented with low computational complexity, and only requires to maintain little state information. A key mechanism of the proposed service architecture is that it addresses scheduling and buffer management in a single algorithm. The presented architecture offers no solution for controlling the amount of traffic that enters the network. Instead, we plan on exploiting feedback mechanisms of TCP congestion control algorithms for the purpose of regulating the traffic entering the network.
GPU-computing in econophysics and statistical physics
NASA Astrophysics Data System (ADS)
Preis, T.
2011-03-01
A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction into the field of GPU computing and includes examples. In particular computationally expensive analyses employed in financial market context are coded on a graphics card architecture which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
Performance evaluation of the time delay digital tanlock loop architectures
NASA Astrophysics Data System (ADS)
Al-Kharji Al-Ali, Omar; Anani, Nader; Al-Qutayri, Mahmoud; Al-Araji, Saleh; Ponnapalli, Prasad
2016-01-01
This article presents the architectures, theoretical analyses and testing results of modified time delay digital tanlock loop (TDTLs) system. The modifications to the original TDTL architecture were introduced to overcome some of the limitations of the original TDTL and to enhance the overall performance of the particular systems. The limitations addressed in this article include the non-linearity of the phase detector, the restricted width of the locking range and the overall system acquisition speed. Each of the modified architectures was tested by subjecting the system to sudden positive and negative frequency steps and comparing its response with that of the original TDTL. In addition, the performance of all the architectures was evaluated under noise-free as well as noisy environments. The extensive simulation results using MATLAB/SIMULINK demonstrate that the new architectures overcome the limitations they addressed and the overall results confirmed significant improvements in performance compared to the conventional TDTL system.
Digital tanlock loop architecture with no delay
NASA Astrophysics Data System (ADS)
Al-Kharji AL-Ali, Omar; Anani, Nader; Al-Araji, Saleh; Al-Qutayri, Mahmoud; Ponnapalli, Prasad
2012-02-01
This article proposes a new architecture for a digital tanlock loop which eliminates the time-delay block. The ? (rad) phase shift relationship between the two channels, which is generated by the delay block in the conventional time-delay digital tanlock loop (TDTL), is preserved using two quadrature sampling signals for the loop channels. The proposed system outperformed the original TDTL architecture, when both systems were tested with frequency shift keying input signal. The new system demonstrated better linearity and acquisition speed as well as improved noise performance compared with the original TDTL architecture. Furthermore, the removal of the time-delay block enables all processing to be digitally performed, which reduces the implementation complexity. Both the original TDTL and the new architecture without the delay block were modelled and simulated using MATLAB/Simulink. Implementation issues, including complexity and relation to simulation of both architectures, are also addressed.
New optical architecture for holographic data storage system compatible with Blu-ray Disc™ system
NASA Astrophysics Data System (ADS)
Shimada, Ken-ichi; Ide, Tatsuro; Shimano, Takeshi; Anderson, Ken; Curtis, Kevin
2014-02-01
A new optical architecture for holographic data storage system which is compatible with a Blu-ray Disc™ (BD) system is proposed. In the architecture, both signal and reference beams pass through a single objective lens with numerical aperture (NA) 0.85 for realizing angularly multiplexed recording. The geometry of the architecture brings a high affinity with an optical architecture in the BD system because the objective lens can be placed parallel to a holographic medium. Through the comparison of experimental results with theory, the validity of the optical architecture was verified and demonstrated that the conventional objective lens motion technique in the BD system is available for angularly multiplexed recording. The test-bed composed of a blue laser system and an objective lens of the NA 0.85 was designed. The feasibility of its compatibility with BD is examined through the designed test-bed.
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures, requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network network characteristics.I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order of magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs.I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Detail, Facture, and Colour in the Architecture of Polish Single-Family Houses after 1989
NASA Astrophysics Data System (ADS)
Sztafrowski, Marek
2017-10-01
The article presents single-family houses architecture transformations since 1989, with particularly close attention paid to the significance of detail, facture, and colour. The article presents the architecture as an art of designing and building facilities with both use and aesthetic value, an art of shaping space and building forms. Architectural work should correspond to the intended function, technique, economic and aesthetic requirements, thus shaping all elements of human immediate environment, both inside and outside of the building. Architecture of the building is perceived as form, structure, and function, as well as detail, facture, and colour. Facture and colour are created through materials used for external finishes. The solid of the building is noticed first while looking at the building, then the finishes detail such as colour, facture, and detail. Materials for external finishes are commonly selected for their aesthetic value equally with their technical characteristics. The detail was always a characteristic element of style. However, currently the fashion for details can be observed, the fashion for usage of materials for external finishes and inter-connected with that colour and facture. The architecture of Polish single-family houses underwent considerable metamorphosis after system change of 1989 - from destitute in form, devoid in detail and colour socmodernism, to architecture extremely varied in terms of form, utilised structures, materials, and detail. Hence, appearance of the phenomenon called fashion can be observed in the architecture, understood as constant changeability, seeking novelty, and creation based on opinion-forming centres. The architectural fashion consists of form, function, structure, building materials, detail, facture, and colour trends, e.g. after rejecting socmodernism, steep roofs characteristic for single-family houses trend started. After 1989, initially individual single-family house projects were created; however, rapidly developing building market precipitated the creation of catalogue solutions, repetitive and conventional. Currently, potential customers have access to catalogues of numerous design studios and companies, every last one including few dozens of comprehensive constructions design options of single-family house at the fewest. In the conventional catalogue designs, steep roofs began to gain popularity, becoming increasingly complicated with various choices of roof windows as time passes. The entrances are frequently adorned with porticos and columns. So-called “mansion architecture” of the single-family houses has developed. Recently, fashion alluding to modernism of 1920s has developed in the single-family houses architecture. New trends among architects are adapted with increasing frequency by investors looking for unconventional solutions. The neo-modernism trend is noticeable predominantly in individual projects; however, it appears in catalogue propositions with increasing frequency. Designs of single-family houses of simplistic shape and distinct expression emerge, with flat roofs, minimalistic detail, and vital, carefully chosen in terms of facture and colour, material solutions of wall finishes. Apart from the conventional solutions, presently the building market offers a vast variety of meticulously prepared, factory-made, and thoroughly checked in various realisations details. Architects discontinued using manufactured and individually designed detail in favour of utilising conventional solutions for designed objects. In a well-designed single-family house, facture, colour, and detail of materials utilised in external finishes should harmonise with the building shape and form.
Power Efficient Hardware Architecture of SHA-1 Algorithm for Trusted Mobile Computing
NASA Astrophysics Data System (ADS)
Kim, Mooseop; Ryou, Jaecheol
The Trusted Mobile Platform (TMP) is developed and promoted by the Trusted Computing Group (TCG), which is an industry standard body to enhance the security of the mobile computing environment. The built-in SHA-1 engine in TMP is one of the most important circuit blocks and contributes the performance of the whole platform because it is used as key primitives supporting platform integrity and command authentication. Mobile platforms have very stringent limitations with respect to available power, physical circuit area, and cost. Therefore special architecture and design methods for low power SHA-1 circuit are required. In this paper, we present a novel and efficient hardware architecture of low power SHA-1 design for TMP. Our low power SHA-1 hardware can compute 512-bit data block using less than 7,000 gates and has a power consumption about 1.1 mA on a 0.25μm CMOS process.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1996-01-01
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1997-01-01
We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
Performance Analysis of Distributed Object-Oriented Applications
NASA Technical Reports Server (NTRS)
Schoeffler, James D.
1998-01-01
The purpose of this research was to evaluate the efficiency of a distributed simulation architecture which creates individual modules which are made self-scheduling through the use of a message-based communication system used for requesting input data from another module which is the source of that data. To make the architecture as general as possible, the message-based communication architecture was implemented using standard remote object architectures (Common Object Request Broker Architecture (CORBA) and/or Distributed Component Object Model (DCOM)). A series of experiments were run in which different systems are distributed in a variety of ways across multiple computers and the performance evaluated. The experiments were duplicated in each case so that the overhead due to message communication and data transmission can be separated from the time required to actually perform the computational update of a module each iteration. The software used to distribute the modules across multiple computers was developed in the first year of the current grant and was modified considerably to add a message-based communication scheme supported by the DCOM distributed object architecture. The resulting performance was analyzed using a model created during the first year of this grant which predicts the overhead due to CORBA and DCOM remote procedure calls and includes the effects of data passed to and from the remote objects. A report covering the distributed simulation software and the results of the performance experiments has been submitted separately. The above report also discusses possible future work to apply the methodology to dynamically distribute the simulation modules so as to minimize overall computation time.
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
Programming for 1.6 Millon cores: Early experiences with IBM's BG/Q SMP architecture
NASA Astrophysics Data System (ADS)
Glosli, James
2013-03-01
With the stall in clock cycle improvements a decade ago, the drive for computational performance has continues along a path of increasing core counts on a processor. The multi-core evolution has been expressed in both a symmetric multi processor (SMP) architecture and cpu/GPU architecture. Debates rage in the high performance computing (HPC) community which architecture best serves HPC. In this talk I will not attempt to resolve that debate but perhaps fuel it. I will discuss the experience of exploiting Sequoia, a 98304 node IBM Blue Gene/Q SMP at Lawrence Livermore National Laboratory. The advantages and challenges of leveraging the computational power BG/Q will be detailed through the discussion of two applications. The first application is a Molecular Dynamics code called ddcMD. This is a code developed over the last decade at LLNL and ported to BG/Q. The second application is a cardiac modeling code called Cardioid. This is a code that was recently designed and developed at LLNL to exploit the fine scale parallelism of BG/Q's SMP architecture. Through the lenses of these efforts I'll illustrate the need to rethink how we express and implement our computational approaches. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Bae, Dae Kyung; Song, Sang Jun; Kim, Kang Il; Hur, Dong; Jeong, Ho Yeon
2016-03-01
The purpose of the present study was to compare the clinical and radiographic results and survival rates between computer-assisted and conventional closing wedge high tibial osteotomies (HTOs). Data from a consecutive cohort comprised of 75 computer-assisted HTOs and 75 conventional HTOs were retrospectively reviewed. The Knee Society knee and function scores, Hospital for Special Surgery (HSS) score and femorotibial angle (FTA) were compared between the two groups. Survival rates were also compared with procedure failure. The knee and function scores at one year postoperatively were slightly better in the computer-assisted group than those in conventional group (90.1 vs. 86.1) (82.0 vs. 76.0). The HSS scores at one year postoperatively were slightly better for the computer-assisted HTOs than those of conventional HTOs (89.5 vs. 81.8). The inlier of the postoperative FTA was wider in the computer-assisted group than that in the conventional HTO group (88.0% vs. 58.7%), and mean postoperative FTA was greater in the computer-assisted group that in the conventional HTO group (valgus 9.0° vs. valgus 7.6°, p<0.001). The five- and 10-year survival rates were 97.1% and 89.6%, respectively. No difference was detected in nine-year survival rates (p=0.369) between the two groups, although the clinical and radiographic results were better in the computer-assisted group that those in the conventional HTO group. Mid-term survival rates did not differ between computer-assisted and conventional HTOs. A comparative analysis of longer-term survival rate is required to demonstrate the long-term benefit of computer-assisted HTO. III. Copyright © 2015 Elsevier B.V. All rights reserved.
Parallel compression/decompression-based datapath architecture for multibeam mask writers
NASA Astrophysics Data System (ADS)
Chaudhary, Narendra; Savari, Serap A.
2017-06-01
Multibeam electron beam systems will be used in the future for mask writing and for complimentary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements Amdahl's Law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time we propose an alternate datapath architecture partly motivated by multibeam direct write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
Parallel compression/decompression-based datapath architecture for multibeam mask writers
NASA Astrophysics Data System (ADS)
Chaudhary, Narendra; Savari, Serap A.
2017-10-01
Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct-write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
Chessa, Manuela; Bianchi, Valentina; Zampetti, Massimo; Sabatini, Silvio P; Solari, Fabio
2012-01-01
The intrinsic parallelism of visual neural architectures based on distributed hierarchical layers is well suited to be implemented on the multi-core architectures of modern graphics cards. The design strategies that allow us to optimally take advantage of such parallelism, in order to efficiently map on GPU the hierarchy of layers and the canonical neural computations, are proposed. Specifically, the advantages of a cortical map-like representation of the data are exploited. Moreover, a GPU implementation of a novel neural architecture for the computation of binocular disparity from stereo image pairs, based on populations of binocular energy neurons, is presented. The implemented neural model achieves good performances in terms of reliability of the disparity estimates and a near real-time execution speed, thus demonstrating the effectiveness of the devised design strategies. The proposed approach is valid in general, since the neural building blocks we implemented are a common basis for the modeling of visual neural functionalities.
Virtual Business Operating Environment in the Cloud: Conceptual Architecture and Challenges
NASA Astrophysics Data System (ADS)
Nezhad, Hamid R. Motahari; Stephenson, Bryan; Singhal, Sharad; Castellanos, Malu
Advances in service oriented architecture (SOA) have brought us close to the once imaginary vision of establishing and running a virtual business, a business in which most or all of its business functions are outsourced to online services. Cloud computing offers a realization of SOA in which IT resources are offered as services that are more affordable, flexible and attractive to businesses. In this paper, we briefly study advances in cloud computing, and discuss the benefits of using cloud services for businesses and trade-offs that they have to consider. We then present 1) a layered architecture for the virtual business, and 2) a conceptual architecture for a virtual business operating environment. We discuss the opportunities and research challenges that are ahead of us in realizing the technical components of this conceptual architecture. We conclude by giving the outlook and impact of cloud services on both large and small businesses.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1987-01-01
The results of ongoing research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a spatial distributed computer environment is presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
Marolf, Angela; Blaik, Margaret; Ackerman, Norman; Watson, Elizabeth; Gibson, Nicole; Thompson, Margret
2008-01-01
The role of digital imaging is increasing as these systems are becoming more affordable and accessible. Advantages of computed radiography compared with conventional film/screen combinations include improved contrast resolution and postprocessing capabilities. Computed radiography's spatial resolution is inferior to conventional radiography; however, this limitation is considered clinically insignificant. This study prospectively compared digital imaging and conventional radiography in detecting small volume pneumoperitoneum. Twenty cadaver dogs (15-30 kg) were injected with 0.25, 0.25, and 0.5 ml for 1 ml total of air intra-abdominally, and radiographed sequentially using computed and conventional radiographic technologies. Three radiologists independently evaluated the images, and receiver operating curve (ROC) analysis compared the two imaging modalities. There was no statistical difference between computed and conventional radiography in detecting free abdominal air, but overall computed radiography was relatively more sensitive based on ROC analysis. Computed radiographic images consistently and significantly demonstrated a minimal amount of 0.5 ml of free air based on ROC analysis. However, no minimal air amount was consistently or significantly detected with conventional film. Readers were more likely to detect free air on lateral computed images than the other projections, with no significant increased sensitivity between film/screen projections. Further studies are indicated to determine the differences or lack thereof between various digital imaging systems and conventional film/screen systems.
A "Language Lab" for Architectural Design.
ERIC Educational Resources Information Center
Mackenzie, Arch; And Others
This paper discusses a "language lab" strategy in which traditional studio learning may be supplemented by language lessons using computer graphics techniques to teach architectural grammar, a body of elements and principles that govern the design of buildings belonging to a particular architectural theory or style. Two methods of…
Accessing microfluidics through feature-based design software for 3D printing.
Shankles, Peter G; Millet, Larry J; Aufrecht, Jayde A; Retterer, Scott T
2018-01-01
Additive manufacturing has been a cornerstone of the product development pipeline for decades, playing an essential role in the creation of both functional and cosmetic prototypes. In recent years, the prospects for distributed and open source manufacturing have grown tremendously. This growth has been enabled by an expanding library of printable materials, low-cost printers, and communities dedicated to platform development. The microfluidics community has embraced this opportunity to integrate 3D printing into the suite of manufacturing strategies used to create novel fluidic architectures. The rapid turnaround time and low cost to implement these strategies in the lab makes 3D printing an attractive alternative to conventional micro- and nanofabrication techniques. In this work, the production of multiple microfluidic architectures using a hybrid 3D printing-soft lithography approach is demonstrated and shown to enable rapid device fabrication with channel dimensions that take advantage of laminar flow characteristics. The fabrication process outlined here is underpinned by the implementation of custom design software with an integrated slicer program that replaces less intuitive computer aided design and slicer software tools. Devices are designed in the program by assembling parameterized microfluidic building blocks. The fabrication process and flow control within 3D printed devices were demonstrated with a gradient generator and two droplet generator designs. Precise control over the printing process allowed 3D microfluidics to be printed in a single step by extruding bridge structures to 'jump-over' channels in the same plane. This strategy was shown to integrate with conventional nanofabrication strategies to simplify the operation of a platform that incorporates both nanoscale features and 3D printed microfluidics.
Accessing microfluidics through feature-based design software for 3D printing
Shankles, Peter G.; Millet, Larry J.; Aufrecht, Jayde A.
2018-01-01
Additive manufacturing has been a cornerstone of the product development pipeline for decades, playing an essential role in the creation of both functional and cosmetic prototypes. In recent years, the prospects for distributed and open source manufacturing have grown tremendously. This growth has been enabled by an expanding library of printable materials, low-cost printers, and communities dedicated to platform development. The microfluidics community has embraced this opportunity to integrate 3D printing into the suite of manufacturing strategies used to create novel fluidic architectures. The rapid turnaround time and low cost to implement these strategies in the lab makes 3D printing an attractive alternative to conventional micro- and nanofabrication techniques. In this work, the production of multiple microfluidic architectures using a hybrid 3D printing-soft lithography approach is demonstrated and shown to enable rapid device fabrication with channel dimensions that take advantage of laminar flow characteristics. The fabrication process outlined here is underpinned by the implementation of custom design software with an integrated slicer program that replaces less intuitive computer aided design and slicer software tools. Devices are designed in the program by assembling parameterized microfluidic building blocks. The fabrication process and flow control within 3D printed devices were demonstrated with a gradient generator and two droplet generator designs. Precise control over the printing process allowed 3D microfluidics to be printed in a single step by extruding bridge structures to ‘jump-over’ channels in the same plane. This strategy was shown to integrate with conventional nanofabrication strategies to simplify the operation of a platform that incorporates both nanoscale features and 3D printed microfluidics. PMID:29596418
PNNLs Data Intensive Computing research battles Homeland Security threats
David Thurman; Joe Kielman; Katherine Wolf; David Atkinson
2018-05-11
The Pacific Northwest National Laboratorys (PNNL's) approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architecture, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.
Design and Training of Limited-Interconnect Architectures
1991-07-16
and signal processing. Neuromorphic (brain like) models, allow an alternative for achieving real-time operation tor such tasks, while having a...compact and robust architecture. Neuromorphic models consist of interconnections of simple computational nodes. In this approach, each node computes a...operational performance. I1. Research Objectives The research objectives were: 1. Development of on- chip local training rules specifically designed for
Pipelined CPU Design with FPGA in Teaching Computer Architecture
ERIC Educational Resources Information Center
Lee, Jong Hyuk; Lee, Seung Eun; Yu, Heon Chang; Suh, Taeweon
2012-01-01
This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on…
PNNL pushing scientific discovery through data intensive computing breakthroughs
Deborah Gracio; David Koppenaal; Ruby Leung
2018-05-18
The Pacific Northwest National Laboratory's approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architectures, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.
Fault tolerant computer control for a Maglev transportation system
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George
1994-01-01
Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements on the maglev. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.
Quantum error correction in crossbar architectures
NASA Astrophysics Data System (ADS)
Helsen, Jonas; Steudtner, Mark; Veldhorst, Menno; Wehner, Stephanie
2018-07-01
A central challenge for the scaling of quantum computing systems is the need to control all qubits in the system without a large overhead. A solution for this problem in classical computing comes in the form of so-called crossbar architectures. Recently we made a proposal for a large-scale quantum processor (Li et al arXiv:1711.03807 (2017)) to be implemented in silicon quantum dots. This system features a crossbar control architecture which limits parallel single-qubit control, but allows the scheme to overcome control scaling issues that form a major hurdle to large-scale quantum computing systems. In this work, we develop a language that makes it possible to easily map quantum circuits to crossbar systems, taking into account their architecture and control limitations. Using this language we show how to map well known quantum error correction codes such as the planar surface and color codes in this limited control setting with only a small overhead in time. We analyze the logical error behavior of this surface code mapping for estimated experimental parameters of the crossbar system and conclude that logical error suppression to a level useful for real quantum computation is feasible.
Robust Software Architecture for Robots
NASA Technical Reports Server (NTRS)
Aghazanian, Hrand; Baumgartner, Eric; Garrett, Michael
2009-01-01
Robust Real-Time Reconfigurable Robotics Software Architecture (R4SA) is the name of both a software architecture and software that embodies the architecture. The architecture was conceived in the spirit of current practice in designing modular, hard, realtime aerospace systems. The architecture facilitates the integration of new sensory, motor, and control software modules into the software of a given robotic system. R4SA was developed for initial application aboard exploratory mobile robots on Mars, but is adaptable to terrestrial robotic systems, real-time embedded computing systems in general, and robotic toys.
Sarikaya, Duygu; Corso, Jason J; Guru, Khurshid A
2017-07-01
Video understanding of robot-assisted surgery (RAS) videos is an active research area. Modeling the gestures and skill level of surgeons presents an interesting problem. The insights drawn may be applied in effective skill acquisition, objective skill assessment, real-time feedback, and human-robot collaborative surgeries. We propose a solution to the tool detection and localization open problem in RAS video understanding, using a strictly computer vision approach and the recent advances of deep learning. We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos. To the best of our knowledge, this approach will be the first to incorporate deep neural networks for tool detection and localization in RAS videos. Our architecture applies a region proposal network (RPN) and a multimodal two stream convolutional network for object detection to jointly predict objectness and localization on a fusion of image and temporal motion cues. Our results with an average precision of 91% and a mean computation time of 0.1 s per test frame detection indicate that our study is superior to conventionally used methods for medical imaging while also emphasizing the benefits of using RPN for precision and efficiency. We also introduce a new data set, ATLAS Dione, for RAS video understanding. Our data set provides video data of ten surgeons from Roswell Park Cancer Institute, Buffalo, NY, USA, performing six different surgical tasks on the daVinci Surgical System (dVSS) with annotations of robotic tools per frame.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
2006-07-14
MINDS: Architecture & Design Technical Report Department of Computer Science and Engineering University of Minnesota 4-192 EECS Building 200 Union...Street SE Minneapolis, MN 55455-0159 USA TR 06-022 MINDS: Architecture & Design Varun Chandola, Eric Eilertson, Levent Ertoz, Gyorgy Simon, and Vipin...REPORT DATE 14 JUL 2006 2. REPORT TYPE 3. DATES COVERED 00-07-2006 to 00-07-2006 4. TITLE AND SUBTITLE MINDS: Architecture & Design 5a
Seruya, Mitchel; Fisher, Mark; Rodriguez, Eduardo D
2013-11-01
There has been rising interest in computer-aided design/computer-aided manufacturing for preoperative planning and execution of osseous free flap reconstruction. The purpose of this study was to compare outcomes between computer-assisted and conventional fibula free flap techniques for craniofacial reconstruction. A two-center, retrospective review was carried out on patients who underwent fibula free flap surgery for craniofacial reconstruction from 2003 to 2012. Patients were categorized by the type of reconstructive technique: conventional (between 2003 and 2009) or computer-aided design/computer-aided manufacturing (from 2010 to 2012). Demographics, surgical factors, and perioperative and long-term outcomes were compared. A total of 68 patients underwent microsurgical craniofacial reconstruction: 58 conventional and 10 computer-aided design and manufacturing fibula free flaps. By demographics, patients undergoing the computer-aided design/computer-aided manufacturing method were significantly older and had a higher rate of radiotherapy exposure compared with conventional patients. Intraoperatively, the median number of osteotomies was significantly higher (2.0 versus 1.0, p=0.002) and the median ischemia time was significantly shorter (120 minutes versus 170 minutes, p=0.004) for the computer-aided design/computer-aided manufacturing technique compared with conventional techniques; operative times were shorter for patients undergoing the computer-aided design/computer-aided manufacturing technique, although this did not reach statistical significance. Perioperative and long-term outcomes were equivalent for the two groups, notably, hospital length of stay, recipient-site infection, partial and total flap loss, and rate of soft-tissue and bony tissue revisions. Microsurgical craniofacial reconstruction using a computer-assisted fibula flap technique yielded significantly shorter ischemia times amidst a higher number of osteotomies compared with conventional techniques. Therapeutic, III.
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures
Moreland, Kenneth; Sewell, Christopher; Usher, William; ...
2016-05-09
Here, one of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures
Moreland, Kenneth; Sewell, Christopher; Usher, William; ...
2016-05-09
Execution on massively threaded processors is one of the most critical challenges for high-performance computing (HPC) scientific visualization. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Moreover, our current production scientific visualization software is not designed for these new types of architectures. In order to address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
The Architecture of Information at Plateau Beaubourg
ERIC Educational Resources Information Center
Branda, Ewan Edward
2012-01-01
During the course of the 1960s, computers and information networks made their appearance in the public imagination. To architects on the cusp of architecture's postmodern turn, information technology offered new forms, metaphors, and techniques by which modern architecture's technological and utopian basis could be reasserted. Yet by the end of…
Federal Register 2010, 2011, 2012, 2013, 2014
2010-01-15
... design features associated with the architecture and connectivity capabilities of the airplane's computer... novel or unusual design features: digital systems architecture composed of several connected networks. The architecture and network configuration may be used for, or interfaced with, a diverse set of...
2014-11-01
Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS): Phase 1 Test and Evaluation Development Guide Craig...Self-initiated sensemaking ........................................................................................... 19 Feature Vector Format: Tasks...The Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS) Program aimed to build computational cognitive
Agent-Based Intelligent Interface for Wheelchair Movement Control
Barriuso, Alberto L.; De Paz, Juan F.
2018-01-01
People who suffer from any kind of motor difficulty face serious complications to autonomously move in their daily lives. However, a growing number research projects which propose different powered wheelchairs control systems are arising. Despite of the interest of the research community in the area, there is no platform that allows an easy integration of various control methods that make use of heterogeneous sensors and computationally demanding algorithms. In this work, an architecture based on virtual organizations of agents is proposed that makes use of a flexible and scalable communication protocol that allows the deployment of embedded agents in computationally limited devices. In order to validate the proper functioning of the proposed system, it has been integrated into a conventional wheelchair and a set of alternative control interfaces have been developed and deployed, including a portable electroencephalography system, a voice interface or as specifically designed smartphone application. A set of tests were conducted to test both the platform adequacy and the accuracy and ease of use of the proposed control systems yielding positive results that can be useful in further wheelchair interfaces design and implementation. PMID:29751603
Graphical User Interface in Art
NASA Astrophysics Data System (ADS)
Gwilt, Ian
This essay discusses the use of the Graphical User Interface (GUI) as a site of creative practice. By creatively repositioning the GUI as a work of art it is possible to challenge our understanding and expectations of the conventional computer interface wherein the icons and navigational architecture of the GUI no longer function as a technological tool. These artistic recontextualizations are often used to question our engagement with technology and to highlight the pivotal place that the domestic computer has taken in our everyday social, cultural and (increasingly), creative domains. Through these works the media specificity of the screen-based GUI can broken by dramatic changes in scale, form and configuration. This can be seen through the work of new media artists who have re-imagined the GUI in a number of creative forms both, within the digital, as image, animation, net and interactive art, and in the analogue, as print, painting, sculpture, installation and performative event. Furthermore as a creative work, the GUI can also be utilized as a visual way-finder to explore the relationship between the dynamic potentials of the digital and the concretized qualities of the material artifact.
Enabling NVM for Data-Intensive Scientific Services
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carns, Philip; Jenkins, John; Seo, Sangmin
Specialized, transient data services are playing an increasingly prominent role in data-intensive scientific computing. These services offer flexible, on-demand pairing of applications with storage hardware using semantics that are optimized for the problem domain. Concurrent with this trend, upcoming scientific computing and big data systems will be deployed with emerging NVM technology to achieve the highest possible price/productivity ratio. Clearly, therefore, we must develop techniques to facilitate the confluence of specialized data services and NVM technology. In this work we explore how to enable the composition of NVM resources within transient distributed services while still retaining their essential performance characteristics.more » Our approach involves eschewing the conventional distributed file system model and instead projecting NVM devices as remote microservices that leverage user-level threads, RPC services, RMA-enabled network transports, and persistent memory libraries in order to maximize performance. We describe a prototype system that incorporates these concepts, evaluate its performance for key workloads on an exemplar system, and discuss how the system can be leveraged as a component of future data-intensive architectures.« less
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
Multi-level Hierarchical Poly Tree computer architectures
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug
1990-01-01
Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
FFT Computation with Systolic Arrays, A New Architecture
NASA Technical Reports Server (NTRS)
Boriakoff, Valentin
1994-01-01
The use of the Cooley-Tukey algorithm for computing the l-d FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization
NASA Technical Reports Server (NTRS)
Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.
The Nebula Standard Computer Architecture,
good target for high level languages, the designers also adopted a visibility approach in architecture design that provides more freedom for the hardware implementor while still maintaining software portability. (Author)
Silicon photonic integrated circuits with electrically programmable non-volatile memory functions.
Song, J-F; Lim, A E-J; Luo, X-S; Fang, Q; Li, C; Jia, L X; Tu, X-G; Huang, Y; Zhou, H-F; Liow, T-Y; Lo, G-Q
2016-09-19
Conventional silicon photonic integrated circuits do not normally possess memory functions, which require on-chip power in order to maintain circuit states in tuned or field-configured switching routes. In this context, we present an electrically programmable add/drop microring resonator with a wavelength shift of 426 pm between the ON/OFF states. Electrical pulses are used to control the choice of the state. Our experimental results show a wavelength shift of 2.8 pm/ms and a light intensity variation of ~0.12 dB/ms for a fixed wavelength in the OFF state. Theoretically, our device can accommodate up to 65 states of multi-level memory functions. Such memory functions can be integrated into wavelength division mutiplexing (WDM) filters and applied to optical routers and computing architectures fulfilling large data downloading demands.
SVGA and XGA active matrix microdisplays for head-mounted applications
NASA Astrophysics Data System (ADS)
Alvelda, Phillip; Bolotski, Michael; Brown, Imani L.
2000-03-01
The MicroDisplay Corporation's liquid crystal on silicon (LCOS) display devices are based on the union of several technologies with the extreme integration capability of conventionally fabricated CMOS substrates. The fast liquid crystal operation modes and new scalable high-performance pixel addressing architectures presented in this paper enable substantially improved color, contrast, and brightness while still satisfying the optical, packaging, and power requirements of portable applications. The entire suite of MicroDisplay's technologies was devised to create a line of mixed-signal application-specific integrated circuits (ASICs) in single-chip display systems. Mixed-signal circuits can integrate computing, memory, and communication circuitry on the same substrate as the display drivers and pixel array for a multifunctional complete system-on-a-chip. System-on-a-chip benefits also include reduced head supported weight requirements through the elimination of off-chip drive electronics.
High speed all-optical networks
NASA Technical Reports Server (NTRS)
Chlamtac, Imrich
1993-01-01
An inherent problem of conventional point-to-point WAN architectures is that they cannot translate optical transmission bandwidth into comparable user available throughput due to the limiting electronic processing speed of the switching nodes. This report presents the first solution to WDM based WAN networks that overcomes this limitation. The proposed Lightnet architecture takes into account the idiosyncrasies of WDM switching/transmission leading to an efficient and pragmatic solution. The Lightnet architecture trades the ample WDM bandwidth for a reduction in the number of processing stages and a simplification of each switching stage, leading to drastically increased effective network throughputs.
Advanced control architecture for autonomous vehicles
NASA Astrophysics Data System (ADS)
Maurer, Markus; Dickmanns, Ernst D.
1997-06-01
An advanced control architecture for autonomous vehicles is presented. The hierarchical architecture consists of four levels: a vehicle level, a control level, a rule-based level and a knowledge-based level. A special focus is on forms of internal representation, which have to be chosen adequately for each level. The control scheme is applied to VaMP, a Mercedes passenger car which autonomously performs missions on German freeways. VaMP perceives the environment with its sense of vision and conventional sensors. It controls its actuators for locomotion and attention focusing. Modules for perception, cognition and action are discussed.
Automation Hooks Architecture for Flexible Test Orchestration - Concept Development and Validation
NASA Technical Reports Server (NTRS)
Lansdowne, C. A.; Maclean, John R.; Winton, Chris; McCartney, Pat
2011-01-01
The Automation Hooks Architecture Trade Study for Flexible Test Orchestration sought a standardized data-driven alternative to conventional automated test programming interfaces. The study recommended composing the interface using multicast DNS (mDNS/SD) service discovery, Representational State Transfer (Restful) Web Services, and Automatic Test Markup Language (ATML). We describe additional efforts to rapidly mature the Automation Hooks Architecture candidate interface definition by validating it in a broad spectrum of applications. These activities have allowed us to further refine our concepts and provide observations directed toward objectives of economy, scalability, versatility, performance, severability, maintainability, scriptability and others.
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
GPU and APU computations of Finite Time Lyapunov Exponent fields
NASA Astrophysics Data System (ADS)
Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
2012-03-01
We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
A mixed acid based vanadium-cerium redox flow battery with a zero-gap serpentine architecture
NASA Astrophysics Data System (ADS)
Leung, P. K.; Mohamed, M. R.; Shah, A. A.; Xu, Q.; Conde-Duran, M. B.
2015-01-01
This paper presents the performance of a vanadium-cerium redox flow battery using conventional and zero-gap serpentine architectures. Mixed-acid solutions based on methanesulfonate-sulfate anions (molar ratio 3:1) are used to enhance the solubilities of the vanadium (>2.0 mol dm-3) and cerium species (>0.8 mol dm-3), thus achieving an energy density (c.a. 28 Wh dm-3) comparable to that of conventional all-vanadium redox flow batteries (20-30 Wh dm-3). Electrochemical studies, including cyclic voltammetry and galvanostatic cycling, show that both vanadium and cerium active species are suitable for energy storage applications in these electrolytes. To take advantage of the high open-circuit voltage (1.78 V), improved mass transport and reduced internal resistance are facilitated by the use of zero-gap flow field architecture, which yields a power density output of the battery of up to 370 mW cm-2 at a state-of-charge of 50%. In a charge-discharge cycle at 200 mA cm-2, the vanadium-cerium redox flow battery with the zero-gap architecture is observed to discharge at a cell voltage of c.a. 1.35 V with a coulombic efficiency of up to 78%.