Memristor-Based Synapse Design and Training Scheme for Neuromorphic Computing Architecture
2012-06-01
system level built upon the conventional Von Neumann computer architecture [2][3]. Developing the neuromorphic architecture at chip level by...SCHEME FOR NEUROMORPHIC COMPUTING ARCHITECTURE 5a. CONTRACT NUMBER FA8750-11-2-0046 5b. GRANT NUMBER N/A 5c. PROGRAM ELEMENT NUMBER 62788F 6...creation of memristor-based neuromorphic computing architecture. Rather than the existing crossbar-based neuron network designs, we focus on memristor
Memristor-Based Computing Architecture: Design Methodologies and Circuit Techniques
2013-03-01
MEMRISTOR-BASED COMPUTING ARCHITECTURE : DESIGN METHODOLOGIES AND CIRCUIT TECHNIQUES POLYTECHNIC INSTITUTE OF NEW YORK UNIVERSITY...TECHNICAL REPORT 3. DATES COVERED (From - To) OCT 2010 – OCT 2012 4. TITLE AND SUBTITLE MEMRISTOR-BASED COMPUTING ARCHITECTURE : DESIGN METHODOLOGIES...schemes for a memristor-based reconfigurable architecture design have not been fully explored yet. Therefore, in this project, we investigated
Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan
2006-01-01
Image-based wavefront sensing (WFS) provides significant advantages over interferometric-based wavefi-ont sensors such as optical design simplicity and stability. However, the image-based approach is computational intensive, and therefore, specialized high-performance computing architectures are required in applications utilizing the image-based approach. The development and testing of these high-performance computing architectures are essential to such missions as James Webb Space Telescope (JWST), Terrestial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). The development of these specialized computing architectures require numerous two-dimensional Fourier Transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented with an emphasis on a 64 Node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis will be presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
Biomimetic design processes in architecture: morphogenetic and evolutionary computational design.
Menges, Achim
2012-03-01
Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies.
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
A new software-based architecture for quantum computer
NASA Astrophysics Data System (ADS)
Wu, Nan; Song, FangMin; Li, Xiangdong
2010-04-01
In this paper, we study a reliable architecture of a quantum computer and a new instruction set and machine language for the architecture, which can improve the performance and reduce the cost of the quantum computing. We also try to address some key issues in detail in the software-driven universal quantum computers.
Collaborative Working Architecture for IoT-Based Applications.
Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus
2018-05-23
The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications, however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
Cognitive Architectures and Human-Computer Interaction. Introduction to Special Issue.
ERIC Educational Resources Information Center
Gray, Wayne D.; Young, Richard M.; Kirschenbaum, Susan S.
1997-01-01
In this introduction to a special issue on cognitive architectures and human-computer interaction (HCI), editors and contributors provide a brief overview of cognitive architectures. The following four architectures represented by articles in this issue are: Soar; LICAI (linked model of comprehension-based action planning and instruction taking);…
Brain architecture: a design for natural computation.
Kaiser, Marcus
2007-12-15
Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.
Recursive computer architecture for VLSI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Treleaven, P.C.; Hopkins, R.P.
1982-01-01
A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems are termed fifth generation computers by the Japanese. 30 references.
NASA Astrophysics Data System (ADS)
Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng
2018-04-01
In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture shows performances of better speed-device consumption efficiency compared with the previous kernel storage architecture. Further we improve the architecture for a high accuracy and low power computing by utilizing the binary storage and the series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performances including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) more than 67 times speed boost; 3) 71.4% energy saving.
Advanced computer architecture specification for automated weld systems
NASA Technical Reports Server (NTRS)
Katsinis, Constantine
1994-01-01
This report describes the requirements for an advanced automated weld system and the associated computer architecture, and defines the overall system specification from a broad perspective. According to the requirements of welding procedures as they relate to an integrated multiaxis motion control and sensor architecture, the computer system requirements are developed based on a proven multiple-processor architecture with an expandable, distributed-memory, single global bus architecture, containing individual processors which are assigned to specific tasks that support sensor or control processes. The specified architecture is sufficiently flexible to integrate previously developed equipment, be upgradable and allow on-site modifications.
Computer vision camera with embedded FPGA processing
NASA Astrophysics Data System (ADS)
Lecerf, Antoine; Ouellet, Denis; Arias-Estrada, Miguel
2000-03-01
Traditional computer vision is based on a camera-computer system in which the image understanding algorithms are embedded in the computer. To circumvent the computational load of vision algorithms, low-level processing and imaging hardware can be integrated in a single compact module where a dedicated architecture is implemented. This paper presents a Computer Vision Camera based on an open architecture implemented in an FPGA. The system is targeted to real-time computer vision tasks where low level processing and feature extraction tasks can be implemented in the FPGA device. The camera integrates a CMOS image sensor, an FPGA device, two memory banks, and an embedded PC for communication and control tasks. The FPGA device is a medium size one equivalent to 25,000 logic gates. The device is connected to two high speed memory banks, an IS interface, and an imager interface. The camera can be accessed for architecture programming, data transfer, and control through an Ethernet link from a remote computer. A hardware architecture can be defined in a Hardware Description Language (like VHDL), simulated and synthesized into digital structures that can be programmed into the FPGA and tested on the camera. The architecture of a classical multi-scale edge detection algorithm based on a Laplacian of Gaussian convolution has been developed to show the capabilities of the system.
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Radenski, Atanas; Follen, Gregory J. (Technical Monitor)
2001-01-01
The rapid growth of internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of new, internet-oriented software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high -performance computing applications community. The general goal of this research project is to contribute to better understanding of the transition to internet-based high -performance computing and to develop solutions for some of the difficulties of this transition. More specifically, our goal is to design an architecture for generic divide and conquer internet-based computing, to develop a portable implementation of this architecture, to create an example library of high-performance divide-and-conquer computing agents that run on top of this architecture, and to evaluate the performance of these agents. We have been designing an architecture that incorporates a master task-pool server and utilizes satellite computational servers that operate on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. Our designed architecture is intended to be complementary to and accessible from computational grids such as Globus, Legion, and Condor. Grids provide remote access to existing high-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end internet nodes. Our project is focused on a generic divide-and-conquer paradigm and its applications that operate on a loose and ever changing pool of lower-end internet nodes.
A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.
Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang
2016-12-07
The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.
Blueprint for a microwave trapped ion quantum computer.
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K
2017-02-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
ERIC Educational Resources Information Center
Arumi, Francisco N.
Computer programs capable of describing the thermal behavior of buildings are used to help architectural students understand environmental systems. The Numerical Simulation Laboratory at the Architectural School of the University of Texas at Austin was developed to provide the necessary software capable of simulating the energy transactions…
A computer architecture for intelligent machines
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, G. N.
1992-01-01
The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
Hardware architecture design of image restoration based on time-frequency domain computation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Jing; Jiao, Zipeng
2013-10-01
The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.
Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein
2015-12-01
Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies might include Smartphones and wearable devices, which have attracted the attention of medical applications. These products are used less in critical medical applications because of their resource constraint and failure sensitivity. This is due to the fact that without safety considerations, small-integrated hardware will endanger patients' lives. Therefore, proposing some principals is required to construct wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to cloud. The tiers of this architecture include wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is its high possible fault tolerance due to the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.
Blueprint for a microwave trapped ion quantum computer
Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G.; Mølmer, Klaus; Devitt, Simon J.; Wunderlich, Christof; Hensinger, Winfried K.
2017-01-01
The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion–based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation–based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error–threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects. PMID:28164154
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells
2007-06-01
Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing
A supportive architecture for CFD-based design optimisation
NASA Astrophysics Data System (ADS)
Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong
2014-03-01
Multi-disciplinary design optimisation (MDO) is one of critical methodologies to the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamic (CFD) technique has caused a rise of its applications in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating with CFD analysis in an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which is usually with high dimensions, and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review on existing works has found that very few researchers have studied on the assistive tools to facilitate CFD-based design optimisation. In the paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing technique and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, the case studies on aerodynamic shape design of a hypersonic cruising vehicle are provided, and the result has shown that the proposed architecture and developed algorithms have performed successfully and efficiently in dealing with the design optimisation with over 200 design variables.
Takeda, Shuntaro; Furusawa, Akira
2017-09-22
We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
NASA Astrophysics Data System (ADS)
Takeda, Shuntaro; Furusawa, Akira
2017-09-01
We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-01-01
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684
Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik
2017-02-12
This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions.
Computer graphics in architecture and engineering
NASA Technical Reports Server (NTRS)
Greenberg, D. P.
1975-01-01
The present status of the application of computer graphics to the building profession or architecture and its relationship to other scientific and technical areas were discussed. It was explained that, due to the fragmented nature of architecture and building activities (in contrast to the aerospace industry), a comprehensive, economic utilization of computer graphics in this area is not practical and its true potential cannot now be realized due to the present inability of architects and structural, mechanical, and site engineers to rely on a common data base. Future emphasis will therefore have to be placed on a vertical integration of the construction process and effective use of a three-dimensional data base, rather than on waiting for any technological breakthrough in interactive computing.
Innovative architectures for dense multi-microprocessor computers
NASA Technical Reports Server (NTRS)
Larson, Robert E.
1989-01-01
The purpose is to summarize a Phase 1 SBIR project performed for the NASA/Langley Computational Structural Mechanics Group. The project was performed from February to August 1987. The main objectives of the project were to: (1) expand upon previous research into the application of chordal ring architectures to the general problem of designing multi-microcomputer architectures, (2) attempt to identify a family of chordal rings such that each chordal ring can be simply expanded to produce the next member of the family, (3) perform a preliminary, high-level design of an expandable multi-microprocessor computer based upon chordal rings, (4) analyze the potential use of chordal ring based multi-microprocessors for sparse matrix problems and other applications arising in computational structural mechanics.
Enterprise application architecture development based on DoDAF and TOGAF
NASA Astrophysics Data System (ADS)
Tao, Zhi-Gang; Luo, Yun-Feng; Chen, Chang-Xin; Wang, Ming-Zhe; Ni, Feng
2017-05-01
For the purpose of supporting the design and analysis of enterprise application architecture, here, we report a tailored enterprise application architecture description framework and its corresponding design method. The presented framework can effectively support service-oriented architecting and cloud computing by creating the metadata model based on architecture content framework (ACF), DoDAF metamodel (DM2) and Cloud Computing Modelling Notation (CCMN). The framework also makes an effort to extend and improve the mapping between The Open Group Architecture Framework (TOGAF) application architectural inputs/outputs, deliverables and Department of Defence Architecture Framework (DoDAF)-described models. The roadmap of 52 DoDAF-described models is constructed by creating the metamodels of these described models and analysing the constraint relationship among metamodels. By combining the tailored framework and the roadmap, this article proposes a service-oriented enterprise application architecture development process. Finally, a case study is presented to illustrate the results of implementing the tailored framework in the Southern Base Management Support and Information Platform construction project using the development process proposed by the paper.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuller, Ivan K.; Stevens, Rick; Pino, Robinson
2015-10-29
Computation in its many forms is the engine that fuels our modern civilization. Modern computation—based on the von Neumann architecture—has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably—within the next 10 years—reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS basedmore » technology? If so, what are the basic research challenges for materials sicence and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: The main differences between neuromorphic and conventional computing as related to: signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning New neuromorphic architectures needed to: produce lower energy consumption, potential novel nanostructured materials, and enhanced computation Device and materials properties needed to implement functions such as: hysteresis, stability, and fault tolerance Comparisons of different implementations: spin torque, memristors, resistive switching, phase change, and optical schemes for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.« less
A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas E; Schuman, Catherine D; Young, Steven R
Current Deep Learning models use highly optimized convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers with a fairly simple layered network topology, i.e., highly connected layers, without intra-layer connections. Complex topologies have been proposed, but are intractable to train on current systems. Building the topologies of the deep learning network requires hand tuning, and implementing the network in hardware is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determinemore » network topology, and neuromorphic computing for a low-power hardware implementation. Due to input size limitations of current quantum computers we use the MNIST dataset for our evaluation. The results show the possibility of using the three architectures in tandem to explore complex deep learning networks that are untrainable using a von Neumann architecture. We show that a quantum computer can find high quality values of intra-layer connections and weights, while yielding a tractable time result as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware. This represents a new capability that is not feasible with current von Neumann architecture. It potentially enables the ability to solve very complicated problems unsolvable with current computing technologies.« less
Image Understanding Architecture
1991-09-01
architecture to support real-time, knowledge -based image understanding , and develop the software support environment that will be needed to utilize...NUMBER OF PAGES Image Understanding Architecture, Knowledge -Based Vision, AI Real-Time Computer Vision, Software Simulator, Parallel Processor IL PRICE... information . In addition to sensory and knowledge -based processing it is useful to introduce a level of symbolic processing. Thus, vision researchers
Efficient Phase Unwrapping Architecture for Digital Holographic Microscopy
Hwang, Wen-Jyi; Cheng, Shih-Chang; Cheng, Chau-Jern
2011-01-01
This paper presents a novel phase unwrapping architecture for accelerating the computational speed of digital holographic microscopy (DHM). A fast Fourier transform (FFT) based phase unwrapping algorithm providing a minimum squared error solution is adopted for hardware implementation because of its simplicity and robustness to noise. The proposed architecture is realized in a pipeline fashion to maximize throughput of the computation. Moreover, the number of hardware multipliers and dividers are minimized to reduce the hardware costs. The proposed architecture is used as a custom user logic in a system on programmable chip (SOPC) for physical performance measurement. Experimental results reveal that the proposed architecture is effective for expediting the computational speed while consuming low hardware resources for designing an embedded DHM system. PMID:22163688
A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture
ERIC Educational Resources Information Center
Kellett, C. M.
2012-01-01
This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…
NASA Astrophysics Data System (ADS)
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires of k 2 − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
A Cloud-Based Simulation Architecture for Pandemic Influenza Simulation
Eriksson, Henrik; Raciti, Massimiliano; Basile, Maurizio; Cunsolo, Alessandro; Fröberg, Anders; Leifler, Ola; Ekberg, Joakim; Timpka, Toomas
2011-01-01
High-fidelity simulations of pandemic outbreaks are resource consuming. Cluster-based solutions have been suggested for executing such complex computations. We present a cloud-based simulation architecture that utilizes computing resources both locally available and dynamically rented online. The approach uses the Condor framework for job distribution and management of the Amazon Elastic Computing Cloud (EC2) as well as local resources. The architecture has a web-based user interface that allows users to monitor and control simulation execution. In a benchmark test, the best cost-adjusted performance was recorded for the EC2 H-CPU Medium instance, while a field trial showed that the job configuration had significant influence on the execution time and that the network capacity of the master node could become a bottleneck. We conclude that it is possible to develop a scalable simulation environment that uses cloud-based solutions, while providing an easy-to-use graphical user interface. PMID:22195089
Programmable hardware for reconfigurable computing systems
NASA Astrophysics Data System (ADS)
Smith, Stephen
1996-10-01
In 1945 the work of J. von Neumann and H. Goldstein created the principal architecture for electronic computation that has now lasted fifty years. Nevertheless alternative architectures have been created that have computational capability, for special tasks, far beyond that feasible with von Neumann machines. The emergence of high capacity programmable logic devices has made the realization of these architectures practical. The original ENIAC and EDVAC machines were conceived to solve special mathematical problems that were far from today's concept of 'killer applications.' In a similar vein programmable hardware computation is being used today to solve unique mathematical problems. Our programmable hardware activity is focused on the research and development of novel computational systems based upon the reconfigurability of our programmable logic devices. We explore our programmable logic architectures and their implications for programmable hardware. One programmable hardware board implementation is detailed.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.
Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros
2014-06-25
The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems.
Heterogeneous computing architecture for fast detection of SNP-SNP interactions
2014-01-01
Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand the new MIC architecture, albeit lacking in performance reduces the programming effort and makes it up with a more general architecture suitable for a wider range of problems. PMID:24964802
A computer architecture for intelligent machines
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, G. N.
1991-01-01
The Theory of Intelligent Machines proposes a hierarchical organization for the functions of an autonomous robot based on the Principle of Increasing Precision With Decreasing Intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed in recent years. A computer architecture that implements the lower two levels of the intelligent machine is presented. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Details of Execution Level controllers for motion and vision systems are addressed, as well as the Petri net transducer software used to implement Coordination Level functions. Extensions to UNIX and VxWorks operating systems which enable the development of a heterogeneous, distributed application are described. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing.
Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J
2015-08-14
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
Pyramidal neurovision architecture for vision machines
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1993-08-01
The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.
NASA Astrophysics Data System (ADS)
Romanchuk, V. A.; Lukashenko, V. V.
2018-05-01
The technique of functioning of a control system by a computing cluster based on neurocomputers is proposed. Particular attention is paid to the method of choosing the structure of the computing cluster due to the fact that the existing methods are not effective because of a specialized hardware base - neurocomputers, which are highly parallel computer devices with an architecture different from the von Neumann architecture. A developed algorithm for choosing the computational structure of a cloud cluster is described, starting from the direction of data transfer in the flow control graph of the program and its adjacency matrix.
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo
2014-01-01
This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In this example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
NASA Astrophysics Data System (ADS)
Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong
As Grid computing continues to gain popularity in the industry and research community, it also attracts more attention from the customer level. The large number of users and high frequency of job requests in the consumer market make it challenging. Clearly, all the current Client/Server(C/S)-based architecture will become unfeasible for supporting large-scale Grid applications due to its poor scalability and poor fault-tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
The Role of Sketch in Architecture Design
NASA Astrophysics Data System (ADS)
Li, Yanjin; Ning, Wen
2017-06-01
With the continuous development of computer technology, we rely more and more on the computer and pay more and more attention to the final design results, so that we ignore the importance of the sketch. However, the sketch is the most basic and effective way of architecture design. Based on the study of the sketch of Tjibao Cultural Center of sketch, the paper explores the role of sketch in architecture design .
NASA Astrophysics Data System (ADS)
Acernese, Fausto; Barone, Fabrizio; De Rosa, Rosario; Eleuteri, Antonio; Milano, Leopoldo; Pardi, Silvio; Ricciardi, Iolanda; Russo, Guido
2004-09-01
One of the main requirements of a digital system for the control of interferometric detectors of gravitational waves is the computing power, that is a direct consequence of the increasing complexity of the digital algorithms necessary for the control signals generation. For this specific task many specialized non standard real-time architectures have been developed, often very expensive and difficult to upgrade. On the other hand, such computing power is generally fully available for off-line applications on standard Pc based systems. Therefore, a possible and obvious solution may be provided by the integration of both the real-time and off-line architecture resulting in a hybrid control system architecture based on standards available components, trying to get both the advantages of the perfect data synchronization provided by the real-time systems and by the large computing power available on Pc based systems. Such integration may be provided by the implementation of the link between the two different architectures through the standard Ethernet network, whose data transfer speed is largely increasing in these years, using the TCP/IP, UDP and raw Ethernet protocols. In this paper we describe the architecture of an hybrid Ethernet based real-time control system prototype we implemented in Napoli, discussing its characteristics and performances. Finally we discuss a possible application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection (IDGW-3P) operational in Napoli.
Chessa, Manuela; Bianchi, Valentina; Zampetti, Massimo; Sabatini, Silvio P; Solari, Fabio
2012-01-01
The intrinsic parallelism of visual neural architectures based on distributed hierarchical layers is well suited to be implemented on the multi-core architectures of modern graphics cards. The design strategies that allow us to optimally take advantage of such parallelism, in order to efficiently map on GPU the hierarchy of layers and the canonical neural computations, are proposed. Specifically, the advantages of a cortical map-like representation of the data are exploited. Moreover, a GPU implementation of a novel neural architecture for the computation of binocular disparity from stereo image pairs, based on populations of binocular energy neurons, is presented. The implemented neural model achieves good performances in terms of reliability of the disparity estimates and a near real-time execution speed, thus demonstrating the effectiveness of the devised design strategies. The proposed approach is valid in general, since the neural building blocks we implemented are a common basis for the modeling of visual neural functionalities.
Benchmarking hardware architecture candidates for the NFIRAOS real-time controller
NASA Astrophysics Data System (ADS)
Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre
2014-07-01
As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.
Generic Divide and Conquer Internet-Based Computing
NASA Technical Reports Server (NTRS)
Follen, Gregory J. (Technical Monitor); Radenski, Atanas
2003-01-01
The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer to Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing dual -service architecture for P2P high-performance divide-and conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide and conquer computations. The service engine is intended to provide free useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide and conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever changing pool of lower-end Internet nodes.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
Three Program Architecture for Design Optimization
NASA Technical Reports Server (NTRS)
Miura, Hirokazu; Olson, Lawrence E. (Technical Monitor)
1998-01-01
In this presentation, I would like to review historical perspective on the program architecture used to build design optimization capabilities based on mathematical programming and other numerical search techniques. It is rather straightforward to classify the program architecture in three categories as shown above. However, the relative importance of each of the three approaches has not been static, instead dynamically changing as the capabilities of available computational resource increases. For example, we considered that the direct coupling architecture would never be used for practical problems, but availability of such computer systems as multi-processor. In this presentation, I would like to review the roles of three architecture from historical as well as current and future perspective. There may also be some possibility for emergence of hybrid architecture. I hope to provide some seeds for active discussion where we are heading to in the very dynamic environment for high speed computing and communication.
Motion camera based on a custom vision sensor and an FPGA architecture
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel
1998-09-01
A digital camera for custom focal plane arrays was developed. The camera allows the test and development of analog or mixed-mode arrays for focal plane processing. The camera is used with a custom sensor for motion detection to implement a motion computation system. The custom focal plane sensor detects moving edges at the pixel level using analog VLSI techniques. The sensor communicates motion events using the event-address protocol associated to a temporal reference. In a second stage, a coprocessing architecture based on a field programmable gate array (FPGA) computes the time-of-travel between adjacent pixels. The FPGA allows rapid prototyping and flexible architecture development. Furthermore, the FPGA interfaces the sensor to a compact PC computer which is used for high level control and data communication to the local network. The camera could be used in applications such as self-guided vehicles, mobile robotics and smart surveillance systems. The programmability of the FPGA allows the exploration of further signal processing like spatial edge detection or image segmentation tasks. The article details the motion algorithm, the sensor architecture, the use of the event- address protocol for velocity vector computation and the FPGA architecture used in the motion camera system.
NASA Astrophysics Data System (ADS)
Yang, Wei; Hall, Trevor J.
2013-12-01
The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users. As a consequence, the nature of the Internet traffic has been fundamentally transformed from a pure packet-based pattern to today's predominantly flow-based pattern. Cloud computing has also brought about an unprecedented growth in the Internet traffic. In this paper, a hybrid optical switch architecture is presented to deal with the flow-based Internet traffic, aiming to offer flexible and intelligent bandwidth on demand to improve fiber capacity utilization. The hybrid optical switch is capable of integrating IP into optical networks for cloud-based traffic with predictable performance, for which the delay performance of the electronic module in the hybrid optical switch architecture is evaluated through simulation.
Gschwind, Michael K
2013-04-16
Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.
FPGA-based real-time phase measuring profilometry algorithm design and implementation
NASA Astrophysics Data System (ADS)
Zhan, Guomin; Tang, Hongwei; Zhong, Kai; Li, Zhongwei; Shi, Yusheng
2016-11-01
Phase measuring profilometry (PMP) has been widely used in many fields, like Computer Aided Verification (CAV), Flexible Manufacturing System (FMS) et al. High frame-rate (HFR) real-time vision-based feedback control will be a common demands in near future. However, the instruction time delay in the computer caused by numerous repetitive operations greatly limit the efficiency of data processing. FPGA has the advantages of pipeline architecture and parallel execution, and it fit for handling PMP algorithm. In this paper, we design a fully pipelined hardware architecture for PMP. The functions of hardware architecture includes rectification, phase calculation, phase shifting, and stereo matching. The experiment verified the performance of this method, and the factors that may influence the computation accuracy was analyzed.
Computational structures for robotic computations
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chang, P. R.
1987-01-01
The computational problem of inverse kinematics and inverse dynamics of robot manipulators by taking advantage of parallelism and pipelining architectures is discussed. For the computation of inverse kinematic position solution, a maximum pipelined CORDIC architecture has been designed based on a functional decomposition of the closed-form joint equations. For the inverse dynamics computation, an efficient p-fold parallel algorithm to overcome the recurrence problem of the Newton-Euler equations of motion to achieve the time lower bound of O(log sub 2 n) has also been developed.
NASA Astrophysics Data System (ADS)
Zaveri, Mazad Shaheriar
The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered as an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired from the neuro/cognitive sciences. Consequently in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves: analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology - CMOL, and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm2 (TPM) obtained for CMOL based architectures is 32 to 40 times better than the TPM for a CMOS based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC implementation. We later use this methodology to investigate the hardware implementations of cortex-scale spiking neural system, which is an approximate neural equivalent of BICM based cortex-scale system. The results of this investigation also suggest that CMOL is a promising candidate to implement such large-scale neuromorphic systems. In general, the assessment of such hypothetical baseline hardware architectures provides the prospects for building large-scale (mammalian cortex-scale) implementations of neuromorphic/Bayesian/intelligent systems using state-of-the-art and beyond state-of-the-art silicon structures.
NASA Astrophysics Data System (ADS)
Xiao, Jian; Zhang, Mingqiang; Tian, Haiping; Huang, Bo; Fu, Wenlong
2018-02-01
In this paper, a novel prognostics and health management system architecture for hydropower plant equipment was proposed based on fog computing and Docker container. We employed the fog node to improve the real-time processing ability of improving the cloud architecture-based prognostics and health management system and overcome the problems of long delay time, network congestion and so on. Then Storm-based stream processing of fog node was present and could calculate the health index in the edge of network. Moreover, the distributed micros-service and Docker container architecture of hydropower plants equipment prognostics and health management was also proposed. Using the micro service architecture proposed in this paper, the hydropower unit can achieve the goal of the business intercommunication and seamless integration of different equipment and different manufacturers. Finally a real application case is given in this paper.
NASA Astrophysics Data System (ADS)
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
Architectural Implications of Cloud Computing
2011-10-24
Public Cloud Infrastructure-as-a- Service (IaaS) Software -as-a- Service ( SaaS ) Cloud Computing Types Platform-as-a- Service (PaaS) Based on Type of...Twitter #SEIVirtualForum © 2011 Carnegie Mellon University Software -as-a- Service ( SaaS ) Model of software deployment in which a third-party...and System Solutions (RTSS) Program. Her current interests and projects are in service -oriented architecture (SOA), cloud computing, and context
Concepts and Relations in Neurally Inspired In Situ Concept-Based Computing
van der Velde, Frank
2016-01-01
In situ concept-based computing is based on the notion that conceptual representations in the human brain are “in situ.” In this way, they are grounded in perception and action. Examples are neuronal assemblies, whose connection structures develop over time and are distributed over different brain areas. In situ concepts representations cannot be copied or duplicated because that will disrupt their connection structure, and thus the meaning of these concepts. Higher-level cognitive processes, as found in language and reasoning, can be performed with in situ concepts by embedding them in specialized neurally inspired “blackboards.” The interactions between the in situ concepts and the blackboards form the basis for in situ concept computing architectures. In these architectures, memory (concepts) and processing are interwoven, in contrast with the separation between memory and processing found in Von Neumann architectures. Because the further development of Von Neumann computing (more, faster, yet power limited) is questionable, in situ concept computing might be an alternative for concept-based computing. In situ concept computing will be illustrated with a recently developed BABI reasoning task. Neurorobotics can play an important role in the development of in situ concept computing because of the development of in situ concept representations derived in scenarios as needed for reasoning tasks. Neurorobotics would also benefit from power limited and in situ concept computing. PMID:27242504
Concepts and Relations in Neurally Inspired In Situ Concept-Based Computing.
van der Velde, Frank
2016-01-01
In situ concept-based computing is based on the notion that conceptual representations in the human brain are "in situ." In this way, they are grounded in perception and action. Examples are neuronal assemblies, whose connection structures develop over time and are distributed over different brain areas. In situ concepts representations cannot be copied or duplicated because that will disrupt their connection structure, and thus the meaning of these concepts. Higher-level cognitive processes, as found in language and reasoning, can be performed with in situ concepts by embedding them in specialized neurally inspired "blackboards." The interactions between the in situ concepts and the blackboards form the basis for in situ concept computing architectures. In these architectures, memory (concepts) and processing are interwoven, in contrast with the separation between memory and processing found in Von Neumann architectures. Because the further development of Von Neumann computing (more, faster, yet power limited) is questionable, in situ concept computing might be an alternative for concept-based computing. In situ concept computing will be illustrated with a recently developed BABI reasoning task. Neurorobotics can play an important role in the development of in situ concept computing because of the development of in situ concept representations derived in scenarios as needed for reasoning tasks. Neurorobotics would also benefit from power limited and in situ concept computing.
Using Multimedia for Teaching Analysis in History of Modern Architecture.
ERIC Educational Resources Information Center
Perryman, Garry
This paper presents a case for the development and support of a computer-based interactive multimedia program for teaching analysis in community college architecture design programs. Analysis in architecture design is an extremely important strategy for the teaching of higher-order thinking skills, which senior schools of architecture look for in…
Architecture independent environment for developing engineering software on MIMD computers
NASA Technical Reports Server (NTRS)
Valimohamed, Karim A.; Lopez, L. A.
1990-01-01
Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.
Layered Architectures for Quantum Computers and Quantum Repeaters
NASA Astrophysics Data System (ADS)
Jones, Nathan C.
This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.
Neural simulations on multi-core architectures.
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures
Eichner, Hubert; Klug, Tobias; Borst, Alexander
2009-01-01
Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
Yoo, Dongjin
2012-07-01
Advanced additive manufacture (AM) techniques are now being developed to fabricate scaffolds with controlled internal pore architectures in the field of tissue engineering. In general, these techniques use a hybrid method which combines computer-aided design (CAD) with computer-aided manufacturing (CAM) tools to design and fabricate complicated three-dimensional (3D) scaffold models. The mathematical descriptions of micro-architectures along with the macro-structures of the 3D scaffold models are limited by current CAD technologies as well as by the difficulty of transferring the designed digital models to standard formats for fabrication. To overcome these difficulties, we have developed an efficient internal pore architecture design system based on triply periodic minimal surface (TPMS) unit cell libraries and associated computational methods to assemble TPMS unit cells into an entire scaffold model. In addition, we have developed a process planning technique based on TPMS internal architecture pattern of unit cells to generate tool paths for freeform fabrication of tissue engineering porous scaffolds. Copyright © 2012 IPEM. Published by Elsevier Ltd. All rights reserved.
FPGA-based architecture for motion recovering in real-time
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar
2002-03-01
A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher level goals in different applications. Motion estimation algorithms pose a significant computational load for the sequential processors limiting its use in practical applications. In this work we propose a hardware architecture for motion estimation in real time based on FPGA technology. The technique used for motion estimation is Optical Flow due to its accuracy, and the density of velocity estimation, however other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results will be presented and the real-time performance will be discussed and analyzed. The architecture is prototyped in an FPGA board with a Virtex device interfaced to a digital imager.
Computer architecture evaluation for structural dynamics computations: Project summary
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1989-01-01
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
Performance Analysis of Distributed Object-Oriented Applications
NASA Technical Reports Server (NTRS)
Schoeffler, James D.
1998-01-01
The purpose of this research was to evaluate the efficiency of a distributed simulation architecture which creates individual modules which are made self-scheduling through the use of a message-based communication system used for requesting input data from another module which is the source of that data. To make the architecture as general as possible, the message-based communication architecture was implemented using standard remote object architectures (Common Object Request Broker Architecture (CORBA) and/or Distributed Component Object Model (DCOM)). A series of experiments were run in which different systems are distributed in a variety of ways across multiple computers and the performance evaluated. The experiments were duplicated in each case so that the overhead due to message communication and data transmission can be separated from the time required to actually perform the computational update of a module each iteration. The software used to distribute the modules across multiple computers was developed in the first year of the current grant and was modified considerably to add a message-based communication scheme supported by the DCOM distributed object architecture. The resulting performance was analyzed using a model created during the first year of this grant which predicts the overhead due to CORBA and DCOM remote procedure calls and includes the effects of data passed to and from the remote objects. A report covering the distributed simulation software and the results of the performance experiments has been submitted separately. The above report also discusses possible future work to apply the methodology to dynamically distribute the simulation modules so as to minimize overall computation time.
Silicon CMOS architecture for a spin-based quantum computer.
Veldhorst, M; Eenink, H G J; Yang, C H; Dzurak, A S
2017-12-15
Recent advances in quantum error correction codes for fault-tolerant quantum computing and physical realizations of high-fidelity qubits in multiple platforms give promise for the construction of a quantum computer based on millions of interacting qubits. However, the classical-quantum interface remains a nascent field of exploration. Here, we propose an architecture for a silicon-based quantum computer processor based on complementary metal-oxide-semiconductor (CMOS) technology. We show how a transistor-based control circuit together with charge-storage electrodes can be used to operate a dense and scalable two-dimensional qubit system. The qubits are defined by the spin state of a single electron confined in quantum dots, coupled via exchange interactions, controlled using a microwave cavity, and measured via gate-based dispersive readout. We implement a spin qubit surface code, showing the prospects for universal quantum computation. We discuss the challenges and focus areas that need to be addressed, providing a path for large-scale quantum computing.
The future of computing--new architectures and new technologies.
Warren, P
2004-02-01
All modern computers are designed using the 'von Neumann' architecture and built using silicon transistor technology. Both architecture and technology have been remarkably successful. Yet there are a range of problems for which this conventional architecture is not particularly well adapted, and new architectures are being proposed to solve these problems, in particular based on insight from nature. Transistor technology has enjoyed 50 years of continuing progress. However, the laws of physics dictate that within a relatively short time period this progress will come to an end. New technologies, based on molecular and biological sciences as well as quantum physics, are vying to replace silicon, or at least coexist with it and extend its capability. The paper describes these novel architectures and technologies, places them in the context of the kinds of problems they might help to solve, and predicts their possible manner and time of adoption. Finally it describes some key questions and research problems associated with their use.
Optical Computing Based on Neuronal Models
1988-05-01
walking, and cognition are far too complex for existing sequential digital computers. Therefore new architectures, hardware, and algorithms modeled...collective behavior, and iterative processing into optical processing and artificial neurodynamical systems. Another intriguing promise of neural nets is...with architectures, implementations, and programming; and material research s -7- called for. Our future research in neurodynamics will continue to
NASA Astrophysics Data System (ADS)
Titov, A. G.; Okladnikov, I. G.; Gordov, E. P.
2017-11-01
The use of large geospatial datasets in climate change studies requires the development of a set of Spatial Data Infrastructure (SDI) elements, including geoprocessing and cartographical visualization web services. This paper presents the architecture of a geospatial OGC web service system as an integral part of a virtual research environment (VRE) general architecture for statistical processing and visualization of meteorological and climatic data. The architecture is a set of interconnected standalone SDI nodes with corresponding data storage systems. Each node runs a specialized software, such as a geoportal, cartographical web services (WMS/WFS), a metadata catalog, and a MySQL database of technical metadata describing geospatial datasets available for the node. It also contains geospatial data processing services (WPS) based on a modular computing backend realizing statistical processing functionality and, thus, providing analysis of large datasets with the results of visualization and export into files of standard formats (XML, binary, etc.). Some cartographical web services have been developed in a system’s prototype to provide capabilities to work with raster and vector geospatial data based on OGC web services. The distributed architecture presented allows easy addition of new nodes, computing and data storage systems, and provides a solid computational infrastructure for regional climate change studies based on modern Web and GIS technologies.
NASA Astrophysics Data System (ADS)
Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.
2014-12-01
IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, that supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
NASA Technical Reports Server (NTRS)
Smith, Paul H.
1988-01-01
The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.
Web-based training: a new paradigm in computer-assisted instruction in medicine.
Haag, M; Maylein, L; Leven, F J; Tönshoff, B; Haux, R
1999-01-01
Computer-assisted instruction (CAI) programs based on internet technologies, especially on the world wide web (WWW), provide new opportunities in medical education. The aim of this paper is to examine different aspects of such programs, which we call 'web-based training (WBT) programs', and to differentiate them from conventional CAI programs. First, we will distinguish five different interaction types: presentation; browsing; tutorial dialogue; drill and practice; and simulation. In contrast to conventional CAI, there are four architectural types of WBT programs: client-based; remote data and knowledge; distributed teaching; and server-based. We will discuss the implications of the different architectures for developing WBT software. WBT programs have to meet other requirements than conventional CAI programs. The most important tools and programming languages for developing WBT programs will be listed and assigned to the architecture types. For the future, we expect a trend from conventional CAI towards WBT programs.
Unit cell-based computer-aided manufacturing system for tissue engineering.
Kang, Hyun-Wook; Park, Jeong Hun; Kang, Tae-Yun; Seol, Young-Joon; Cho, Dong-Woo
2012-03-01
Scaffolds play an important role in the regeneration of artificial tissues or organs. A scaffold is a porous structure with a micro-scale inner architecture in the range of several to several hundreds of micrometers. Therefore, computer-aided construction of scaffolds should provide sophisticated functionality for porous structure design and a tool path generation strategy that can achieve micro-scale architecture. In this study, a new unit cell-based computer-aided manufacturing (CAM) system was developed for the automated design and fabrication of a porous structure with micro-scale inner architecture that can be applied to composite tissue regeneration. The CAM system was developed by first defining a data structure for the computing process of a unit cell representing a single pore structure. Next, an algorithm and software were developed and applied to construct porous structures with a single or multiple pore design using solid freeform fabrication technology and a 3D tooth/spine computer-aided design model. We showed that this system is quite feasible for the design and fabrication of a scaffold for tissue engineering.
NASA Astrophysics Data System (ADS)
Mehta, Neville; Kompalli, Suryaprakash; Chaudhary, Vipin
Teleradiology is the electronic transmission of radiological patient images, such as x-rays, CT, or MR across multiple locations. The goal could be interpretation, consultation, or medical records keeping. Information technology solutions have enabled electronic records and their associated benefits are evident in health care today. However, salient aspects of collaborative interfaces, and computer assisted diagnostic (CAD) tools are yet to be integrated into workflow designs. The Computer Assisted Diagnostics and Interventions (CADI) group at the University at Buffalo has developed an architecture that facilitates web-enabled use of CAD tools, along with the novel concept of synchronized collaboration. The architecture can support multiple teleradiology applications and case studies are presented here.
NASA Technical Reports Server (NTRS)
LaValley, Brian W.; Little, Phillip D.; Walter, Chris J.
2011-01-01
This report documents the capabilities of the EDICT tools for error modeling and error propagation analysis when operating with models defined in the Architecture Analysis & Design Language (AADL). We discuss our experience using the EDICT error analysis capabilities on a model of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER) architecture that uses the Reliable Optical Bus (ROBUS). Based on these experiences we draw some initial conclusions about model based design techniques for error modeling and analysis of highly reliable computing architectures.
Real-time FPGA architectures for computer vision
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Torres-Huitzil, Cesar
2000-03-01
This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in a FPGA, but it can be implemented on a dedicated VLSI to reach higher clock frequencies. Complexity issues, FPGA resources utilization, FPGA limitations, and real time performance are discussed. Some results are presented and discussed.
Hybrid techniques for the digital control of mechanical and optical systems
NASA Astrophysics Data System (ADS)
Acernese, Fausto; Barone, Fabrizio; De Rosa, Rosario; Eleuteri, Antonio; Milano, Leopoldo; Pardi, Silvio; Ricciardi, Iolanda; Russo, Guido
2004-07-01
One of the main requirements of a digital system for the control of interferometric detectors of gravitational waves is the computing power, that is a direct consequence of the increasing complexity of the digital algorithms necessary for the control signals generation. For this specific task many specialised non standard real-time architectures have been developed, often very expensive and difficult to upgrade. On the other hand, such computing power is generally fully available for off-line applications on standard Pc based systems. Therefore, a possible and obvious solution may be provided by the integration of both the the real-time and off-line architecture resulting in a hybrid control system architecture based on standards available components, trying to get both the advantages of the perfect data synchronization provided by the real-time systems and by the large computing power available on Pc based systems. Such integration may be provided by the implementation of the link between the two different architectures through the standard Ethernet network, whose data transfer speed is largely increasing in these years, using the TCP/IP and UDP protocols. In this paper we describe the architecture of an hybrid Ethernet based real-time control system protoype we implemented in Napoli, discussing its characteristics and performances. Finally we discuss a possible application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection (IDGW-3P) operational in Napoli.
NASA Astrophysics Data System (ADS)
Marukame, Takao; Nishi, Yoshifumi; Yasuda, Shin-ichi; Tanamoto, Tetsufumi
2018-04-01
The use of memristive devices for creating artificial neurons is promising for brain-inspired computing from the viewpoints of computation architecture and learning protocol. We present an energy-efficient multiplier accumulator based on a memristive array architecture incorporating both analog and digital circuitries. The analog circuitry is used to full advantage for neural networks, as demonstrated by the spike-timing-dependent plasticity (STDP) in fabricated AlO x /TiO x -based metal-oxide memristive devices. STDP protocols for controlling periodic analog resistance with long-range stability were experimentally verified using a variety of voltage amplitudes and spike timings.
Practical Application of Model-based Programming and State-based Architecture to Space Missions
NASA Technical Reports Server (NTRS)
Horvath, Gregory; Ingham, Michel; Chung, Seung; Martin, Oliver; Williams, Brian
2006-01-01
A viewgraph presentation to develop models from systems engineers that accomplish mission objectives and manage the health of the system is shown. The topics include: 1) Overview; 2) Motivation; 3) Objective/Vision; 4) Approach; 5) Background: The Mission Data System; 6) Background: State-based Control Architecture System; 7) Background: State Analysis; 8) Overview of State Analysis; 9) Background: MDS Software Frameworks; 10) Background: Model-based Programming; 10) Background: Titan Model-based Executive; 11) Model-based Execution Architecture; 12) Compatibility Analysis of MDS and Titan Architectures; 13) Integrating Model-based Programming and Execution into the Architecture; 14) State Analysis and Modeling; 15) IMU Subsystem State Effects Diagram; 16) Titan Subsystem Model: IMU Health; 17) Integrating Model-based Programming and Execution into the Software IMU; 18) Testing Program; 19) Computationally Tractable State Estimation & Fault Diagnosis; 20) Diagnostic Algorithm Performance; 21) Integration and Test Issues; 22) Demonstrated Benefits; and 23) Next Steps
A resource management architecture based on complex network theory in cloud computing federation
NASA Astrophysics Data System (ADS)
Zhang, Zehua; Zhang, Xuejie
2011-10-01
Cloud Computing Federation is a main trend of Cloud Computing. Resource Management has significant effect on the design, realization, and efficiency of Cloud Computing Federation. Cloud Computing Federation has the typical characteristic of the Complex System, therefore, we propose a resource management architecture based on complex network theory for Cloud Computing Federation (abbreviated as RMABC) in this paper, with the detailed design of the resource discovery and resource announcement mechanisms. Compare with the existing resource management mechanisms in distributed computing systems, a Task Manager in RMABC can use the historical information and current state data get from other Task Managers for the evolution of the complex network which is composed of Task Managers, thus has the advantages in resource discovery speed, fault tolerance and adaptive ability. The result of the model experiment confirmed the advantage of RMABC in resource discovery performance.
A performance analysis of advanced I/O architectures for PC-based network file servers
NASA Astrophysics Data System (ADS)
Huynh, K. D.; Khoshgoftaar, T. M.
1994-12-01
In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
Study on Global GIS architecture and its key technologies
NASA Astrophysics Data System (ADS)
Cheng, Chengqi; Guan, Li; Lv, Xuefeng
2009-09-01
Global GIS (G2IS) is a system, which supports the huge data process and the global direct manipulation on global grid based on spheroid or ellipsoid surface. Based on global subdivision grid (GSG), Global GIS architecture is presented in this paper, taking advantage of computer cluster theory, the space-time integration technology and the virtual reality technology. Global GIS system architecture is composed of five layers, including data storage layer, data representation layer, network and cluster layer, data management layer and data application layer. Thereinto, it is designed that functions of four-level protocol framework and three-layer data management pattern of Global GIS based on organization, management and publication of spatial information in this architecture. Three kinds of core supportive technologies, which are computer cluster theory, the space-time integration technology and the virtual reality technology, and its application pattern in the Global GIS are introduced in detail. The primary ideas of Global GIS in this paper will be an important development tendency of GIS.
Study on Global GIS architecture and its key technologies
NASA Astrophysics Data System (ADS)
Cheng, Chengqi; Guan, Li; Lv, Xuefeng
2010-11-01
Global GIS (G2IS) is a system, which supports the huge data process and the global direct manipulation on global grid based on spheroid or ellipsoid surface. Based on global subdivision grid (GSG), Global GIS architecture is presented in this paper, taking advantage of computer cluster theory, the space-time integration technology and the virtual reality technology. Global GIS system architecture is composed of five layers, including data storage layer, data representation layer, network and cluster layer, data management layer and data application layer. Thereinto, it is designed that functions of four-level protocol framework and three-layer data management pattern of Global GIS based on organization, management and publication of spatial information in this architecture. Three kinds of core supportive technologies, which are computer cluster theory, the space-time integration technology and the virtual reality technology, and its application pattern in the Global GIS are introduced in detail. The primary ideas of Global GIS in this paper will be an important development tendency of GIS.
Platform Architecture for Decentralized Positioning Systems.
Kasmi, Zakaria; Norrdine, Abdelmoumen; Blankenbach, Jörg
2017-04-26
A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. The decentralized processing of a position enables pushing the application-level knowledge into a mobile station and avoids the communication with a central unit such as a server or a base station. In addition, the calculation of the position on low-cost and resource-constrained devices presents a challenge due to the limited computing, storage capacity, as well as power supply. Therefore, we propose a platform architecture that enables the design of a system with the reusability of the components, extensibility (e.g., with other positioning technologies) and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system.
Platform Architecture for Decentralized Positioning Systems
Kasmi, Zakaria; Norrdine, Abdelmoumen; Blankenbach, Jörg
2017-01-01
A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. The decentralized processing of a position enables pushing the application-level knowledge into a mobile station and avoids the communication with a central unit such as a server or a base station. In addition, the calculation of the position on low-cost and resource-constrained devices presents a challenge due to the limited computing, storage capacity, as well as power supply. Therefore, we propose a platform architecture that enables the design of a system with the reusability of the components, extensibility (e.g., with other positioning technologies) and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system. PMID:28445414
Design and deployment of an elastic network test-bed in IHEP data center based on SDN
NASA Astrophysics Data System (ADS)
Zeng, Shan; Qi, Fazhi; Chen, Gang
2017-10-01
High energy physics experiments produce huge amounts of raw data, while because of the sharing characteristics of the network resources, there is no guarantee of the available bandwidth for each experiment which may cause link congestion problems. On the other side, with the development of cloud computing technologies, IHEP have established a cloud platform based on OpenStack which can ensure the flexibility of the computing and storage resources, and more and more computing applications have been deployed on virtual machines established by OpenStack. However, under the traditional network architecture, network capability can’t be required elastically, which becomes the bottleneck of restricting the flexible application of cloud computing. In order to solve the above problems, we propose an elastic cloud data center network architecture based on SDN, and we also design a high performance controller cluster based on OpenDaylight. In the end, we present our current test results.
Computers in Academic Architecture Libraries.
ERIC Educational Resources Information Center
Willis, Alfred; And Others
1992-01-01
Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…
Analog Computation by DNA Strand Displacement Circuits.
Song, Tianqi; Garg, Sudhanshu; Mokhtar, Reem; Bui, Hieu; Reif, John
2016-08-19
DNA circuits have been widely used to develop biological computing devices because of their high programmability and versatility. Here, we propose an architecture for the systematic construction of DNA circuits for analog computation based on DNA strand displacement. The elementary gates in our architecture include addition, subtraction, and multiplication gates. The input and output of these gates are analog, which means that they are directly represented by the concentrations of the input and output DNA strands, respectively, without requiring a threshold for converting to Boolean signals. We provide detailed domain designs and kinetic simulations of the gates to demonstrate their expected performance. On the basis of these gates, we describe how DNA circuits to compute polynomial functions of inputs can be built. Using Taylor Series and Newton Iteration methods, functions beyond the scope of polynomials can also be computed by DNA circuits built upon our architecture.
Advanced Architectures for Astrophysical Supercomputing
NASA Astrophysics Data System (ADS)
Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.
2010-12-01
Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
NASA Astrophysics Data System (ADS)
Rucinski, Marek; Coates, Adam; Montano, Giuseppe; Allouis, Elie; Jameux, David
2015-09-01
The Lightweight Advanced Robotic Arm Demonstrator (LARAD) is a state-of-the-art, two-meter long robotic arm for planetary surface exploration currently being developed by a UK consortium led by Airbus Defence and Space Ltd under contract to the UK Space Agency (CREST-2 programme). LARAD has a modular design, which allows for experimentation with different electronics and control software. The control system architecture includes the on-board computer, control software and firmware, and the communication infrastructure (e.g. data links, switches) connecting on-board computer(s), sensors, actuators and the end-effector. The purpose of the control system is to operate the arm according to pre-defined performance requirements, monitoring its behaviour in real-time and performing safing/recovery actions in case of faults. This paper reports on the results of a recent study about the feasibility of the development and integration of a novel control system architecture for LARAD fully based on the SpaceWire protocol. The current control system architecture is based on the combination of two communication protocols, Ethernet and CAN. The new SpaceWire-based control system will allow for improved monitoring and telecommanding performance thanks to higher communication data rate, allowing for the adoption of advanced control schemes, potentially based on multiple vision sensors, and for the handling of sophisticated end-effectors that require fine control, such as science payloads or robotic hands.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network
NASA Astrophysics Data System (ADS)
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-03-01
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network.
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-03-21
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices' non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network
Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien
2017-01-01
With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing. PMID:28322262
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
A static data flow simulation study at Ames Research Center
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Howard, Lauri S.
1987-01-01
Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory, and which communicates with a common global data memory. A new graph theoretic model called ATAMM which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
Neuromorphic Computing for Very Large Test and Evaluation Data Analysis
2014-05-01
analysis and utilization of newly available hardware- based artificial neural network chips. These two aspects of the program are complementary. The...neuromorphic architectures research focused on long term disruptive technologies with high risk but revolutionary potential. The hardware- based neural...today. Overall, hardware- based neural processing research allows us to study the fundamental system and architectural issues relevant for employing
The Design of Fault Tolerant Quantum Dot Cellular Automata Based Logic
NASA Technical Reports Server (NTRS)
Armstrong, C. Duane; Humphreys, William M.; Fijany, Amir
2002-01-01
As transistor geometries are reduced, quantum effects begin to dominate device performance. At some point, transistors cease to have the properties that make them useful computational components. New computing elements must be developed in order to keep pace with Moore s Law. Quantum dot cellular automata (QCA) represent an alternative paradigm to transistor-based logic. QCA architectures that are robust to manufacturing tolerances and defects must be developed. We are developing software that allows the exploration of fault tolerant QCA gate architectures by automating the specification, simulation, analysis and documentation processes.
Parallel, stochastic measurement of molecular surface area.
Juba, Derek; Varshney, Amitabh
2008-08-01
Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
Overview of the LINCS architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fletcher, J.G.; Watson, R.W.
1982-01-13
Computing at the Lawrence Livermore National Laboratory (LLNL) has evolved over the past 15 years with a computer network based resource sharing environment. The increasing use of low cost and high performance micro, mini and midi computers and commercially available local networking systems will accelerate this trend. Further, even the large scale computer systems, on which much of the LLNL scientific computing depends, are evolving into multiprocessor systems. It is our belief that the most cost effective use of this environment will depend on the development of application systems structured into cooperating concurrent program modules (processes) distributed appropriately over differentmore » nodes of the environment. A node is defined as one or more processors with a local (shared) high speed memory. Given the latter view, the environment can be characterized as consisting of: multiple nodes communicating over noisy channels with arbitrary delays and throughput, heterogenous base resources and information encodings, no single administration controlling all resources, distributed system state, and no uniform time base. The system design problem is - how to turn the heterogeneous base hardware/firmware/software resources of this environment into a coherent set of resources that facilitate development of cost effective, reliable, and human engineered applications. We believe the answer lies in developing a layered, communication oriented distributed system architecture; layered and modular to support ease of understanding, reconfiguration, extensibility, and hiding of implementation or nonessential local details; communication oriented because that is a central feature of the environment. The Livermore Interactive Network Communication System (LINCS) is a hierarchical architecture designed to meet the above needs. While having characteristics in common with other architectures, it differs in several respects.« less
Computer Architecture for Energy Efficient SFQ
2014-08-27
IBM Corporation (T.J. Watson Research Laboratory) 1101 Kitchawan Road Yorktown Heights, NY 10598 -0000 2 ABSTRACT Number of Papers published in peer...accomplished during this ARO-sponsored project at IBM Research to identify and model an energy efficient SFQ-based computer architecture. The... IBM Windsor Blue (WB), illustrated schematically in Figure 2. The basic building block of WB is a "tile" comprised of a 64-bit arithmetic logic unit
Heterogeneous real-time computing in radio astronomy
NASA Astrophysics Data System (ADS)
Ford, John M.; Demorest, Paul; Ransom, Scott
2010-07-01
Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.
Service-Oriented Architecture for NVO and TeraGrid Computing
NASA Technical Reports Server (NTRS)
Jacob, Joseph; Miller, Craig; Williams, Roy; Steenberg, Conrad; Graham, Matthew
2008-01-01
The National Virtual Observatory (NVO) Extensible Secure Scalable Service Infrastructure (NESSSI) is a Web service architecture and software framework that enables Web-based astronomical data publishing and processing on grid computers such as the National Science Foundation's TeraGrid. Characteristics of this architecture include the following: (1) Services are created, managed, and upgraded by their developers, who are trusted users of computing platforms on which the services are deployed. (2) Service jobs can be initiated by means of Java or Python client programs run on a command line or with Web portals. (3) Access is granted within a graduated security scheme in which the size of a job that can be initiated depends on the level of authentication of the user.
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
Evaluation of the Intel iWarp parallel processor for space flight applications
NASA Technical Reports Server (NTRS)
Hine, Butler P., III; Fong, Terrence W.
1993-01-01
The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.
SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases
NASA Astrophysics Data System (ADS)
Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.
One of the main attacks to ubiquitous databases is the structure query language (SQL) injection attack, which causes severe damages both in the commercial aspect and in the user’s confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow an architecture to detect attacks from different types of devices, regardless of the physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.
Bioinspired architecture approach for a one-billion transistor smart CMOS camera chip
NASA Astrophysics Data System (ADS)
Fey, Dietmar; Komann, Marcus
2007-05-01
In the paper we present a massively parallel VLSI architecture for future smart CMOS camera chips with up to one billion transistors. To exploit efficiently the potential offered by future micro- or nanoelectronic devices traditional on central structures oriented parallel architectures based on MIMD or SIMD approaches will fail. They require too long and too many global interconnects for the distribution of code or the access to common memory. On the other hand nature developed self-organising and emergent principles to manage successfully complex structures based on lots of interacting simple elements. Therefore we developed a new as Marching Pixels denoted emergent computing paradigm based on a mixture of bio-inspired computing models like cellular automaton and artificial ants. In the paper we present different Marching Pixels algorithms and the corresponding VLSI array architecture. A detailed synthesis result for a 0.18 μm CMOS process shows that a 256×256 pixel image is processed in less than 10 ms assuming a moderate 100 MHz clock rate for the processor array. Future higher integration densities and a 3D chip stacking technology will allow the integration and processing of Mega pixels within the same time since our architecture is fully scalable.
COMBAT: mobile-Cloud-based cOmpute/coMmunications infrastructure for BATtlefield applications
NASA Astrophysics Data System (ADS)
Soyata, Tolga; Muraleedharan, Rajani; Langdon, Jonathan; Funai, Colin; Ames, Scott; Kwon, Minseok; Heinzelman, Wendi
2012-05-01
The amount of data processed annually over the Internet has crossed the zetabyte boundary, yet this Big Data cannot be efficiently processed or stored using today's mobile devices. Parallel to this explosive growth in data, a substantial increase in mobile compute-capability and the advances in cloud computing have brought the state-of-the- art in mobile-cloud computing to an inflection point, where the right architecture may allow mobile devices to run applications utilizing Big Data and intensive computing. In this paper, we propose the MObile Cloud-based Hybrid Architecture (MOCHA), which formulates a solution to permit mobile-cloud computing applications such as object recognition in the battlefield by introducing a mid-stage compute- and storage-layer, called the cloudlet. MOCHA is built on the key observation that many mobile-cloud applications have the following characteristics: 1) they are compute-intensive, requiring the compute-power of a supercomputer, and 2) they use Big Data, requiring a communications link to cloud-based database sources in near-real-time. In this paper, we describe the operation of MOCHA in battlefield applications, by formulating the aforementioned mobile and cloudlet to be housed within a soldier's vest and inside a military vehicle, respectively, and enabling access to the cloud through high latency satellite links. We provide simulations using the traditional mobile-cloud approach as well as utilizing MOCHA with a mid-stage cloudlet to quantify the utility of this architecture. We show that the MOCHA platform for mobile-cloud computing promises a future for critical battlefield applications that access Big Data, which is currently not possible using existing technology.
Fast semivariogram computation using FPGA architectures
NASA Astrophysics Data System (ADS)
Lagadapati, Yamuna; Shirvaikar, Mukul; Dong, Xuanliang
2015-02-01
The semivariogram is a statistical measure of the spatial distribution of data and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. The semivariogram is a plot of semivariances for different lag distances between pixels. A semi-variance, γ(h), is defined as the half of the expected squared differences of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n2). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz, but they can perform tens of thousands of calculations per clock cycle while operating in the low range of power. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. The design consists of several modules dedicated to the constituent computational tasks. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. Anisotropic semivariogram implementation is anticipated to be an extension of the current architecture, ostensibly based on refinements to the current modules. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development Kit, which utilizes the Virtex5 FPGA. Medical image data from MRI scans are utilized for the experiments. Computational speedup is measured with respect to Matlab implementation on a personal computer with an Intel i7 multi-core processor. Preliminary simulation results indicate that a significant advantage in speed can be attained by the architectures, making the algorithm viable for implementation in medical devices
Stamatakis, Alexandros; Ott, Michael
2008-12-27
The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations that typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAXML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.
Trapped-Ion Quantum Logic with Global Radiation Fields.
Weidt, S; Randall, J; Webster, S C; Lake, K; Webb, A E; Cohen, I; Navickas, T; Lekitsch, B; Retzker, A; Hensinger, W K
2016-11-25
Trapped ions are a promising tool for building a large-scale quantum computer. However, the number of required radiation fields for the realization of quantum gates in any proposed ion-based architecture scales with the number of ions within the quantum computer, posing a major obstacle when imagining a device with millions of ions. Here, we present a fundamentally different approach for trapped-ion quantum computing where this detrimental scaling vanishes. The method is based on individually controlled voltages applied to each logic gate location to facilitate the actual gate operation analogous to a traditional transistor architecture within a classical computer processor. To demonstrate the key principle of this approach we implement a versatile quantum gate method based on long-wavelength radiation and use this method to generate a maximally entangled state of two quantum engineered clock qubits with fidelity 0.985(12). This quantum gate also constitutes a simple-to-implement tool for quantum metrology, sensing, and simulation.
2010-06-01
DATES COVEREDAPR 2009 – JAN 2010 (From - To) APR 2009 – JAN 2010 4. TITLE AND SUBTITLE EMERGING NEUROMORPHIC COMPUTING ARCHITECTURES AND ENABLING...14. ABSTRACT The highly cross-disciplinary emerging field of neuromorphic computing architectures for cognitive information processing applications...belief systems, software, computer engineering, etc. In our effort to develop cognitive systems atop a neuromorphic computing architecture, we explored
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing
Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant
2016-01-01
Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital(reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90 and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
Architecture for hospital information integration
NASA Astrophysics Data System (ADS)
Chimiak, William J.; Janariz, Daniel L.; Martinez, Ralph
1999-07-01
The ongoing integration of hospital information systems (HIS) continues. Data storage systems, data networks and computers improve, data bases grow and health-care applications increase. Some computer operating systems continue to evolve and some fade. Health care delivery now depends on this computer-assisted environment. The result is the critical harmonization of the various hospital information systems becomes increasingly difficult. The purpose of this paper is to present an architecture for HIS integration that is computer-language-neutral and computer- hardware-neutral for the informatics applications. The proposed architecture builds upon the work done at the University of Arizona on middleware, the work of the National Electrical Manufacturers Association, and the American College of Radiology. It is a fresh approach to allowing applications engineers to access medical data easily and thus concentrates on the application techniques in which they are expert without struggling with medical information syntaxes. The HIS can be modeled using a hierarchy of information sub-systems thus facilitating its understanding. The architecture includes the resulting information model along with a strict but intuitive application programming interface, managed by CORBA. The CORBA requirement facilitates interoperability. It should also reduce software and hardware development times.
A Roadmap for caGrid, an Enterprise Grid Architecture for Biomedical Research
Saltz, Joel; Hastings, Shannon; Langella, Stephen; Oster, Scott; Kurc, Tahsin; Payne, Philip; Ferreira, Renato; Plale, Beth; Goble, Carole; Ervin, David; Sharma, Ashish; Pan, Tony; Permar, Justin; Brezany, Peter; Siebenlist, Frank; Madduri, Ravi; Foster, Ian; Shanbhag, Krishnakant; Mead, Charlie; Hong, Neil Chue
2012-01-01
caGrid is a middleware system which combines the Grid computing, the service oriented architecture, and the model driven architecture paradigms to support development of interoperable data and analytical resources and federation of such resources in a Grid environment. The functionality provided by caGrid is an essential and integral component of the cancer Biomedical Informatics Grid (caBIG™) program. This program is established by the National Cancer Institute as a nationwide effort to develop enabling informatics technologies for collaborative, multi-institutional biomedical research with the overarching goal of accelerating translational cancer research. Although the main application domain for caGrid is cancer research, the infrastructure provides a generic framework that can be employed in other biomedical research and healthcare domains. The development of caGrid is an ongoing effort, adding new functionality and improvements based on feedback and use cases from the community. This paper provides an overview of potential future architecture and tooling directions and areas of improvement for caGrid and caGrid-like systems. This summary is based on discussions at a roadmap workshop held in February with participants from biomedical research, Grid computing, and high performance computing communities. PMID:18560123
A roadmap for caGrid, an enterprise Grid architecture for biomedical research.
Saltz, Joel; Hastings, Shannon; Langella, Stephen; Oster, Scott; Kurc, Tahsin; Payne, Philip; Ferreira, Renato; Plale, Beth; Goble, Carole; Ervin, David; Sharma, Ashish; Pan, Tony; Permar, Justin; Brezany, Peter; Siebenlist, Frank; Madduri, Ravi; Foster, Ian; Shanbhag, Krishnakant; Mead, Charlie; Chue Hong, Neil
2008-01-01
caGrid is a middleware system which combines the Grid computing, the service oriented architecture, and the model driven architecture paradigms to support development of interoperable data and analytical resources and federation of such resources in a Grid environment. The functionality provided by caGrid is an essential and integral component of the cancer Biomedical Informatics Grid (caBIG) program. This program is established by the National Cancer Institute as a nationwide effort to develop enabling informatics technologies for collaborative, multi-institutional biomedical research with the overarching goal of accelerating translational cancer research. Although the main application domain for caGrid is cancer research, the infrastructure provides a generic framework that can be employed in other biomedical research and healthcare domains. The development of caGrid is an ongoing effort, adding new functionality and improvements based on feedback and use cases from the community. This paper provides an overview of potential future architecture and tooling directions and areas of improvement for caGrid and caGrid-like systems. This summary is based on discussions at a roadmap workshop held in February with participants from biomedical research, Grid computing, and high performance computing communities.
A brick-architecture-based mobile under-vehicle inspection system
NASA Astrophysics Data System (ADS)
Qian, Cheng; Page, David; Koschan, Andreas; Abidi, Mongi
2005-05-01
In this paper, a mobile scanning system for real-time under-vehicle inspection is presented, which is founded on a "Brick" architecture. In this "Brick" architecture, the inspection system is basically decomposed into bricks of three kinds: sensing, mobility, and computing. These bricks are physically and logically independent and communicate with each other by wireless communication. Each brick is mainly composed by five modules: data acquisition, data processing, data transmission, power, and self-management. These five modules can be further decomposed into submodules where the function and the interface are well-defined. Based on this architecture, the system is built by four bricks: two sensing bricks consisting of a range scanner and a line CCD, one mobility brick, and one computing brick. The sensing bricks capture geometric data and texture data of the under-vehicle scene, while the mobility brick provides positioning data along the motion path. Data of these three modalities are transmitted to the computing brick where they are fused and reconstruct a 3D under-vehicle model for visualization and danger inspection. This system has been successfully used in several military applications and proved to be an effective safer method for national security.
Real-time field programmable gate array architecture for computer vision
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Torres-Huitzil, Cesar
2001-01-01
This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The field programmable gate array (FPGA)-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and it is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in a FPGA, but it can be implemented on dedicated very- large-scale-integrated devices to reach higher clock frequencies. Complexity issues, FPGA resources utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T
2011-11-21
We implemented the simplified Monte Carlo (SMC) method on graphics processing unit (GPU) architecture under the computer-unified device architecture platform developed by NVIDIA. The GPU-based SMC was clinically applied for four patients with head and neck, lung, or prostate cancer. The results were compared to those obtained by a traditional CPU-based SMC with respect to the computation time and discrepancy. In the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar, within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged 9-67 s. The results demonstrate the successful application of the GPU-based SMC to a clinical proton treatment planning.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop
NASA Astrophysics Data System (ADS)
Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.
2018-04-01
The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application and sovled the need for concurrent multi-user high-speed access to remotely sensed data. It verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multi-users and analyzing the distributed storage architecture to improve the application efficiency of remote sensing images through building an actual Hadoop service system.
2012-11-01
few sensors/complex computations, and many sensors/simple computation. II. CHALLENGES WITH NANO-ENABLED NEUROMORPHIC CHIPS A wide variety of...scenarios. Neuromorphic processors, which are based on the highly parallelized computing architecture of the mammalian brain, show great promise in...in the brain. This fundamentally different approach, frequently referred to as neuromorphic computing, is thought to be better able to solve fuzzy
Multi-level Hierarchical Poly Tree computer architectures
NASA Technical Reports Server (NTRS)
Padovan, Joe; Gute, Doug
1990-01-01
Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
All-memristive neuromorphic computing with level-tuned neurons
NASA Astrophysics Data System (ADS)
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-01
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amount of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogenous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represent a significant step towards the development of ultrahigh-density neuromorphic co-processors.
All-memristive neuromorphic computing with level-tuned neurons.
Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos
2016-09-02
In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amount of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogenous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represent a significant step towards the development of ultrahigh-density neuromorphic co-processors.
A PC-Based Controller for Dextrous Arms
NASA Technical Reports Server (NTRS)
Fiorini, Paolo; Seraji, Homayoun; Long, Mark
1996-01-01
This paper describes the architecture and performance of a PC-based controller for 7-DOF dextrous manipulators. The computing platform is a 486-based personal computer equipped with a bus extender to access the robot Multibus controller, together with a single board computer as the graphical engine, and with a parallel I/O board to interface with a force-torque sensor mounted on the manipulator wrist.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1987-01-01
The results of ongoing research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a spatial distributed computer environment is presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
The role of architecture and ontology for interoperability.
Blobel, Bernd; González, Carolina; Oemig, Frank; Lopéz, Diego; Nykänen, Pirkko; Ruotsalainen, Pekka
2010-01-01
Turning from organization-centric to process-controlled or even to personalized approaches, advanced healthcare settings have to meet special interoperability challenges. eHealth and pHealth solutions must assure interoperability between actors cooperating to achieve common business objectives. Hereby, the interoperability chain also includes individually tailored technical systems, but also sensors and actuators. For enabling corresponding pervasive computing and even autonomic computing, individualized systems have to be based on an architecture framework covering many domains, scientifically managed by specialized disciplines using their specific ontologies in a formalized way. Therefore, interoperability has to advance from a communication protocol to an architecture-centric approach mastering ontology coordination challenges.
Nonvolatile Memory Materials for Neuromorphic Intelligent Machines.
Jeong, Doo Seok; Hwang, Cheol Seong
2018-04-18
Recent progress in deep learning extends the capability of artificial intelligence to various practical tasks, making the deep neural network (DNN) an extremely versatile hypothesis. While such DNN is virtually built on contemporary data centers of the von Neumann architecture, physical (in part) DNN of non-von Neumann architecture, also known as neuromorphic computing, can remarkably improve learning and inference efficiency. Particularly, resistance-based nonvolatile random access memory (NVRAM) highlights its handy and efficient application to the multiply-accumulate (MAC) operation in an analog manner. Here, an overview is given of the available types of resistance-based NVRAMs and their technological maturity from the material- and device-points of view. Examples within the strategy are subsequently addressed in comparison with their benchmarks (virtual DNN in deep learning). A spiking neural network (SNN) is another type of neural network that is more biologically plausible than the DNN. The successful incorporation of resistance-based NVRAM in SNN-based neuromorphic computing offers an efficient solution to the MAC operation and spike timing-based learning in nature. This strategy is exemplified from a material perspective. Intelligent machines are categorized according to their architecture and learning type. Also, the functionality and usefulness of NVRAM-based neuromorphic computing are addressed. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bit storage and bit flip operations in an electromechanical oscillator.
Mahboob, I; Yamaguchi, H
2008-05-01
The Parametron was first proposed as a logic-processing system almost 50 years ago. In this approach the two stable phases of an excited harmonic oscillator provide the basis for logic operations. Computer architectures based on LC oscillators were developed for this approach, but high power consumption and difficulties with integration meant that the Parametron was rendered obsolete by the transistor. Here we propose an approach to mechanical logic based on nanoelectromechanical systems that is a variation on the Parametron architecture and, as a first step towards a possible nanomechanical computer, we demonstrate both bit storage and bit flip operations.
A Computational Architecture for Programmable Automation Research
NASA Astrophysics Data System (ADS)
Taylor, Russell H.; Korein, James U.; Maier, Georg E.; Durfee, Lawrence F.
1987-03-01
This short paper describes recent work at the IBM T. J. Watson Research Center directed at developing a highly flexible computational architecture for research on sensor-based programmable automation. The system described here has been designed with a focus on dynamic configurability, layered user inter-faces and incorporation of sensor-based real time operations into new commands. It is these features which distinguish it from earlier work. The system is cur-rently being implemented at IBM for research purposes and internal use and is an outgrowth of programmable automation research which has been ongoing since 1972 [e.g., 1, 2, 3, 4, 5, 6] .
Programming model for distributed intelligent systems
NASA Technical Reports Server (NTRS)
Sztipanovits, J.; Biegl, C.; Karsai, G.; Bogunovic, N.; Purves, B.; Williams, R.; Christiansen, T.
1988-01-01
A programming model and architecture which was developed for the design and implementation of complex, heterogeneous measurement and control systems is described. The Multigraph Architecture integrates artificial intelligence techniques with conventional software technologies, offers a unified framework for distributed and shared memory based parallel computational models and supports multiple programming paradigms. The system can be implemented on different hardware architectures and can be adapted to strongly different applications.
Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...
2015-07-13
Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained bymore » OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.« less
Performance study of a data flow architecture
NASA Technical Reports Server (NTRS)
Adams, George
1985-01-01
Teams of scientists studied data flow concepts, static data flow machine architecture, and the VAL language. Each team mapped its application onto the machine and coded it in VAL. The principal findings of the study were: (1) Five of the seven applications used the full power of the target machine. The galactic simulation and multigrid fluid flow teams found that a significantly smaller version of the machine (16 processing elements) would suffice. (2) A number of machine design parameters including processing element (PE) function unit numbers, array memory size and bandwidth, and routing network capability were found to be crucial for optimal machine performance. (3) The study participants readily acquired VAL programming skills. (4) Participants learned that application-based performance evaluation is a sound method of evaluating new computer architectures, even those that are not fully specified. During the course of the study, participants developed models for using computers to solve numerical problems and for evaluating new architectures. These models form the bases for future evaluation studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Chen, Qingkui; Zhao, Deyu; Wang, Jingjuan
2017-01-01
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes’ diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services. PMID:28777325
Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan
2017-08-04
This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) Programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is dynamic coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that the performance of the algorithms significantly increased by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell on average with TLPOM and the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
1983-12-30
AD-Ri46 57? ARCHITECTURE DESIGN AND SYSTEM; PERFORMANCE ASSESSMENT i/i AND DEVELOPMENT ME..(U) NAVAL SURFACE WEAPONS CENTER SILYER SPRING MD J...AD-A 146 577 NSIWC TR 83-324 ARCHITECTURE , DESIGN , AND SYSTEM; PERFORMANCE ASSESSMENT AND DEVELOPMENT METHODOLOGY...REPORT NUMBER 12. GOVT ACCESSION NO.3. RECIPIENT’S CATALOG NUMBER NSWC TR 83-324 10- 1 1 51’ 4. ?ITLE (and subtitle) ARCHITECTURE , DESIGN , AND SYSTEM; S
Cloud Computing in Support of Synchronized Disaster Response Operations
2010-09-01
scalable, Web application based on cloud computing technologies to facilitate communication between a broad range of public and private entities without...requiring them to compromise security or competitive advantage. The proposed design applies the unique benefits of cloud computing architectures such as
Cladé, Thierry; Snyder, Joshua C.
2010-01-01
Clinical trials which use imaging typically require data management and workflow integration across several parties. We identify opportunities for all parties involved to realize benefits with a modular interoperability model based on service-oriented architecture and grid computing principles. We discuss middleware products for implementation of this model, and propose caGrid as an ideal candidate due to its healthcare focus; free, open source license; and mature developer tools and support. PMID:20449775
A FPGA-based architecture for real-time image matching
NASA Astrophysics Data System (ADS)
Wang, Jianhui; Zhong, Sheng; Xu, Wenhui; Zhang, Weijun; Cao, Zhiguo
2013-10-01
Image matching is a fundamental task in computer vision. It is used to establish correspondence between two images taken at different viewpoint or different time from the same scene. However, its large computational complexity has been a challenge to most embedded systems. This paper proposes a single FPGA-based image matching system, which consists of SIFT feature detection, BRIEF descriptor extraction and BRIEF matching. It optimizes the FPGA architecture for the SIFT feature detection to reduce the FPGA resources utilization. Moreover, we implement BRIEF description and matching on FPGA also. The proposed system can implement image matching at 30fps (frame per second) for 1280x720 images. Its processing speed can meet the demand of most real-life computer vision applications.
Shared Memory Parallelization of an Implicit ADI-type CFD Code
NASA Technical Reports Server (NTRS)
Hauser, Th.; Huang, P. G.
1999-01-01
A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
A compositional approach to building applications in a computational environment
NASA Astrophysics Data System (ADS)
Roslovtsev, V. V.; Shumsky, L. D.; Wolfengagen, V. E.
2014-04-01
The paper presents an approach to creating an applicative computational environment to feature computational processes and data decomposition, and a compositional approach to application building. The approach in question is based on the notion of combinator - both in systems with variable binding (such as λ-calculi) and those allowing programming without variables (combinatory logic style). We present a computation decomposition technique based on objects' structural decomposition, with the focus on computation decomposition. The computational environment's architecture is based on a network with nodes playing several roles simultaneously.
Parallel compression/decompression-based datapath architecture for multibeam mask writers
NASA Astrophysics Data System (ADS)
Chaudhary, Narendra; Savari, Serap A.
2017-06-01
Multibeam electron beam systems will be used in the future for mask writing and for complimentary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements Amdahl's Law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time we propose an alternate datapath architecture partly motivated by multibeam direct write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
Parallel compression/decompression-based datapath architecture for multibeam mask writers
NASA Astrophysics Data System (ADS)
Chaudhary, Narendra; Savari, Serap A.
2017-10-01
Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct-write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
Reconfigurable vision system for real-time applications
NASA Astrophysics Data System (ADS)
Torres-Huitzil, Cesar; Arias-Estrada, Miguel
2002-03-01
Recently, a growing community of researchers has used reconfigurable systems to solve computationally intensive problems. Reconfigurability provides optimized processors for systems on chip designs, and makes easy to import technology to a new system through reusable modules. The main objective of this work is the investigation of a reconfigurable computer system targeted for computer vision and real-time applications. The system is intended to circumvent the inherent computational load of most window-based computer vision algorithms. It aims to build a system for such tasks by providing an FPGA-based hardware architecture for task specific vision applications with enough processing power, using the minimum amount of hardware resources as possible, and a mechanism for building systems using this architecture. Regarding the software part of the system, a library of pre-designed and general-purpose modules that implement common window-based computer vision operations is being investigated. A common generic interface is established for these modules in order to define hardware/software components. These components can be interconnected to develop more complex applications, providing an efficient mechanism for transferring image and result data among modules. Some preliminary results are presented and discussed.
A Cloud-based Infrastructure and Architecture for Environmental System Research
NASA Astrophysics Data System (ADS)
Wang, D.; Wei, Y.; Shankar, M.; Quigley, J.; Wilson, B. E.
2016-12-01
The present availability of high-capacity networks, low-cost computers and storage devices, and the widespread adoption of hardware virtualization and service-oriented architecture provide a great opportunity to enable data and computing infrastructure sharing between closely related research activities. By taking advantage of these approaches, along with the world-class high computing and data infrastructure located at Oak Ridge National Laboratory, a cloud-based infrastructure and architecture has been developed to efficiently deliver essential data and informatics service and utilities to the environmental system research community, and will provide unique capabilities that allows terrestrial ecosystem research projects to share their software utilities (tools), data and even data submission workflow in a straightforward fashion. The infrastructure will minimize large disruptions from current project-based data submission workflows for better acceptances from existing projects, since many ecosystem research projects already have their own requirements or preferences for data submission and collection. The infrastructure will eliminate scalability problems with current project silos by provide unified data services and infrastructure. The Infrastructure consists of two key components (1) a collection of configurable virtual computing environments and user management systems that expedite data submission and collection from environmental system research community, and (2) scalable data management services and system, originated and development by ORNL data centers.
Developing a Distributed Computing Architecture at Arizona State University.
ERIC Educational Resources Information Center
Armann, Neil; And Others
1994-01-01
Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…
Detailed Primitive-Based 3d Modeling of Architectural Elements
NASA Astrophysics Data System (ADS)
Remondino, F.; Lo Buglio, D.; Nony, N.; De Luca, L.
2012-07-01
The article describes a pipeline, based on image-data, for the 3D reconstruction of building façades or architectural elements and the successive modeling using geometric primitives. The approach overcome some existing problems in modeling architectural elements and deliver efficient-in-size reality-based textured 3D models useful for metric applications. For the 3D reconstruction, an opensource pipeline developed within the TAPENADE project is employed. In the successive modeling steps, the user manually selects an area containing an architectural element (capital, column, bas-relief, window tympanum, etc.) and then the procedure fits geometric primitives and computes disparity and displacement maps in order to tie visual and geometric information together in a light but detailed 3D model. Examples are reported and commented.
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Saini, Subhash (Technical Monitor)
1998-01-01
Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures
2017-10-04
Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Parallel algorithms for mapping pipelined and parallel computations
NASA Technical Reports Server (NTRS)
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Yao; Balaprakash, Prasanna; Meng, Jiayuan
We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors IBM Blue- Gene/Q compute chip and Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list ofmore » architectural scaling options for future processor designs.« less
NASA Astrophysics Data System (ADS)
Megherbi, Dalila B.; Yan, Yin; Tanmay, Parikh; Khoury, Jed; Woods, C. L.
2004-11-01
Recently surveillance and Automatic Target Recognition (ATR) applications are increasing as the cost of computing power needed to process the massive amount of information continues to fall. This computing power has been made possible partly by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-Art electro-optical imaging systems to provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g. telescope, precise optics, cameras, image/compute vision algorithms, which can be geographically distributed or sharing distributed resources) into a programmable system and DSP systems. Additionally, pattern recognition techniques and fast information retrieval, are often important components of intelligent systems. The aim of this work is using embedded FPGA as a fast, configurable and synthesizable search engine in fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and show a low cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture solution with real time recognition capabilities and computing for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers a performance advantage of an order-of-magnitude over RAM-based architecture (Random Access Memory) search for implementing high speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM based embedded architecture are described here. Other SOPC solutions/design issues are covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low cost reconfigurable fast image pattern recognition/retrieval at the hardware/software co-design level.
Prospective Architectures for Onboard vs Cloud-Based Decision Making for Unmanned Aerial Systems
NASA Technical Reports Server (NTRS)
Sankararaman, Shankar; Teubert, Christopher
2017-01-01
This paper investigates propsective architectures for decision-making in unmanned aerial systems. When these unmanned vehicles operate in urban environments, there are several sources of uncertainty that affect their behavior, and decision-making algorithms need to be robust to account for these different sources of uncertainty. It is important to account for several risk-factors that affect the flight of these unmanned systems, and facilitate decision-making by taking into consideration these various risk-factors. In addition, there are several technical challenges related to autonomous flight of unmanned aerial systems; these challenges include sensing, obstacle detection, path planning and navigation, trajectory generation and selection, etc. Many of these activities require significant computational power and in many situations, all of these activities need to be performed in real-time. In order to efficiently integrate these activities, it is important to develop a systematic architecture that can facilitate real-time decision-making. Four prospective architectures are discussed in this paper; on one end of the spectrum, the first architecture considers all activities/computations being performed onboard the vehicle whereas on the other end of the spectrum, the fourth and final architecture considers all activities/computations being performed in the cloud, using a new service known as Prognostics as a Service that is being developed at NASA Ames Research Center. The four different architectures are compared, their advantages and disadvantages are explained and conclusions are presented.
Frances: A Tool for Understanding Computer Architecture and Assembly Language
ERIC Educational Resources Information Center
Sondag, Tyler; Pokorny, Kian L.; Rajan, Hridesh
2012-01-01
Students in all areas of computing require knowledge of the computing device including software implementation at the machine level. Several courses in computer science curricula address these low-level details such as computer architecture and assembly languages. For such courses, there are advantages to studying real architectures instead of…
Noise tolerant spatiotemporal chaos computing.
Kia, Behnam; Kia, Sarvenaz; Lindner, John F; Sinha, Sudeshna; Ditto, William L
2014-12-01
We introduce and design a noise tolerant chaos computing system based on a coupled map lattice (CML) and the noise reduction capabilities inherent in coupled dynamical systems. The resulting spatiotemporal chaos computing system is more robust to noise than a single map chaos computing system. In this CML based approach to computing, under the coupled dynamics, the local noise from different nodes of the lattice diffuses across the lattice, and it attenuates each other's effects, resulting in a system with less noise content and a more robust chaos computing architecture.
Noise tolerant spatiotemporal chaos computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kia, Behnam; Kia, Sarvenaz; Ditto, William L.
We introduce and design a noise tolerant chaos computing system based on a coupled map lattice (CML) and the noise reduction capabilities inherent in coupled dynamical systems. The resulting spatiotemporal chaos computing system is more robust to noise than a single map chaos computing system. In this CML based approach to computing, under the coupled dynamics, the local noise from different nodes of the lattice diffuses across the lattice, and it attenuates each other's effects, resulting in a system with less noise content and a more robust chaos computing architecture.
An Object-Oriented Network-Centric Software Architecture for Physical Computing
NASA Astrophysics Data System (ADS)
Palmer, Richard
1997-08-01
Recent developments in object-oriented computer languages and infrastructure such as the Internet, Web browsers, and the like provide an opportunity to define a more productive computational environment for scientific programming that is based more closely on the underlying mathematics describing physics than traditional programming languages such as FORTRAN or C++. In this talk I describe an object-oriented software architecture for representing physical problems that includes classes for such common mathematical objects as geometry, boundary conditions, partial differential and integral equations, discretization and numerical solution methods, etc. In practice, a scientific program written using this architecture looks remarkably like the mathematics used to understand the problem, is typically an order of magnitude smaller than traditional FORTRAN or C++ codes, and hence easier to understand, debug, describe, etc. All objects in this architecture are ``network-enabled,'' which means that components of a software solution to a physical problem can be transparently loaded from anywhere on the Internet or other global network. The architecture is expressed as an ``API,'' or application programmers interface specification, with reference embeddings in Java, Python, and C++. A C++ class library for an early version of this API has been implemented for machines ranging from PC's to the IBM SP2, meaning that phidentical codes run on all architectures.
Tutorial: Computer architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.D.; Milutinovic, V.M.; Siegel, H.J.
1986-01-01
This book presents the state-of-the-art in advanced computer architecture. It deals with the concepts underlying current architectures and covers approaches and techniques being used in the design of advanced computer systems.
Benchmarking high performance computing architectures with CMS’ skeleton framework
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-11-23
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Benchmarking high performance computing architectures with CMS’ skeleton framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
1986-12-31
synthesize synchronization skeletons"Science of Computer Programming 2, 1982, pp. 241-266 [Gel85] Gelernter, David, "Generative communication in...effective computation based on given primitives . An architecture is an abstract object-type, whose instances are computing systems. By a parallel computing...explaining the language primitives on this basis. We explain how such a basis can be "simpler" than a general-purpose manual-programming language such as
Outline of a novel architecture for cortical computation.
Majumdar, Kaushik
2008-03-01
In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, keeping in conformity with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture have been proposed, some of which are without proof.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data. PMID:22163811
Design and development of a run-time monitor for multi-core architectures in cloud computing.
Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon
2011-01-01
Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
CERN's Common Unix and X Terminal Environment
NASA Astrophysics Data System (ADS)
Cass, Tony
The Desktop Infrastructure Group of CERN's Computing and Networks Division has developed a Common Unix and X Terminal Environment to ease the migration to Unix based Interactive Computing. The CUTE architecture relies on a distributed filesystem—currently Trans arc's AFS—to enable essentially interchangeable client work-stations to access both "home directory" and program files transparently. Additionally, we provide a suite of programs to configure workstations for CUTE and to ensure continued compatibility. This paper describes the different components and the development of the CUTE architecture.
Integrating Software Modules For Robot Control
NASA Technical Reports Server (NTRS)
Volpe, Richard A.; Khosla, Pradeep; Stewart, David B.
1993-01-01
Reconfigurable, sensor-based control system uses state variables in systematic integration of reusable control modules. Designed for open-architecture hardware including many general-purpose microprocessors, each having own local memory plus access to global shared memory. Implemented in software as extension of Chimera II real-time operating system. Provides transparent computing mechanism for intertask communication between control modules and generic process-module architecture for multiprocessor realtime computation. Used to control robot arm. Proves useful in variety of other control and robotic applications.
2008-03-01
computational version of the CASIE architecture serves to demonstrate the functionality of our primary theories. However, implementation of several other...following facts. First, based on Theorem 3 and Theorem 5, the objective function is non -increasing under updating rule (6); second, by the criteria for...reassignment in updating rule (7), it is trivial to show that the objective function is non -increasing under updating rule (7). A Unified View to Graph
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Web Image Retrieval Using Self-Organizing Feature Map.
ERIC Educational Resources Information Center
Wu, Qishi; Iyengar, S. Sitharama; Zhu, Mengxia
2001-01-01
Provides an overview of current image retrieval systems. Describes the architecture of the SOFM (Self Organizing Feature Maps) based image retrieval system, discussing the system architecture and features. Introduces the Kohonen model, and describes the implementation details of SOFM computation and its learning algorithm. Presents a test example…
Open architecture CMM motion controller
NASA Astrophysics Data System (ADS)
Chang, David; Spence, Allan D.; Bigg, Steve; Heslip, Joe; Peterson, John
2001-12-01
Although initially the only Coordinate Measuring Machine (CMM) sensor available was a touch trigger probe, technological advances in sensors and computing have greatly increased the variety of available inspection sensors. Non-contact laser digitizers and analog scanning touch probes require very well tuned CMM motion control, as well as an extensible, open architecture interface. This paper describes the implementation of a retrofit CMM motion controller designed for open architecture interface to a variety of sensors. The controller is based on an Intel Pentium microcomputer and a Servo To Go motion interface electronics card. Motor amplifiers, safety, and additional interface electronics are housed in a separate enclosure. Host Signal Processing (HSP) is used for the motion control algorithm. Compared to the usual host plus DSP architecture, single CPU HSP simplifies integration with the various sensors, and implementation of software geometric error compensation. Motion control tuning is accomplished using a remote computer via 100BaseTX Ethernet. A Graphical User Interface (GUI) is used to enter geometric error compensation data, and to optimize the motion control tuning parameters. It is shown that this architecture achieves the required real time motion control response, yet is much easier to extend to additional sensors.
Remote hardware-reconfigurable robotic camera
NASA Astrophysics Data System (ADS)
Arias-Estrada, Miguel; Torres-Huitzil, Cesar; Maya-Rueda, Selene E.
2001-10-01
In this work, a camera with integrated image processing capabilities is discussed. The camera is based on an imager coupled to an FPGA device (Field Programmable Gate Array) which contains an architecture for real-time computer vision low-level processing. The architecture can be reprogrammed remotely for application specific purposes. The system is intended for rapid modification and adaptation for inspection and recognition applications, with the flexibility of hardware and software reprogrammability. FPGA reconfiguration allows the same ease of upgrade in hardware as a software upgrade process. The camera is composed of a digital imager coupled to an FPGA device, two memory banks, and a microcontroller. The microcontroller is used for communication tasks and FPGA programming. The system implements a software architecture to handle multiple FPGA architectures in the device, and the possibility to download a software/hardware object from the host computer into its internal context memory. System advantages are: small size, low power consumption, and a library of hardware/software functionalities that can be exchanged during run time. The system has been validated with an edge detection and a motion processing architecture, which will be presented in the paper. Applications targeted are in robotics, mobile robotics, and vision based quality control.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C based parallel language (ace C) for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of ace C and present a signal processing application (FFT).
A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture
NASA Astrophysics Data System (ADS)
Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.
2017-12-01
A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.
A cognitive computational model inspired by the immune system response.
Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim
2014-01-01
The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that is oscillating between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have been always standing out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases: setting a computational architectural diagram for each phase, proceeding from functional perspectives (input, process, and output), and their consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of main theoretical immunological perspectives (classic, cognitive, and danger theory), as related to computer science terminologies. The paper presents a descriptive model of immune system, to figure out the nature of response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by the natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, meanwhile illustrating our proposed architecture perspective.
A Cognitive Computational Model Inspired by the Immune System Response
Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim
2014-01-01
The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that is oscillating between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have been always standing out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases: setting a computational architectural diagram for each phase, proceeding from functional perspectives (input, process, and output), and their consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of main theoretical immunological perspectives (classic, cognitive, and danger theory), as related to computer science terminologies. The paper presents a descriptive model of immune system, to figure out the nature of response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by the natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, meanwhile illustrating our proposed architecture perspective. PMID:25003131
DualTrust: A Trust Management Model for Swarm-Based Autonomic Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maiden, Wendy M.
Trust management techniques must be adapted to the unique needs of the application architectures and problem domains to which they are applied. For autonomic computing systems that utilize mobile agents and ant colony algorithms for their sensor layer, certain characteristics of the mobile agent ant swarm -- their lightweight, ephemeral nature and indirect communication -- make this adaptation especially challenging. This thesis looks at the trust issues and opportunities in swarm-based autonomic computing systems and finds that by monitoring the trustworthiness of the autonomic managers rather than the swarming sensors, the trust management problem becomes much more scalable and stillmore » serves to protect the swarm. After analyzing the applicability of trust management research as it has been applied to architectures with similar characteristics, this thesis specifies the required characteristics for trust management mechanisms used to monitor the trustworthiness of entities in a swarm-based autonomic computing system and describes a trust model that meets these requirements.« less
Optimal nonlinear information processing capacity in delay-based reservoir computers
NASA Astrophysics Data System (ADS)
Grigoryeva, Lyudmila; Henriques, Julie; Larger, Laurent; Ortega, Juan-Pablo
2015-09-01
Reservoir computing is a recently introduced brain-inspired machine learning paradigm capable of excellent performances in the processing of empirical data. We focus in a particular kind of time-delay based reservoir computers that have been physically implemented using optical and electronic systems and have shown unprecedented data processing rates. Reservoir computing is well-known for the ease of the associated training scheme but also for the problematic sensitivity of its performance to architecture parameters. This article addresses the reservoir design problem, which remains the biggest challenge in the applicability of this information processing scheme. More specifically, we use the information available regarding the optimal reservoir working regimes to construct a functional link between the reservoir parameters and its performance. This function is used to explore various properties of the device and to choose the optimal reservoir architecture, thus replacing the tedious and time consuming parameter scannings used so far in the literature.
Optimal nonlinear information processing capacity in delay-based reservoir computers.
Grigoryeva, Lyudmila; Henriques, Julie; Larger, Laurent; Ortega, Juan-Pablo
2015-09-11
Reservoir computing is a recently introduced brain-inspired machine learning paradigm capable of excellent performances in the processing of empirical data. We focus in a particular kind of time-delay based reservoir computers that have been physically implemented using optical and electronic systems and have shown unprecedented data processing rates. Reservoir computing is well-known for the ease of the associated training scheme but also for the problematic sensitivity of its performance to architecture parameters. This article addresses the reservoir design problem, which remains the biggest challenge in the applicability of this information processing scheme. More specifically, we use the information available regarding the optimal reservoir working regimes to construct a functional link between the reservoir parameters and its performance. This function is used to explore various properties of the device and to choose the optimal reservoir architecture, thus replacing the tedious and time consuming parameter scannings used so far in the literature.
Optimal nonlinear information processing capacity in delay-based reservoir computers
Grigoryeva, Lyudmila; Henriques, Julie; Larger, Laurent; Ortega, Juan-Pablo
2015-01-01
Reservoir computing is a recently introduced brain-inspired machine learning paradigm capable of excellent performances in the processing of empirical data. We focus in a particular kind of time-delay based reservoir computers that have been physically implemented using optical and electronic systems and have shown unprecedented data processing rates. Reservoir computing is well-known for the ease of the associated training scheme but also for the problematic sensitivity of its performance to architecture parameters. This article addresses the reservoir design problem, which remains the biggest challenge in the applicability of this information processing scheme. More specifically, we use the information available regarding the optimal reservoir working regimes to construct a functional link between the reservoir parameters and its performance. This function is used to explore various properties of the device and to choose the optimal reservoir architecture, thus replacing the tedious and time consuming parameter scannings used so far in the literature. PMID:26358528
NASA Astrophysics Data System (ADS)
Elkurdi, Yousef; Fernández, David; Souleimanov, Evgueni; Giannacopoulos, Dennis; Gross, Warren J.
2008-04-01
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool that has diverse applications ranging from structural engineering to electromagnetic simulation. The trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), hence increasing interest has grown in the scientific community to exploit this technology. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGAs computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme which enables it to process arbitrarily big matrices without changing the number of PEs in the architecture. Therefore, this architecture is only limited by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved. Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory.
Wolinski, Christophe Czeslaw [Los Alamos, NM; Gokhale, Maya B [Los Alamos, NM; McCabe, Kevin Peter [Los Alamos, NM
2011-01-18
Fabric-based computing systems and methods are disclosed. A fabric-based computing system can include a polymorphous computing fabric that can be customized on a per application basis and a host processor in communication with said polymorphous computing fabric. The polymorphous computing fabric includes a cellular architecture that can be highly parameterized to enable a customized synthesis of fabric instances for a variety of enhanced application performances thereof. A global memory concept can also be included that provides the host processor random access to all variables and instructions associated with the polymorphous computing fabric.
Cloud Computing for Mission Design and Operations
NASA Technical Reports Server (NTRS)
Arrieta, Juan; Attiyah, Amy; Beswick, Robert; Gerasimantos, Dimitrios
2012-01-01
The space mission design and operations community already recognizes the value of cloud computing and virtualization. However, natural and valid concerns, like security, privacy, up-time, and vendor lock-in, have prevented a more widespread and expedited adoption into official workflows. In the interest of alleviating these concerns, we propose a series of guidelines for internally deploying a resource-oriented hub of data and algorithms. These guidelines provide a roadmap for implementing an architecture inspired in the cloud computing model: associative, elastic, semantical, interconnected, and adaptive. The architecture can be summarized as exposing data and algorithms as resource-oriented Web services, coordinated via messaging, and running on virtual machines; it is simple, and based on widely adopted standards, protocols, and tools. The architecture may help reduce common sources of complexity intrinsic to data-driven, collaborative interactions and, most importantly, it may provide the means for teams and agencies to evaluate the cloud computing model in their specific context, with minimal infrastructure changes, and before committing to a specific cloud services provider.
Space Debris Detection on the HPDP, a Coarse-Grained Reconfigurable Array Architecture for Space
NASA Astrophysics Data System (ADS)
Suarez, Diego Andres; Bretz, Daniel; Helfers, Tim; Weidendorfer, Josef; Utzmann, Jens
2016-08-01
Stream processing, widely used in communications and digital signal processing applications, requires high- throughput data processing that is achieved in most cases using Application-Specific Integrated Circuit (ASIC) designs. Lack of programmability is an issue especially in space applications, which use on-board components with long life-cycles requiring applications updates. To this end, the High Performance Data Processor (HPDP) architecture integrates an array of coarse-grained reconfigurable elements to provide both flexible and efficient computational power suitable for stream-based data processing applications in space. In this work the capabilities of the HPDP architecture are demonstrated with the implementation of a real-time image processing algorithm for space debris detection in a space-based space surveillance system. The implementation challenges and alternatives are described making trade-offs to improve performance at the expense of negligible degradation of detection accuracy. The proposed implementation uses over 99% of the available computational resources. Performance estimations based on simulations show that the HPDP can amply match the application requirements.
ERIC Educational Resources Information Center
Stifle, Jack
The PLATO IV computer-based instructional system consists of a large scale centrally located CDC 6400 computer and a large number of remote student terminals. This is a brief and general description of the proposed input/output hardware necessary to interface the student terminals with the computer's central processing unit (CPU) using available…
NASA Astrophysics Data System (ADS)
Hegde, Ganapathi; Vaya, Pukhraj
2013-10-01
This article presents a parallel architecture for 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-DDWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.
Associative architecture for image processing
NASA Astrophysics Data System (ADS)
Adar, Rutie; Akerib, Avidan
1997-09-01
This article presents a new generation in parallel processing architecture for real-time image processing. The approach is implemented in a real time image processor chip, called the XiumTM-2, based on combining a fully associative array which provides the parallel engine with a serial RISC core on the same die. The architecture is fully programmable and can be programmed to implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on patented pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz and 0.6 micron manufacturing technology process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the XiumTM-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.
NASA Astrophysics Data System (ADS)
Benini, Luca
2017-06-01
The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.
Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator
ERIC Educational Resources Information Center
Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.
2012-01-01
The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…
An MPI-based MoSST core dynamics model
NASA Astrophysics Data System (ADS)
Jiang, Weiyuan; Kuang, Weijia
2008-09-01
Distributed systems are among the main cost-effective and expandable platforms for high-end scientific computing. Therefore scalable numerical models are important for effective use of such systems. In this paper, we present an MPI-based numerical core dynamics model for simulation of geodynamo and planetary dynamos, and for simulation of core-mantle interactions. The model is developed based on MPI libraries. Two algorithms are used for node-node communication: a "master-slave" architecture and a "divide-and-conquer" architecture. The former is easy to implement but not scalable in communication. The latter is scalable in both computation and communication. The model scalability is tested on Linux PC clusters with up to 128 nodes. This model is also benchmarked with a published numerical dynamo model solution.
Evolutionary Telemetry and Command Processor (TCP) architecture
NASA Technical Reports Server (NTRS)
Schneider, John R.
1992-01-01
A low cost, modular, high performance, and compact Telemetry and Command Processor (TCP) is being built as the foundation of command and data handling subsystems for the next generation of satellites. The TCP product line will support command and telemetry requirements for small to large spacecraft and from low to high rate data transmission. It is compatible with the latest TDRSS, STDN and SGLS transponders and provides CCSDS protocol communications in addition to standard TDM formats. Its high performance computer provides computing resources for hosted flight software. Layered and modular software provides common services using standardized interfaces to applications thereby enhancing software re-use, transportability, and interoperability. The TCP architecture is based on existing standards, distributed networking, distributed and open system computing, and packet technology. The first TCP application is planned for the 94 SDIO SPAS 3 mission. The architecture enhances rapid tailoring of functions thereby reducing costs and schedules developed for individual spacecraft missions.
AHaH computing-from metastable switches to attractors to machine learning.
Nugent, Michael Alexander; Molter, Timothy Wesley
2014-01-01
Modern computing architecture based on the separation of memory and processing leads to a well known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit operating AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high level machine learning functions are demonstrated. This includes unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures-all key capabilities of biological nervous systems and modern machine learning algorithms with real world application.
Improving Conceptual Design for Launch Vehicles
NASA Technical Reports Server (NTRS)
Olds, John R.
1998-01-01
This report summarizes activities performed during the second year of a three year cooperative agreement between NASA - Langley Research Center and Georgia Tech. Year 1 of the project resulted in the creation of a new Cost and Business Assessment Model (CABAM) for estimating the economic performance of advanced reusable launch vehicles including non-recurring costs, recurring costs, and revenue. The current year (second year) activities were focused on the evaluation of automated, collaborative design frameworks (computation architectures or computational frameworks) for automating the design process in advanced space vehicle design. Consistent with NASA's new thrust area in developing and understanding Intelligent Synthesis Environments (ISE), the goals of this year's research efforts were to develop and apply computer integration techniques and near-term computational frameworks for conducting advanced space vehicle design. NASA - Langley (VAB) has taken a lead role in developing a web-based computing architectures within which the designer can interact with disciplinary analysis tools through a flexible web interface. The advantages of this approach are, 1) flexible access to the designer interface through a simple web browser (e.g. Netscape Navigator), 2) ability to include existing 'legacy' codes, and 3) ability to include distributed analysis tools running on remote computers. To date, VAB's internal emphasis has been on developing this test system for the planetary entry mission under the joint Integrated Design System (IDS) program with NASA - Ames and JPL. Georgia Tech's complementary goals this year were to: 1) Examine an alternate 'custom' computational architecture for the three-discipline IDS planetary entry problem to assess the advantages and disadvantages relative to the web-based approach.and 2) Develop and examine a web-based interface and framework for a typical launch vehicle design problem.
Boolean and brain-inspired computing using spin-transfer torque devices
NASA Astrophysics Data System (ADS)
Fan, Deliang
Several completely new approaches (such as spintronic, carbon nanotube, graphene, TFETs, etc.) to information processing and data storage technologies are emerging to address the time frame beyond current Complementary Metal-Oxide-Semiconductor (CMOS) roadmap. The high speed magnetization switching of a nano-magnet due to current induced spin-transfer torque (STT) have been demonstrated in recent experiments. Such STT devices can be explored in compact, low power memory and logic design. In order to truly leverage STT devices based computing, researchers require a re-think of circuit, architecture, and computing model, since the STT devices are unlikely to be drop-in replacements for CMOS. The potential of STT devices based computing will be best realized by considering new computing models that are inherently suited to the characteristics of STT devices, and new applications that are enabled by their unique capabilities, thereby attaining performance that CMOS cannot achieve. The goal of this research is to conduct synergistic exploration in architecture, circuit and device levels for Boolean and brain-inspired computing using nanoscale STT devices. Specifically, we first show that the non-volatile STT devices can be used in designing configurable Boolean logic blocks. We propose a spin-memristor threshold logic (SMTL) gate design, where memristive cross-bar array is used to perform current mode summation of binary inputs and the low power current mode spintronic threshold device carries out the energy efficient threshold operation. Next, for brain-inspired computing, we have exploited different spin-transfer torque device structures that can implement the hard-limiting and soft-limiting artificial neuron transfer functions respectively. We apply such STT based neuron (or 'spin-neuron') in various neural network architectures, such as hierarchical temporal memory and feed-forward neural network, for performing "human-like" cognitive computing, which show more than two orders of lower energy consumption compared to state of the art CMOS implementation. Finally, we show the dynamics of injection locked Spin Hall Effect Spin-Torque Oscillator (SHE-STO) cluster can be exploited as a robust multi-dimensional distance metric for associative computing, image/ video analysis, etc. Our simulation results show that the proposed system architecture with injection locked SHE-STOs and the associated CMOS interface circuits can be suitable for robust and energy efficient associative computing and pattern matching.
Multimedia courseware in an open-systems environment: a DoD strategy
NASA Astrophysics Data System (ADS)
Welsch, Lawrence A.
1991-03-01
The federal government is about to invest billions of dollars to develop multimedia training materials for delivery on computer-based interactive training systems. Acquisition of a variety of computers and peripheral devices hosting various operating systems and suites of authoring system software will be necessary to facilitate the development of this courseware. There is no single source that will satisfy all needs. Although high-performance, low-cost interactive training hardware is available, the products have proprietary software interfaces. Because the interfaces are proprietary, expensive reprogramming is usually required to adapt such software products to other platforms. This costly reprogramming could be eliminated by adopting standard software interfaces. DoD's Portable Courseware Project (PORTCO) is typical of projects worldwide that require standard software interfaces. This paper articulates the strategy whereby PORTCO leverages the open systems movement and the new realities of information technology. These realities encompass changes in the pace at which new technology becomes available, changes in organizational goals and philosophy, new roles of vendors and users, changes in the procurement process, and acceleration toward open system environments. The PORTCO strategy is applicable to all projects and systems that require open systems to achieve mission objectives. The federal goal is to facilitate the creation of an environment in which high quality portable courseware is available as commercial off-the-shelf products and is competitively supplied by a variety of vendors. In order to achieve this goal a system architecture incorporating standards to meet the users' needs must be established. The Request for Architecture (RFA) developed cooperatively by DoD and the National Institute of Standards and Technology (NIST) will generate the PORTCO systems architecture. This architecture must freely integrate the courseware and authoring software from the lower levels of machine architecture and systems service implementation. In addition, the systems architecture will establish how the application-specific technologies relate to other technologies. Further, a computer-based interactive training applications profile must be developed. This profile, along with the systems architecture derived as a result of the RFA, provides the basis for identifying the needed standards. NIST will then accelerate the development of these standards using, but not restricted to, existing standards activities within established standards forums. The federal multimedia courseware effort has adopted the Interactive Multimedia Association (INA) Recommended Practices for Interactive Video Portability as the baseline for the migration of computer-based interactive training systems to an open systems environment based upon international standards. The PORTCO strategy includes an evolutionary migration to a standards-based, Open System Environments (OSE). An important aspect of this migration strategy is to move to open systems via stepwise evolution rather than via quantum leaps. Another area of concern is that of infrastructure issues, such as maintaining and supporting the technologies required for computer-based interactive training. The federal multimedia initiative will use the RFA-based architecture to differentiate between those technologies that can be maintained and supported by existing infrastructure mechanisms and those that require new mechanisms. Existing infrastructure mechanisms will be used and where infrastructure mechanisms do not exist, the approach will be to place high priority on establishing the appropriate mechanisms. Establishing an infrastructure mechanism is a nontrivial task requiring sustained investment of resources.
NASA Technical Reports Server (NTRS)
Fischer, James R.; Grosch, Chester; Mcanulty, Michael; Odonnell, John; Storey, Owen
1987-01-01
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era.
Performance analysis of parallel branch and bound search with the hypercube architecture
NASA Technical Reports Server (NTRS)
Mraz, Richard T.
1987-01-01
With the availability of commercial parallel computers, researchers are examining new classes of problems which might benefit from parallel computing. This paper presents results of an investigation of the class of search intensive problems. The specific problem discussed is the Least-Cost Branch and Bound search method of deadline job scheduling. The object-oriented design methodology was used to map the problem into a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speed up over a VAX 11/785, and the load balance of the problem when using loosely coupled multiprocessor system based on the hypercube architecture.
A High Performance COTS Based Computer Architecture
NASA Astrophysics Data System (ADS)
Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland
2014-08-01
Using Commercial Off The Shelf (COTS) electronic components for space applications is a long standing idea. Indeed the difference in processing performance and energy efficiency between radiation hardened components and COTS components is so important that COTS components are very attractive for use in mass and power constrained systems. However using COTS components in space is not straightforward as one must account with the effects of the space environment on the COTS components behavior. In the frame of the ESA funded activity called High Performance COTS Based Computer, Airbus Defense and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS based architecture for high performance processing. The rest of the paper is organized as follows: in a first section we will start by recapitulating the interests and constraints of using COTS components for space applications; then we will briefly describe existing fault mitigation architectures and present our solution for fault mitigation based on a component called the SmartIO; in the last part of the paper we will describe the prototyping activities executed during the HiP CBC project.
NASA Astrophysics Data System (ADS)
Luo, Aiwen; An, Fengwei; Zhang, Xiangyu; Chen, Lei; Huang, Zunkai; Jürgen Mattausch, Hans
2018-04-01
Feature extraction techniques are a cornerstone of object detection in computer-vision-based applications. The detection performance of vison-based detection systems is often degraded by, e.g., changes in the illumination intensity of the light source, foreground-background contrast variations or automatic gain control from the camera. In order to avoid such degradation effects, we present a block-based L1-norm-circuit architecture which is configurable for different image-cell sizes, cell-based feature descriptors and image resolutions according to customization parameters from the circuit input. The incorporated flexibility in both the image resolution and the cell size for multi-scale image pyramids leads to lower computational complexity and power consumption. Additionally, an object-detection prototype for performance evaluation in 65 nm CMOS implements the proposed L1-norm circuit together with a histogram of oriented gradients (HOG) descriptor and a support vector machine (SVM) classifier. The proposed parallel architecture with high hardware efficiency enables real-time processing, high detection robustness, small chip-core area as well as low power consumption for multi-scale object detection.
Developing Information Power Grid Based Algorithms and Software
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our effort to understand performance modeling on parallel systems. The basic goal of performance modeling is to understand and predict the performance of a computer program or set of programs on a computer system. Performance modeling has numerous applications, including evaluation of algorithms, optimization of code implementations, parallel library development, comparison of system architectures, parallel system design, and procurement of new systems. Our work lays the basis for the construction of parallel libraries that allow for the reconstruction of application codes on several distinct architectures so as to assure performance portability. Following our strategy, once the requirements of applications are well understood, one can then construct a library in a layered fashion. The top level of this library will consist of architecture-independent geometric, numerical, and symbolic algorithms that are needed by the sample of applications. These routines should be written in a language that is portable across the targeted architectures.
Evaluation of hardware costs of implementing PSK signal detection circuit based on "system on chip"
NASA Astrophysics Data System (ADS)
Sokolovskiy, A. V.; Dmitriev, D. D.; Veisov, E. A.; Gladyshev, A. B.
2018-05-01
The article deals with the choice of the architecture of digital signal processing units for implementing the PSK signal detection scheme. As an assessment of the effectiveness of architectures, the required number of shift registers and computational processes are used when implementing the "system on a chip" on the chip. A statistical estimation of the normalized code sequence offset in the signal synchronization scheme for various hardware block architectures is used.
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, use of a scalable, high performance, distributed-parallel data storage system developed in the ARPA funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Differential geometric treewidth estimation in adiabatic quantum computation
NASA Astrophysics Data System (ADS)
Wang, Chi; Jonckheere, Edmond; Brun, Todd
2016-10-01
The D-Wave adiabatic quantum computing platform is designed to solve a particular class of problems—the Quadratic Unconstrained Binary Optimization (QUBO) problems. Due to the particular "Chimera" physical architecture of the D-Wave chip, the logical problem graph at hand needs an extra process called minor embedding in order to be solvable on the D-Wave architecture. The latter problem is itself NP-hard. In this paper, we propose a novel polynomial-time approximation to the closely related treewidth based on the differential geometric concept of Ollivier-Ricci curvature. The latter runs in polynomial time and thus could significantly reduce the overall complexity of determining whether a QUBO problem is minor embeddable, and thus solvable on the D-Wave architecture.
Local Alignment Tool Based on Hadoop Framework and GPU Architecture
Hung, Che-Lun; Hua, Guan-Jie
2014-01-01
With the rapid growth of next generation sequencing technologies, such as Slex, more and more data have been discovered and published. To analyze such huge data the computational performance is an important issue. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP is an important tool, implemented on GPU architectures, for biologists to compare protein sequences. To deal with the big biology data, it is hard to rely on single GPU. Therefore, we implement a distributed BLASTP by combining Hadoop and multi-GPUs. The experimental results present that the proposed method can improve the performance of BLASTP on single GPU, and also it can achieve high availability and fault tolerance. PMID:24955362
Local alignment tool based on Hadoop framework and GPU architecture.
Hung, Che-Lun; Hua, Guan-Jie
2014-01-01
With the rapid growth of next generation sequencing technologies, such as Slex, more and more data have been discovered and published. To analyze such huge data the computational performance is an important issue. Recently, many tools, such as SOAP, have been implemented on Hadoop and GPU parallel computing architectures. BLASTP is an important tool, implemented on GPU architectures, for biologists to compare protein sequences. To deal with the big biology data, it is hard to rely on single GPU. Therefore, we implement a distributed BLASTP by combining Hadoop and multi-GPUs. The experimental results present that the proposed method can improve the performance of BLASTP on single GPU, and also it can achieve high availability and fault tolerance.
KeyWare: an open wireless distributed computing environment
NASA Astrophysics Data System (ADS)
Shpantzer, Isaac; Schoenfeld, Larry; Grindahl, Merv; Kelman, Vladimir
1995-12-01
Deployment of distributed applications in the wireless domain lack equivalent tools, methodologies, architectures, and network management that exist in LAN based applications. A wireless distributed computing environment (KeyWareTM) based on intelligent agents within a multiple client multiple server scheme was developed to resolve this problem. KeyWare renders concurrent application services to wireline and wireless client nodes encapsulated in multiple paradigms such as message delivery, database access, e-mail, and file transfer. These services and paradigms are optimized to cope with temporal and spatial radio coverage, high latency, limited throughput and transmission costs. A unified network management paradigm for both wireless and wireline facilitates seamless extensions of LAN- based management tools to include wireless nodes. A set of object oriented tools and methodologies enables direct asynchronous invocation of agent-based services supplemented by tool-sets matched to supported KeyWare paradigms. The open architecture embodiment of KeyWare enables a wide selection of client node computing platforms, operating systems, transport protocols, radio modems and infrastructures while maintaining application portability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Supinski, B.; Caliga, D.
2017-09-28
The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.
NASA Technical Reports Server (NTRS)
Abada, Christopher H.; Farley, Gary L.; Hyer, Michael W.
2006-01-01
A computer-based parametric study of the effect of reinforcement architectures on fracture response of aluminum compact-tension (CT) specimens is performed. Eleven different reinforcement architectures consisting of rectangular and triangular cross-section reinforcements were evaluated. Reinforced specimens produced between 13 and 28 percent higher fracture load than achieved with the non-reinforced case. Reinforcements with blunt leading edges (rectangular reinforcements) exhibited superior performance relative to the triangular reinforcements with sharp leading edges. Relative to the rectangular reinforcements, the most important architectural feature was reinforcement thickness. At failure, the reinforcements carried between 58 and 85 percent of the load applied to the specimen, suggesting that there is considerable load transfer between the base material and the reinforcement.
Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo
2018-06-08
Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.
Architectures for single-chip image computing
NASA Astrophysics Data System (ADS)
Gove, Robert J.
1992-04-01
This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
A global distributed storage architecture
NASA Technical Reports Server (NTRS)
Lionikis, Nemo M.; Shields, Michael F.
1996-01-01
NSA architects and planners have come to realize that to gain the maximum benefit from, and keep pace with, emerging technologies, we must move to a radically different computing architecture. The compute complex of the future will be a distributed heterogeneous environment, where, to a much greater extent than today, network-based services are invoked to obtain resources. Among the rewards of implementing the services-based view are that it insulates the user from much of the complexity of our multi-platform, networked, computer and storage environment and hides its diverse underlying implementation details. In this paper, we will describe one of the fundamental services being built in our envisioned infrastructure; a global, distributed archive with near-real-time access characteristics. Our approach for adapting mass storage services to this infrastructure will become clear as the service is discussed.
Transitioning ISR architecture into the cloud
NASA Astrophysics Data System (ADS)
Lash, Thomas D.
2012-06-01
Emerging cloud computing platforms offer an ideal opportunity for Intelligence, Surveillance, and Reconnaissance (ISR) intelligence analysis. Cloud computing platforms help overcome challenges and limitations of traditional ISR architectures. Modern ISR architectures can benefit from examining commercial cloud applications, especially as they relate to user experience, usage profiling, and transformational business models. This paper outlines legacy ISR architectures and their limitations, presents an overview of cloud technologies and their applications to the ISR intelligence mission, and presents an idealized ISR architecture implemented with cloud computing.
Karthikeyan, M; Krishnan, S; Pandey, Anil Kumar; Bender, Andreas; Tropsha, Alexander
2008-04-01
We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adopted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit as well as the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source codes as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
Quantum Computing Architectural Design
NASA Astrophysics Data System (ADS)
West, Jacob; Simms, Geoffrey; Gyure, Mark
2006-03-01
Large scale quantum computers will invariably require scalable architectures in addition to high fidelity gate operations. Quantum computing architectural design (QCAD) addresses the problems of actually implementing fault-tolerant algorithms given physical and architectural constraints beyond those of basic gate-level fidelity. Here we introduce a unified framework for QCAD that enables the scientist to study the impact of varying error correction schemes, architectural parameters including layout and scheduling, and physical operations native to a given architecture. Our software package, aptly named QCAD, provides compilation, manipulation/transformation, multi-paradigm simulation, and visualization tools. We demonstrate various features of the QCAD software package through several examples.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Practical considerations in experimental computational sensing
NASA Astrophysics Data System (ADS)
Poon, Phillip K.
Computational sensing has demonstrated the ability to ameliorate or eliminate many trade-offs in traditional sensors. Rather than attempting to form a perfect image, then sampling at the Nyquist rate, and reconstructing the signal of interest prior to post-processing, the computational sensor attempts to utilize a priori knowledge, active or passive coding of the signal-of-interest combined with a variety of algorithms to overcome the trade-offs or to improve various task-specific metrics. While it is a powerful approach to radically new sensor architectures, published research tends to focus on architecture concepts and positive results. Little attention is given towards the practical issues when faced with implementing computational sensing prototypes. I will discuss the various practical challenges that I encountered while developing three separate applications of computational sensors. The first is a compressive sensing based object tracking camera, the SCOUT, which exploits the sparsity of motion between consecutive frames while using no moving parts to create a psuedo-random shift variant point-spread function. The second is a spectral imaging camera, the AFSSI-C, which uses a modified version of Principal Component Analysis with a Bayesian strategy to adaptively design spectral filters for direct spectral classification using a digital micro-mirror device (DMD) based architecture. The third demonstrates two separate architectures to perform spectral unmixing by using an adaptive algorithm or a hybrid techniques of using Maximum Noise Fraction and random filter selection from a liquid crystal on silicon based computational spectral imager, the LCSI. All of these applications demonstrate a variety of challenges that have been addressed or continue to challenge the computational sensing community. One issue is calibration, since many computational sensors require an inversion step and in the case of compressive sensing, lack of redundancy in the measurement data. Another issue is over multiplexing, as more light is collected per sample, the finite amount of dynamic range and quantization resolution can begin to degrade the recovery of the relevant information. A priori knowledge of the sparsity and or other statistics of the signal or noise is often used by computational sensors to outperform their isomorphic counterparts. This is demonstrated in all three of the sensors I have developed. These challenges and others will be discussed using a case-study approach through these three applications.
Efficient fuzzy C-means architecture for image segmentation.
Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen
2011-01-01
This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process for accelerating the computational speed. Experimental results show that the the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate.
A Model Based Framework for Semantic Interpretation of Architectural Construction Drawings
ERIC Educational Resources Information Center
Babalola, Olubi Oluyomi
2011-01-01
The study addresses the automated translation of architectural drawings from 2D Computer Aided Drafting (CAD) data into a Building Information Model (BIM), with emphasis on the nature, possible role, and limitations of a drafting language Knowledge Representation (KR) on the problem and process. The central idea is that CAD to BIM translation is a…
Functional language and data flow architectures
NASA Technical Reports Server (NTRS)
Ercegovac, M. D.; Patel, D. R.; Lang, T.
1983-01-01
This is a tutorial article about language and architecture approaches for highly concurrent computer systems based on the functional style of programming. The discussion concentrates on the basic aspects of functional languages, and sequencing models such as data-flow, demand-driven and reduction which are essential at the machine organization level. Several examples of highly concurrent machines are described.
Multiplier Architecture for Coding Circuits
NASA Technical Reports Server (NTRS)
Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.
1986-01-01
Multipliers based on new algorithm for Galois-field (GF) arithmetic regular and expandable. Pipeline structures used for computing both multiplications and inverses. Designs suitable for implementation in very-large-scale integrated (VLSI) circuits. This general type of inverter and multiplier architecture especially useful in performing finite-field arithmetic of Reed-Solomon error-correcting codes and of some cryptographic algorithms.
IPython: components for interactive and parallel computing across disciplines. (Invited)
NASA Astrophysics Data System (ADS)
Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.
2013-12-01
Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.
Plagianakos, V P; Magoulas, G D; Vrahatis, M N
2006-03-01
Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used.
Rodríguez-Tizcareño, Mario H; Barajas, Lizbeth; Pérez-Gásque, Marisol; Gómez, Salvador
2012-06-01
This report presents a protocol used to transfer the virtual treatment plan data to the surgical and prosthetic reality and its clinical application, bone site augmentation with computer-custom milled bovine bone graft blocks to their ideal architecture form, implant insertion based on image-guided stent fabrication, and the restorative manufacturing process through computed tomography-based software programs and navigation systems and the computer-aided design and manufacturing techniques for the treatment of the edentulous maxilla.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC
NASA Astrophysics Data System (ADS)
Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele
The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performances for streaming computing applications. In this scenario, code portability (and performance portability) become necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and prone to error to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenAcc, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.
NASA Astrophysics Data System (ADS)
Lhamon, Michael Earl
A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible and they have the advantage of time averaging the entire filter bank at real time rates. Time-averaged optical filtering is computational comparable to billions of digital operations-per-second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.
2012-09-30
System N Agent « datatype » SoS Architecture -Receives Capabilities1 -Provides Capabilities1 1 -Provides Capabilities1 1 -Provides Capabilities1 -Updates 1...fitness, or objective function. The structure of the SoS Agent is depicted in Figure 10. SoS Agent Architecture « datatype » Initial SoS...Architecture «subsystem» Fuzzy Inference Engine FAM « datatype » Affordability « datatype » Flexibility « datatype » Performance « datatype » Robustness Input Input
Why is a computational framework for motivational and metacognitive control needed?
NASA Astrophysics Data System (ADS)
Sun, Ron
2018-01-01
This paper discusses, in the context of computational modelling and simulation of cognition, the relevance of deeper structures in the control of behaviour. Such deeper structures include motivational control of behaviour, which provides underlying causes for actions, and also metacognitive control, which provides higher-order processes for monitoring and regulation. It is argued that such deeper structures are important and thus cannot be ignored in computational cognitive architectures. A general framework based on the Clarion cognitive architecture is outlined that emphasises the interaction amongst action selection, motivation, and metacognition. The upshot is that it is necessary to incorporate all essential processes; short of that, the understanding of cognition can only be incomplete.
Super-resolution using a light inception layer in convolutional neural network
NASA Astrophysics Data System (ADS)
Mou, Qinyang; Guo, Jun
2018-04-01
Recently, several models based on CNN architecture have achieved great result on Single Image Super-Resolution (SISR) problem. In this paper, we propose an image super-resolution method (SR) using a light inception layer in convolutional network (LICN). Due to the strong representation ability of our well-designed inception layer that can learn richer representation with less parameters, we can build our model with shallow architecture that can reduce the effect of vanishing gradients problem and save computational costs. Our model strike a balance between computational speed and the quality of the result. Compared with state-of-the-art result, we produce comparable or better results with faster computational speed.
AHaH Computing–From Metastable Switches to Attractors to Machine Learning
Nugent, Michael Alexander; Molter, Timothy Wesley
2014-01-01
Modern computing architecture based on the separation of memory and processing leads to a well known problem called the von Neumann bottleneck, a restrictive limit on the data bandwidth between CPU and RAM. This paper introduces a new approach to computing we call AHaH computing where memory and processing are combined. The idea is based on the attractor dynamics of volatile dissipative electronics inspired by biological systems, presenting an attractive alternative architecture that is able to adapt, self-repair, and learn from interactions with the environment. We envision that both von Neumann and AHaH computing architectures will operate together on the same machine, but that the AHaH computing processor may reduce the power consumption and processing time for certain adaptive learning tasks by orders of magnitude. The paper begins by drawing a connection between the properties of volatility, thermodynamics, and Anti-Hebbian and Hebbian (AHaH) plasticity. We show how AHaH synaptic plasticity leads to attractor states that extract the independent components of applied data streams and how they form a computationally complete set of logic functions. After introducing a general memristive device model based on collections of metastable switches, we show how adaptive synaptic weights can be formed from differential pairs of incremental memristors. We also disclose how arrays of synaptic weights can be used to build a neural node circuit operating AHaH plasticity. By configuring the attractor states of the AHaH node in different ways, high level machine learning functions are demonstrated. This includes unsupervised clustering, supervised and unsupervised classification, complex signal prediction, unsupervised robotic actuation and combinatorial optimization of procedures–all key capabilities of biological nervous systems and modern machine learning algorithms with real world application. PMID:24520315
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS
NASA Astrophysics Data System (ADS)
Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.
2018-03-01
Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.
A survey of compiler optimization techniques
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1972-01-01
Major optimization techniques of compilers are described and grouped into three categories: machine dependent, architecture dependent, and architecture independent. Machine-dependent optimizations tend to be local and are performed upon short spans of generated code by using particular properties of an instruction set to reduce the time or space required by a program. Architecture-dependent optimizations are global and are performed while generating code. These optimizations consider the structure of a computer, but not its detailed instruction set. Architecture independent optimizations are also global but are based on analysis of the program flow graph and the dependencies among statements of source program. A conceptual review of a universal optimizer that performs architecture-independent optimizations at source-code level is also presented.
Interaction with Machine Improvisation
NASA Astrophysics Data System (ADS)
Assayag, Gerard; Bloch, George; Cont, Arshia; Dubnov, Shlomo
We describe two multi-agent architectures for an improvisation oriented musician-machine interaction systems that learn in real time from human performers. The improvisation kernel is based on sequence modeling and statistical learning. We present two frameworks of interaction with this kernel. In the first, the stylistic interaction is guided by a human operator in front of an interactive computer environment. In the second framework, the stylistic interaction is delegated to machine intelligence and therefore, knowledge propagation and decision are taken care of by the computer alone. The first framework involves a hybrid architecture using two popular composition/performance environments, Max and OpenMusic, that are put to work and communicate together, each one handling the process at a different time/memory scale. The second framework shares the same representational schemes with the first but uses an Active Learning architecture based on collaborative, competitive and memory-based learning to handle stylistic interactions. Both systems are capable of processing real-time audio/video as well as MIDI. After discussing the general cognitive background of improvisation practices, the statistical modelling tools and the concurrent agent architecture are presented. Then, an Active Learning scheme is described and considered in terms of using different improvisation regimes for improvisation planning. Finally, we provide more details about the different system implementations and describe several performances with the system.
NASA Astrophysics Data System (ADS)
Gupta, V.; Gupta, N.; Gupta, S.; Field, E.; Maechling, P.
2003-12-01
Modern laptop computers, and personal computers, can provide capabilities that are, in many ways, comparable to workstations or departmental servers. However, this doesn't mean we should run all computations on our local computers. We have identified several situations in which it preferable to implement our seismological application programs in a distributed, server-based, computing model. In this model, application programs on the user's laptop, or local computer, invoke programs that run on an organizational server, and the results are returned to the invoking system. Situations in which a server-based architecture may be preferred include: (a) a program is written in a language, or written for an operating environment, that is unsupported on the local computer, (b) software libraries or utilities required to execute a program are not available on the users computer, (c) a computational program is physically too large, or computationally too expensive, to run on a users computer, (d) a user community wants to enforce a consistent method of performing a computation by standardizing on a single implementation of a program, and (e) the computational program may require current information, that is not available to all client computers. Until recently, distributed, server-based, computational capabilities were implemented using client/server architectures. In these architectures, client programs were often written in the same language, and they executed in the same computing environment, as the servers. Recently, a new distributed computational model, called Web Services, has been developed. Web Services are based on Internet standards such as XML, SOAP, WDSL, and UDDI. Web Services offer the promise of platform, and language, independent distributed computing. To investigate this new computational model, and to provide useful services to the SCEC Community, we have implemented several computational and utility programs using a Web Service architecture. We have hosted these Web Services as a part of the SCEC Community Modeling Environment (SCEC/CME) ITR Project (http://www.scec.org/cme). We have implemented Web Services for several of the reasons sited previously. For example, we implemented a FORTRAN-based Earthquake Rupture Forecast (ERF) as a Web Service for use by client computers that don't support a FORTRAN runtime environment. We implemented a Generic Mapping Tool (GMT) Web Service for use by systems that don't have local access to GMT. We implemented a Hazard Map Calculator Web Service to execute Hazard calculations that are too computationally intensive to run on a local system. We implemented a Coordinate Conversion Web Service to enforce a standard and consistent method for converting between UTM and Lat/Lon. Our experience developing these services indicates both strengths and weakness in current Web Service technology. Client programs that utilize Web Services typically need network access, a significant disadvantage at times. Programs with simple input and output parameters were the easiest to implement as Web Services, while programs with complex parameter-types required a significant amount of additional development. We also noted that Web services are very data-oriented, and adapting object-oriented software into the Web Service model proved problematic. Also, the Web Service approach of converting data types into XML format for network transmission has significant inefficiencies for some data sets.
A synchronized computational architecture for generalized bilateral control of robot arms
NASA Technical Reports Server (NTRS)
Bejczy, Antal K.; Szakaly, Zoltan
1987-01-01
This paper describes a computational architecture for an interconnected high speed distributed computing system for generalized bilateral control of robot arms. The key method of the architecture is the use of fully synchronized, interrupt driven software. Since an objective of the development is to utilize the processing resources efficiently, the synchronization is done in the hardware level to reduce system software overhead. The architecture also achieves a balaced load on the communication channel. The paper also describes some architectural relations to trading or sharing manual and automatic control.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation
NASA Technical Reports Server (NTRS)
Stocker, John C.; Golomb, Andrew M.
2011-01-01
Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
NASA Technical Reports Server (NTRS)
1998-01-01
Under a NASA SBIR (Small Business Innovative Research) contract, (NAS5-30905), EAI Simulation Associates, Inc., developed a new digital simulation computer, Starlight(tm). With an architecture based on the analog model of computation, Starlight(tm) outperforms all other computers on a wide range of continuous system simulation. This system is used in a variety of applications, including aerospace, automotive, electric power and chemical reactors.
The symbolic computation and automatic analysis of trajectories
NASA Technical Reports Server (NTRS)
Grossman, Robert
1991-01-01
Research was generally done on computation of trajectories of dynamical systems, especially control systems. Algorithms were further developed for rewriting expressions involving differential operators. The differential operators involved arise in the local analysis of nonlinear control systems. An initial design was completed of the system architecture for software to analyze nonlinear control systems using data base computing.
RICIS Symposium 1992: Mission and Safety Critical Systems Research and Applications
NASA Technical Reports Server (NTRS)
1992-01-01
This conference deals with computer systems which control systems whose failure to operate correctly could produce the loss of life and or property, mission and safety critical systems. Topics covered are: the work of standards groups, computer systems design and architecture, software reliability, process control systems, knowledge based expert systems, and computer and telecommunication protocols.
Integrated Visible Photonics for Trapped-Ion Quantum Computing
2017-06-10
necessarily reflect the views of the Department of Defense. Abstract- A scalable trapped-ion-based quantum - computing architecture requires the... Quantum Computing Dave Kharas, Cheryl Sorace-Agaskar, Suraj Bramhavar, William Loh, Jeremy M. Sage, Paul W. Juodawlkis, and John...coherence times, strong coulomb interactions, and optical addressability, hold great promise for implementation of practical quantum information
The Development of the Non-hydrostatic Unified Model of the Atmosphere (NUMA)
2011-09-19
capabilities: 1. Highly scalable on current and future computer architectures ( exascale computing: this means CPUs and GPUs) 2. Flexibility to use a...From Terascale to Petascale/ Exascale Computing • 10 of Top 500 are already in the Petascale range • 3 of top 10 are GPU-based machines 2
NASA Astrophysics Data System (ADS)
Gallagher, J. H. R.; Jelenak, A.; Potter, N.; Fulker, D. W.; Habermann, T.
2017-12-01
Providing data services based on cloud computing technology that is equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis and storage. OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means for achieving data-as-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is being stored in the Hierarchical Data Format Version 5 (HDF5). Three candidate architectures for serving NASA satellite Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS) were developed and their performance examined for a set of representative use cases. The performance was based both on runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and how the Hyrax server (as an EC2 instance) retrieves their data. The results for both the serial and parallel access to HDF5 data in the S3 will be presented. While the study focused on HDF5 data, OPeNDAP and the Hyrax data server, the architectures are generic and the analysis can be extrapolated to many different data formats, web APIs, and data servers.
Handsfield, Geoffrey G; Bolsterlee, Bart; Inouye, Joshua M; Herbert, Robert D; Besier, Thor F; Fernandez, Justin W
2017-12-01
Determination of skeletal muscle architecture is important for accurately modeling muscle behavior. Current methods for 3D muscle architecture determination can be costly and time-consuming, making them prohibitive for clinical or modeling applications. Computational approaches such as Laplacian flow simulations can estimate muscle fascicle orientation based on muscle shape and aponeurosis location. The accuracy of this approach is unknown, however, since it has not been validated against other standards for muscle architecture determination. In this study, muscle architectures from the Laplacian approach were compared to those determined from diffusion tensor imaging in eight adult medial gastrocnemius muscles. The datasets were subdivided into training and validation sets, and computational fluid dynamics software was used to conduct Laplacian simulations. In training sets, inputs of muscle geometry, aponeurosis location, and geometric flow guides resulted in good agreement between methods. Application of the method to validation sets showed no significant differences in pennation angle (mean difference [Formula: see text] or fascicle length (mean difference 0.9 mm). Laplacian simulation was thus effective at predicting gastrocnemius muscle architectures in healthy volunteers using imaging-derived muscle shape and aponeurosis locations. This method may serve as a tool for determining muscle architecture in silico and as a complement to other approaches.
The computation in diagnostics for tokamaks: systems, designs, approaches
NASA Astrophysics Data System (ADS)
Krawczyk, Rafał; Linczuk, Paweł; Czarski, Tomasz; Wojeński, Andrzej; Chernyshova, Maryna; Poźniak, Krzysztof; Kolasiński, Piotr; Kasprowicz, Grzegorz; Zabołotny, Wojciech; Kowalska-Strzeciwilk, Ewa; Malinowski, Karol; Gaska, Michał
2017-08-01
The requirements given for GEM (Gaseous Electron Multiplier) detector based acquisition system for plasma impurities diagnostics triggered a need for the development of a specialized software and hardware architecture. The amount of computations with latency and throughput restrictions cause that an advanced solution is sought for. In order to provide a mechanism fitting the designated tokamaks, an insight into existing solutions was necessary. In the article there is discussed architecture of systems used for plasma diagnostics and in related scientific fields. The developed solution is compared and contrasted with other diagnostic and control systems. Particular attention is payed to specific requirements for plasma impurities diagnostics in tokamak thermal fusion reactor. Subsequently, the details are presented that justified the choice of the system architecture and the discussion on various approaches is given.
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Slażyński, Leszek; Bohte, Sander
2012-01-01
The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
Power System Information Delivering System Based on Distributed Object
NASA Astrophysics Data System (ADS)
Tanaka, Tatsuji; Tsuchiya, Takehiko; Tamura, Setsuo; Seki, Tomomichi; Kubota, Kenji
In recent years, improvement in computer performance and development of computer network technology or the distributed information processing technology has a remarkable thing. Moreover, the deregulation is starting and will be spreading in the electric power industry in Japan. Consequently, power suppliers are required to supply low cost power with high quality services to customers. Corresponding to these movements the authors have been proposed SCOPE (System Configuration Of PowEr control system) architecture for distributed EMS/SCADA (Energy Management Systems / Supervisory Control and Data Acquisition) system based on distributed object technology, which offers the flexibility and expandability adapting those movements. In this paper, the authors introduce a prototype of the power system information delivering system, which was developed based on SCOPE architecture. This paper describes the architecture and the evaluation results of this prototype system. The power system information delivering system supplies useful power systems information such as electric power failures to the customers using Internet and distributed object technology. This system is new type of SCADA system which monitors failure of power transmission system and power distribution system with geographic information integrated way.
Architectural Specialization for Inter-Iteration Loop Dependence Patterns
2015-10-01
Architectural Specialization for Inter-Iteration Loop Dependence Patterns Christopher Batten Computer Systems Laboratory School of Electrical and...Trends in Computer Architecture Transistors (Thousands) Frequency (MHz) Typical Power (W) MIPS R2K Intel P4 DEC Alpha 21264 Data collected by M...T as ks p er Jo ule ) Simple Processor Design Power Constraint High-Performance Architectures Embedded Architectures Design Performance
Fast underdetermined BSS architecture design methodology for real time applications.
Mopuri, Suresh; Reddy, P Sreenivasa; Acharyya, Amit; Naik, Ganesh R
2015-01-01
In this paper, we propose a high speed architecture design methodology for the Under-determined Blind Source Separation (UBSS) algorithm using our recently proposed high speed Discrete Hilbert Transform (DHT) targeting real time applications. In UBSS algorithm, unlike the typical BSS, the number of sensors are less than the number of the sources, which is of more interest in the real time applications. The DHT architecture has been implemented based on sub matrix multiplication method to compute M point DHT, which uses N point architecture recursively and where M is an integer multiples of N. The DHT architecture and state of the art architecture are coded in VHDL for 16 bit word length and ASIC implementation is carried out using UMC 90 - nm technology @V DD = 1V and @ 1MHZ clock frequency. The proposed architecture implementation and experimental comparison results show that the DHT design is two times faster than state of the art architecture.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
NASA Astrophysics Data System (ADS)
Santagati, C.; Inzerillo, L.; Di Paola, F.
2013-07-01
3D reconstruction from images has undergone a revolution in the last few years. Computer vision techniques use photographs from data set collection to rapidly build detailed 3D models. The simultaneous applications of different algorithms (MVS), the different techniques of image matching, feature extracting and mesh optimization are inside an active field of research in computer vision. The results are promising: the obtained models are beginning to challenge the precision of laser-based reconstructions. Among all the possibilities we can mainly distinguish desktop and web-based packages. Those last ones offer the opportunity to exploit the power of cloud computing in order to carry out a semi-automatic data processing, thus allowing the user to fulfill other tasks on its computer; whereas desktop systems employ too much processing time and hard heavy approaches. Computer vision researchers have explored many applications to verify the visual accuracy of 3D model but the approaches to verify metric accuracy are few and no one is on Autodesk 123D Catch applied on Architectural Heritage Documentation. Our approach to this challenging problem is to compare the 3Dmodels by Autodesk 123D Catch and 3D models by terrestrial LIDAR considering different object size, from the detail (capitals, moldings, bases) to large scale buildings for practitioner purpose.
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal
NASA Astrophysics Data System (ADS)
Satheeskumaran, S.; Sabrigiriraj, M.
2016-06-01
Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.
GISpark: A Geospatial Distributed Computing Platform for Spatiotemporal Big Data
NASA Astrophysics Data System (ADS)
Wang, S.; Zhong, E.; Wang, E.; Zhong, Y.; Cai, W.; Li, S.; Gao, S.
2016-12-01
Geospatial data are growing exponentially because of the proliferation of cost effective and ubiquitous positioning technologies such as global remote-sensing satellites and location-based devices. Analyzing large amounts of geospatial data can provide great value for both industrial and scientific applications. Data- and compute- intensive characteristics inherent in geospatial big data increasingly pose great challenges to technologies of data storing, computing and analyzing. Such challenges require a scalable and efficient architecture that can store, query, analyze, and visualize large-scale spatiotemporal data. Therefore, we developed GISpark - a geospatial distributed computing platform for processing large-scale vector, raster and stream data. GISpark is constructed based on the latest virtualized computing infrastructures and distributed computing architecture. OpenStack and Docker are used to build multi-user hosting cloud computing infrastructure for GISpark. The virtual storage systems such as HDFS, Ceph, MongoDB are combined and adopted for spatiotemporal data storage management. Spark-based algorithm framework is developed for efficient parallel computing. Within this framework, SuperMap GIScript and various open-source GIS libraries can be integrated into GISpark. GISpark can also integrated with scientific computing environment (e.g., Anaconda), interactive computing web applications (e.g., Jupyter notebook), and machine learning tools (e.g., TensorFlow/Orange). The associated geospatial facilities of GISpark in conjunction with the scientific computing environment, exploratory spatial data analysis tools, temporal data management and analysis systems make up a powerful geospatial computing tool. GISpark not only provides spatiotemporal big data processing capacity in the geospatial field, but also provides spatiotemporal computational model and advanced geospatial visualization tools that deals with other domains related with spatial property. We tested the performance of the platform based on taxi trajectory analysis. Results suggested that GISpark achieves excellent run time performance in spatiotemporal big data applications.
NASA Astrophysics Data System (ADS)
Liyanagedera, Chamika M.; Sengupta, Abhronil; Jaiswal, Akhilesh; Roy, Kaushik
2017-12-01
Stochastic spiking neural networks based on nanoelectronic spin devices can be a possible pathway to achieving "brainlike" compact and energy-efficient cognitive intelligence. The computational model attempt to exploit the intrinsic device stochasticity of nanoelectronic synaptic or neural components to perform learning or inference. However, there has been limited analysis on the scaling effect of stochastic spin devices and its impact on the operation of such stochastic networks at the system level. This work attempts to explore the design space and analyze the performance of nanomagnet-based stochastic neuromorphic computing architectures for magnets with different barrier heights. We illustrate how the underlying network architecture must be modified to account for the random telegraphic switching behavior displayed by magnets with low barrier heights as they are scaled into the superparamagnetic regime. We perform a device-to-system-level analysis on a deep neural-network architecture for a digit-recognition problem on the MNIST data set.
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size
NASA Astrophysics Data System (ADS)
Shao, Rui; Zhong, Sheng; Yan, Luxin
2015-12-01
Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. Aiming to improve the efficiency of storage resources on-chip, reducing off-chip bandwidth of these two issues, proposed construction of a data cache reuse. Multi-block SPRAM to cross cached images and the on-chip ping-pong operation takes full advantage of the data convolution calculation reuse, design a new ASIC data scheduling scheme and overall architecture. Experimental results show that the structure can achieve 40× 32 size of template real-time convolution operations, and improve the utilization of on-chip memory bandwidth and on-chip memory resources, the experimental results show that the structure satisfies the conditions to maximize data throughput output , reducing the need for off-chip memory bandwidth.
Hybrid architecture for encoded measurement-based quantum computation
Zwerger, M.; Briegel, H. J.; Dür, W.
2014-01-01
We present a hybrid scheme for quantum computation that combines the modular structure of elementary building blocks used in the circuit model with the advantages of a measurement-based approach to quantum computation. We show how to construct optimal resource states of minimal size to implement elementary building blocks for encoded quantum computation in a measurement-based way, including states for error correction and encoded gates. The performance of the scheme is determined by the quality of the resource states, where within the considered error model a threshold of the order of 10% local noise per particle for fault-tolerant quantum computation and quantum communication. PMID:24946906
Integration of a CAS/DGS as a CAD System in the Mathematics Curriculum for Architecture Students
ERIC Educational Resources Information Center
Falcon, R. M.
2011-01-01
Students of Architecture and Building Engineering Degrees work with Computer Aided Design systems daily in order to design and model architectonic constructions. Since this kind of software is based on the creation and transformation of geometrical objects, it seems to be a useful tool in Maths classes in order to capture the attention of the…
ERIC Educational Resources Information Center
Brown, Sydney E.; Karle, Sarah Thomas; Kelly, Brian
2015-01-01
DSGN110 was a multidisciplinary course teaching first year students enrolled in in a variety of majors about design thinking. The course is offered for the majors of architecture, landscape architecture, interior design, community and regional planning, along with computer science and business students. By blending face-to-face and online…
2013-03-29
Assessor that is in the SoS agent. Figure 31. Fuzzy Assessor for the SoS Agent for Assessment of SoS Architecture «subsystem» Fuzzy Rules « datatype ...Affordability « datatype » Flexibility « datatype » Performance « datatype » Robustness Input Input Input Input « datatype » Architecture QualityOutput Fuzzy
Reprogrammable logic in memristive crossbar for in-memory computing
NASA Astrophysics Data System (ADS)
Cheng, Long; Zhang, Mei-Yun; Li, Yi; Zhou, Ya-Xiong; Wang, Zhuo-Rui; Hu, Si-Yu; Long, Shi-Bing; Liu, Ming; Miao, Xiang-Shui
2017-12-01
Memristive stateful logic has emerged as a promising next-generation in-memory computing paradigm to address escalating computing-performance pressures in traditional von Neumann architecture. Here, we present a nonvolatile reprogrammable logic method that can process data between different rows and columns in a memristive crossbar array based on material implication (IMP) logic. Arbitrary Boolean logic can be executed with a reprogrammable cell containing four memristors in a crossbar array. In the fabricated Ti/HfO2/W memristive array, some fundamental functions, such as universal NAND logic and data transfer, were experimentally implemented. Moreover, using eight memristors in a 2 × 4 array, a one-bit full adder was theoretically designed and verified by simulation to exhibit the feasibility of our method to accomplish complex computing tasks. In addition, some critical logic-related performances were further discussed, such as the flexibility of data processing, cascading problem and bit error rate. Such a method could be a step forward in developing IMP-based memristive nonvolatile logic for large-scale in-memory computing architecture.
Convolutional neural network architectures for predicting DNA–protein binding
Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.
2016-01-01
Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
FPGA-Based, Self-Checking, Fault-Tolerant Computers
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2004-01-01
A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
The flight telerobotic servicer: From functional architecture to computer architecture
NASA Technical Reports Server (NTRS)
Lumia, Ronald; Fiala, John
1989-01-01
After a brief tutorial on the NASA/National Bureau of Standards Standard Reference Model for Telerobot Control System Architecture (NASREM) functional architecture, the approach to its implementation is shown. First, interfaces must be defined which are capable of supporting the known algorithms. This is illustrated by considering the interfaces required for the SERVO level of the NASREM functional architecture. After interface definition, the specific computer architecture for the implementation must be determined. This choice is obviously technology dependent. An example illustrating one possible mapping of the NASREM functional architecture to a particular set of computers which implements it is shown. The result of choosing the NASREM functional architecture is that it provides a technology independent paradigm which can be mapped into a technology dependent implementation capable of evolving with technology in the laboratory and in space.
NASA Astrophysics Data System (ADS)
Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan
2017-08-01
The paper deals with a problem of insufficient productivity of existing computer means for large image processing, which do not meet modern requirements posed by resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchic transformation allowed to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on the GPU-oriented architecture using GPGPU technologies. The analyzed performance of suggested computerized tools for processing and classification of laser beam profile images allows to perform real-time processing of dynamic images of various sizes.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
High Performance GPU-Based Fourier Volume Rendering.
Abdellah, Marwan; Eldeib, Ayman; Sharawi, Amr
2015-01-01
Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its (N (2)logN) time complexity, it provides a faster alternative to spatial domain volume rendering algorithms that are (N (3)) computationally complex. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive competent platform that can deliver giant computational raw power compared to the central processing unit (CPU) on a per-dollar-basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly-parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. This proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together by taking advantage of executing the rendering pipeline entirely on recent GPU architectures.
Real-time control system for adaptive resonator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Flath, L; An, J; Brase, J
2000-07-24
Sustained operation of high average power solid-state lasers currently requires an adaptive resonator to produce the optimal beam quality. We describe the architecture of a real-time adaptive control system for correcting intra-cavity aberrations in a heat capacity laser. Image data collected from a wavefront sensor are processed and used to control phase with a high-spatial-resolution deformable mirror. Our controller takes advantage of recent developments in low-cost, high-performance processor technology. A desktop-based computational engine and object-oriented software architecture replaces the high-cost rack-mount embedded computers of previous systems.
76 FR 34965 - Cybersecurity, Innovation, and the Internet Economy
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-15
... disrupt computing systems. These threats are exacerbated by the interconnected and interdependent architecture of today's computing environment. Theoretically, security deficiencies in one area may provide... does the move to cloud-based services have on education and research efforts in the I3S? 45. What is...
USDA-ARS?s Scientific Manuscript database
Service oriented architectures allow modelling engines to be hosted over the Internet abstracting physical hardware configuration and software deployments from model users. Many existing environmental models are deployed as desktop applications running on user's personal computers (PCs). Migration ...
High performance network and channel-based storage
NASA Technical Reports Server (NTRS)
Katz, Randy H.
1991-01-01
In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called input/output (I/O) channels. With the dramatic shift towards workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. We discuss the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high performance computing based on network-attached storage.
Computer aided design of architecture of degradable tissue engineering scaffolds.
Heljak, M K; Kurzydlowski, K J; Swieszkowski, W
2017-11-01
One important factor affecting the process of tissue regeneration is scaffold stiffness loss, which should be properly balanced with the rate of tissue regeneration. The aim of the research reported here was to develop a computer tool for designing the architecture of biodegradable scaffolds fabricated by melt-dissolution deposition systems (e.g. Fused Deposition Modeling) to provide the required scaffold stiffness at each stage of degradation/regeneration. The original idea presented in the paper is that the stiffness of a tissue engineering scaffold can be controlled during degradation by means of a proper selection of the diameter of the constituent fibers and the distances between them. This idea is based on the size-effect on degradation of aliphatic polyesters. The presented computer tool combines a genetic algorithm and a diffusion-reaction model of polymer hydrolytic degradation. In particular, we show how to design the architecture of scaffolds made of poly(DL-lactide-co-glycolide) with the required Young's modulus change during hydrolytic degradation.
Innovative architectures for dense multi-microprocessor computers
NASA Technical Reports Server (NTRS)
Donaldson, Thomas; Doty, Karl; Engle, Steven W.; Larson, Robert E.; O'Reilly, John G.
1988-01-01
The results of a Phase I Small Business Innovative Research (SBIR) project performed for the NASA Langley Computational Structural Mechanics Group are described. The project resulted in the identification of a family of chordal-ring interconnection architectures with excellent potential to serve as the basis for new multimicroprocessor (MMP) computers. The paper presents examples of how computational algorithms from structural mechanics can be efficiently implemented on the chordal-ring architecture.
Systematic Development of Intelligent Systems for Public Road Transport.
García, Carmelo R; Quesada-Arencibia, Alexis; Cristóbal, Teresa; Padrón, Gabino; Alayón, Francisco
2016-07-16
This paper presents an architecture model for the development of intelligent systems for public passenger transport by road. The main objective of our proposal is to provide a framework for the systematic development and deployment of telematics systems to improve various aspects of this type of transport, such as efficiency, accessibility and safety. The architecture model presented herein is based on international standards on intelligent transport system architectures, ubiquitous computing and service-oriented architecture for distributed systems. To illustrate the utility of the model, we also present a use case of a monitoring system for stops on a public passenger road transport network.
Systematic Development of Intelligent Systems for Public Road Transport
García, Carmelo R.; Quesada-Arencibia, Alexis; Cristóbal, Teresa; Padrón, Gabino; Alayón, Francisco
2016-01-01
This paper presents an architecture model for the development of intelligent systems for public passenger transport by road. The main objective of our proposal is to provide a framework for the systematic development and deployment of telematics systems to improve various aspects of this type of transport, such as efficiency, accessibility and safety. The architecture model presented herein is based on international standards on intelligent transport system architectures, ubiquitous computing and service-oriented architecture for distributed systems. To illustrate the utility of the model, we also present a use case of a monitoring system for stops on a public passenger road transport network. PMID:27438836
Architecture Framework for Trapped-Ion Quantum Computer based on Performance Simulation Tool
NASA Astrophysics Data System (ADS)
Ahsan, Muhammad
The challenge of building scalable quantum computer lies in striking appropriate balance between designing a reliable system architecture from large number of faulty computational resources and improving the physical quality of system components. The detailed investigation of performance variation with physics of the components and the system architecture requires adequate performance simulation tool. In this thesis we demonstrate a software tool capable of (1) mapping and scheduling the quantum circuit on a realistic quantum hardware architecture with physical resource constraints, (2) evaluating the performance metrics such as the execution time and the success probability of the algorithm execution, and (3) analyzing the constituents of these metrics and visualizing resource utilization to identify system components which crucially define the overall performance. Using this versatile tool, we explore vast design space for modular quantum computer architecture based on trapped ions. We find that while success probability is uniformly determined by the fidelity of physical quantum operation, the execution time is a function of system resources invested at various layers of design hierarchy. At physical level, the number of lasers performing quantum gates, impact the latency of the fault-tolerant circuit blocks execution. When these blocks are used to construct meaningful arithmetic circuit such as quantum adders, the number of ancilla qubits for complicated non-clifford gates and entanglement resources to establish long-distance communication channels, become major performance limiting factors. Next, in order to factorize large integers, these adders are assembled into modular exponentiation circuit comprising bulk of Shor's algorithm. At this stage, the overall scaling of resource-constraint performance with the size of problem, describes the effectiveness of chosen design. By matching the resource investment with the pace of advancement in hardware technology, we find optimal designs for different types of quantum adders. Conclusively, we show that 2,048-bit Shor's algorithm can be reliably executed within the resource budget of 1.5 million qubits.
Fuzzy Behavior Modulation with Threshold Activation for Autonomous Vehicle Navigation
NASA Technical Reports Server (NTRS)
Tunstel, Edward
2000-01-01
This paper describes fuzzy logic techniques used in a hierarchical behavior-based architecture for robot navigation. An architectural feature for threshold activation of fuzzy-behaviors is emphasized, which is potentially useful for tuning navigation performance in real world applications. The target application is autonomous local navigation of a small planetary rover. Threshold activation of low-level navigation behaviors is the primary focus. A preliminary assessment of its impact on local navigation performance is provided based on computer simulations.
An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems
NASA Technical Reports Server (NTRS)
Follen, Gregory J.; Lytle, John K. (Technical Monitor)
2002-01-01
Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.
The CEOS Global Observation Strategy for Disaster Risk Management: An Enterprise Architect's View
NASA Astrophysics Data System (ADS)
Moe, K.; Evans, J. D.; Frye, S.
2013-12-01
The Committee on Earth Observation Satellites (CEOS) Working Group on Information Systems and Services (WGISS), on behalf of the Global Earth Observation System of Systems (GEOSS), is defining an enterprise architecture (known as GA.4.D) for the use of satellite observations in international disaster management. This architecture defines the scope and structure of the disaster management enterprise (based on disaster types and phases); its processes (expressed via use cases / system functions); and its core values (in particular, free and open data sharing via standard interfaces). The architecture also details how a disaster management enterprise describes, obtains, and handles earth observations and data products for decision-support; and how it draws on distributed computational services for streamlined operational capability. We have begun to apply this architecture to a new CEOS initiative, the Global Observation Strategy for Disaster Risk Management (DRM). CEOS is defining this Strategy based on the outcomes of three pilot projects focused on seismic hazards, volcanoes, and floods. These pilots offer a unique opportunity to characterize and assess the impacts (benefits / costs) of the GA.4.D architecture in practice. In particular, the DRM Floods Pilot is applying satellite-based optical and radar data to flood mitigation, warning, and response, including monitoring and modeling at regional to global scales. It is focused on serving user needs and building local institutional / technical capacity in the Caribbean, Southern Africa, and Southeast Asia. In the context of these CEOS DRM Pilots, we are characterizing where and how the GA.4D architecture helps participants to: - Understand the scope and nature of hazard events quickly and accurately - Assure timely delivery of observations into analysis, modeling, and decision-making - Streamline user access to products - Lower barriers to entry for users or suppliers - Streamline or focus field operations in disaster reduction - Reduce redundancies and gaps in inter-organizational systems - Assist in planning / managing / prioritizing information and computing resources - Adapt computational resources to new technologies or evolving user needs - Sustain capability for the long term Insights from this exercise are helping us to abstract best practices applicable to other contexts, disaster types, and disaster phases, whereby local communities can improve their use of satellite data for greater preparedness. This effort is also helping to assess the likely impacts and roles of emerging technologies (such as cloud computing, "Big Data" analysis, location-based services, crowdsourcing, semantic services, small satellites, drones, direct broadcast, or model webs) in future disaster management activities.
A Web Centric Architecture for Deploying Multi-Disciplinary Engineering Design Processes
NASA Technical Reports Server (NTRS)
Woyak, Scott; Kim, Hongman; Mullins, James; Sobieszczanski-Sobieski, Jaroslaw
2004-01-01
There are continuous needs for engineering organizations to improve their design process. Current state of the art techniques use computational simulations to predict design performance, and optimize it through advanced design methods. These tools have been used mostly by individual engineers. This paper presents an architecture for achieving results at an organization level beyond individual level. The next set of gains in process improvement will come from improving the effective use of computers and software within a whole organization, not just for an individual. The architecture takes advantage of state of the art capabilities to produce a Web based system to carry engineering design into the future. To illustrate deployment of the architecture, a case study for implementing advanced multidisciplinary design optimization processes such as Bi-Level Integrated System Synthesis is discussed. Another example for rolling-out a design process for Design for Six Sigma is also described. Each example explains how an organization can effectively infuse engineering practice with new design methods and retain the knowledge over time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, C.; Yu, G.; Wang, K.
The physical designs of the new concept reactors which have complex structure, various materials and neutronic energy spectrum, have greatly improved the requirements to the calculation methods and the corresponding computing hardware. Along with the widely used parallel algorithm, heterogeneous platforms architecture has been introduced into numerical computations in reactor physics. Because of the natural parallel characteristics, the CPU-FPGA architecture is often used to accelerate numerical computation. This paper studies the application and features of this kind of heterogeneous platforms used in numerical calculation of reactor physics through practical examples. After the designed neutron diffusion module based on CPU-FPGA architecturemore » achieves a 11.2 speed up factor, it is proved to be feasible to apply this kind of heterogeneous platform into reactor physics. (authors)« less
Distributed computing environments for future space control systems
NASA Technical Reports Server (NTRS)
Viallefont, Pierre
1993-01-01
The aim of this paper is to present the results of a CNES research project on distributed computing systems. The purpose of this research was to study the impact of the use of new computer technologies in the design and development of future space applications. The first part of this study was a state-of-the-art review of distributed computing systems. One of the interesting ideas arising from this review is the concept of a 'virtual computer' allowing the distributed hardware architecture to be hidden from a software application. The 'virtual computer' can improve system performance by adapting the best architecture (addition of computers) to the software application without having to modify its source code. This concept can also decrease the cost and obsolescence of the hardware architecture. In order to verify the feasibility of the 'virtual computer' concept, a prototype representative of a distributed space application is being developed independently of the hardware architecture.
Electro-Optic Computing Architectures. Volume I
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit (OW
NASA Astrophysics Data System (ADS)
Litinski, Daniel; Kesselring, Markus S.; Eisert, Jens; von Oppen, Felix
2017-07-01
We present a scalable architecture for fault-tolerant topological quantum computation using networks of voltage-controlled Majorana Cooper pair boxes and topological color codes for error correction. Color codes have a set of transversal gates which coincides with the set of topologically protected gates in Majorana-based systems, namely, the Clifford gates. In this way, we establish color codes as providing a natural setting in which advantages offered by topological hardware can be combined with those arising from topological error-correcting software for full-fledged fault-tolerant quantum computing. We provide a complete description of our architecture, including the underlying physical ingredients. We start by showing that in topological superconductor networks, hexagonal cells can be employed to serve as physical qubits for universal quantum computation, and we present protocols for realizing topologically protected Clifford gates. These hexagonal-cell qubits allow for a direct implementation of open-boundary color codes with ancilla-free syndrome read-out and logical T gates via magic-state distillation. For concreteness, we describe how the necessary operations can be implemented using networks of Majorana Cooper pair boxes, and we give a feasibility estimate for error correction in this architecture. Our approach is motivated by nanowire-based networks of topological superconductors, but it could also be realized in alternative settings such as quantum-Hall-superconductor hybrids.
Visualization of decision processes using a cognitive architecture
NASA Astrophysics Data System (ADS)
Livingston, Mark A.; Murugesan, Arthi; Brock, Derek; Frost, Wende K.; Perzanowski, Dennis
2013-01-01
Cognitive architectures are computational theories of reasoning the human mind engages in as it processes facts and experiences. A cognitive architecture uses declarative and procedural knowledge to represent mental constructs that are involved in decision making. Employing a model of behavioral and perceptual constraints derived from a set of one or more scenarios, the architecture reasons about the most likely consequence(s) of a sequence of events. Reasoning of any complexity and depth involving computational processes, however, is often opaque and challenging to comprehend. Arguably, for decision makers who may need to evaluate or question the results of autonomous reasoning, it would be useful to be able to inspect the steps involved in an interactive, graphical format. When a chain of evidence and constraint-based decision points can be visualized, it becomes easier to explore both how and why a scenario of interest will likely unfold in a particular way. In initial work on a scheme for visualizing cognitively-based decision processes, we focus on generating graphical representations of models run in the Polyscheme cognitive architecture. Our visualization algorithm operates on a modified version of Polyscheme's output, which is accomplished by augmenting models with a simple set of tags. We provide example visualizations and discuss properties of our technique that pose challenges for our representation goals. We conclude with a summary of feedback solicited from domain experts and practitioners in the field of cognitive modeling.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Armstrong, Robert C.; Ray, Jaideep; Malony, A.
2003-11-01
We present a case study of performance measurement and modeling of a CCA (Common Component Architecture) component-based application in a high performance computing environment. We explore issues peculiar to component-based HPC applications and propose a performance measurement infrastructure for HPC based loosely on recent work done for Grid environments. A prototypical implementation of the infrastructure is used to collect data for a three components in a scientific application and construct performance models for two of them. Both computational and message-passing performance are addressed.
ERIC Educational Resources Information Center
La Brecque, Mort
1984-01-01
To break the bottleneck inherent in today's linear computer architectures, parallel schemes (which allow computers to perform multiple tasks at one time) are being devised. Several of these schemes are described. Dataflow devices, parallel number-crunchers, programing languages, and a device based on a neurological model are among the areas…
Recent Trends in Spintronics-Based Nanomagnetic Logic
NASA Astrophysics Data System (ADS)
Das, Jayita; Alam, Syed M.; Bhanja, Sanjukta
2014-09-01
With the growing concerns of standby power in sub-100-nm CMOS technologies, alternative computing techniques and memory technologies are explored. Spin transfer torque magnetoresistive RAM (STT-MRAM) is one such nonvolatile memory relying on magnetic tunnel junctions (MTJs) to store information. It uses spin transfer torque to write information and magnetoresistance to read information. In 2012, Everspin Technologies, Inc. commercialized the first 64Mbit Spin Torque MRAM. On the computing end, nanomagnetic logic (NML) is a promising technique with zero leakage and high data retention. In 2000, Cowburn and Welland first demonstrated its potential in logic and information propagation through magnetostatic interaction in a chain of single domain circular nanomagnetic dots of Supermalloy (Ni80Fe14Mo5X1, X is other metals). In 2006, Imre et al. demonstrated wires and majority gates followed by coplanar cross wire systems demonstration in 2010 by Pulecio et al. Since 2004 researchers have also investigated the potential of MTJs in logic. More recently with dipolar coupling between MTJs demonstrated in 2012, logic-in-memory architecture with STT-MRAM have been investigated. The architecture borrows the computing concept from NML and read and write style from MRAM. The architecture can switch its operation between logic and memory modes with clock as classifier. Further through logic partitioning between MTJ and CMOS plane, a significant performance boost has been observed in basic computing blocks within the architecture. In this work, we have explored the developments in NML, in MTJs and more recent developments in hybrid MTJ/CMOS logic-in-memory architecture and its unique logic partitioning capability.
Molecular Sticker Model Stimulation on Silicon for a Maximum Clique Problem
Ning, Jianguo; Li, Yanmei; Yu, Wen
2015-01-01
Molecular computers (also called DNA computers), as an alternative to traditional electronic computers, are smaller in size but more energy efficient, and have massive parallel processing capacity. However, DNA computers may not outperform electronic computers owing to their higher error rates and some limitations of the biological laboratory. The stickers model, as a typical DNA-based computer, is computationally complete and universal, and can be viewed as a bit-vertically operating machine. This makes it attractive for silicon implementation. Inspired by the information processing method on the stickers computer, we propose a novel parallel computing model called DEM (DNA Electronic Computing Model) on System-on-a-Programmable-Chip (SOPC) architecture. Except for the significant difference in the computing medium—transistor chips rather than bio-molecules—the DEM works similarly to DNA computers in immense parallel information processing. Additionally, a plasma display panel (PDP) is used to show the change of solutions, and helps us directly see the distribution of assignments. The feasibility of the DEM is tested by applying it to compute a maximum clique problem (MCP) with eight vertices. Owing to the limited computing sources on SOPC architecture, the DEM could solve moderate-size problems in polynomial time. PMID:26075867
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Kou, W; Pandolfino, J E; Kahrilas, P J; Patankar, N A
2017-06-01
Based on a fully coupled computational model of esophageal transport, we analyzed how varied esophageal muscle fiber architecture and/or dual contraction waves (CWs) affect bolus transport. Specifically, we studied the luminal pressure profile in those cases to better understand possible origins of the peristaltic transition zone. Two groups of studies were conducted using a computational model. The first studied esophageal transport with circumferential-longitudinal fiber architecture, helical fiber architecture and various combinations of the two. In the second group, cases with dual CWs and varied muscle fiber architecture were simulated. Overall transport characteristics were examined and the space-time profiles of luminal pressure were plotted and compared. Helical muscle fiber architecture featured reduced circumferential wall stress, greater esophageal distensibility, and greater axial shortening. Non-uniform fiber architecture featured a peristaltic pressure trough between two high-pressure segments. The distal pressure segment showed greater amplitude than the proximal segment, consistent with experimental data. Dual CWs also featured a pressure trough between two high-pressure segments. However, the minimum pressure in the region of overlap was much lower, and the amplitudes of the two high-pressure segments were similar. The efficacy of esophageal transport is greatly affected by muscle fiber architecture. The peristaltic transition zone may be attributable to non-uniform architecture of muscle fibers along the length of the esophagus and/or dual CWs. The difference in amplitude between the proximal and distal pressure segments may be attributable to non-uniform muscle fiber architecture. © 2017 John Wiley & Sons Ltd.
Image-Processing Software For A Hypercube Computer
NASA Technical Reports Server (NTRS)
Lee, Meemong; Mazer, Alan S.; Groom, Steven L.; Williams, Winifred I.
1992-01-01
Concurrent Image Processing Executive (CIPE) is software system intended to develop and use image-processing application programs on concurrent computing environment. Designed to shield programmer from complexities of concurrent-system architecture, it provides interactive image-processing environment for end user. CIPE utilizes architectural characteristics of particular concurrent system to maximize efficiency while preserving architectural independence from user and programmer. CIPE runs on Mark-IIIfp 8-node hypercube computer and associated SUN-4 host computer.
Experimental Comparison of Two Quantum Computing Architectures
2017-03-28
IN A U G U RA L A RT IC LE CO M PU TE R SC IE N CE S Experimental comparison of two quantum computing architectures Norbert M. Linkea,b,1, Dmitri...the vast computing power a universal quantumcomputer could offer, several candidate systems are being explored. They have allowed experimental ...existing systems and the role of architecture in quantum computer design . These will be crucial for the realization of more advanced future incarna
Fracture Response Enhancement Of Aluminum Using In-Situ Selective Reinforcement
NASA Technical Reports Server (NTRS)
Abada, Christopher H.; Farley, Gary L.; Hyer, Michael W.
2006-01-01
A computer-based parametric study of the effect of reinforcement architectures on fracture response of aluminum compact-tension (CT) specimens is performed. Eleven different reinforcement architectures consisting of rectangular and triangular cross-section reinforcements were evaluated. Reinforced specimens produced between 13 and 28 percent higher fracture load than achieved with the unreinforced case. Reinforcements with blunt leading edges (rectangular reinforcements) exhibited superior performance relative to the triangular reinforcements with sharp leading edges. Relative to the rectangular reinforcements, the most important architectural feature was reinforcement thickness. At failure, the reinforcements carried between 58 and 85 percent of the load applied to the specimen, suggesting that there is considerable load transfer between the base material and the reinforcement.
Reliability models for dataflow computer systems
NASA Technical Reports Server (NTRS)
Kavi, K. M.; Buckles, B. P.
1985-01-01
The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
NASA Technical Reports Server (NTRS)
Vaden, Karl R.
2006-01-01
Communication systems for future NASA interplanetary spacecraft require transmitter power ranging from several hundred watts to kilowatts. Several hybrid junctions are considered as elements within a corporate combining architecture for high power Ka-band space traveling-wave tube amplifiers (TWTAs). This report presents the simulated transmission characteristics of several hybrid junctions designed for a low loss, high power waveguide based power combiner.
Remote Adaptive Communication System
2001-10-25
manage several different devices using the software tool A. Client /Server Architecture The architecture we are proposing is based on the Client ...Server model (see figure 3). We want both client and server to be accessible from anywhere via internet. The computer, acting as a server, is in...the other hand, each of the client applications will act as sender or receiver, depending on the associated interface: user interface or device
Yang, Shu; Qiu, Yuyan; Shi, Bo
2016-09-01
This paper explores the methods of building the internet of things of a regional ECG monitoring, focused on the implementation of ECG monitoring center based on cloud computing platform. It analyzes implementation principles of automatic identifi cation in the types of arrhythmia. It also studies the system architecture and key techniques of cloud computing platform, including server load balancing technology, reliable storage of massive smalfi les and the implications of quick search function.
System design in an evolving system-of-systems architecture and concept of operations
NASA Astrophysics Data System (ADS)
Rovekamp, Roger N., Jr.
Proposals for space exploration architectures have increased in complexity and scope. Constituent systems (e.g., rovers, habitats, in-situ resource utilization facilities, transfer vehicles, etc) must meet the needs of these architectures by performing in multiple operational environments and across multiple phases of the architecture's evolution. This thesis proposes an approach for using system-of-systems engineering principles in conjunction with system design methods (e.g., Multi-objective optimization, genetic algorithms, etc) to create system design options that perform effectively at both the system and system-of-systems levels, across multiple concepts of operations, and over multiple architectural phases. The framework is presented by way of an application problem that investigates the design of power systems within a power sharing architecture for use in a human Lunar Surface Exploration Campaign. A computer model has been developed that uses candidate power grid distribution solutions for a notional lunar base. The agent-based model utilizes virtual control agents to manage the interactions of various exploration and infrastructure agents. The philosophy behind the model is based both on lunar power supply strategies proposed in literature, as well as on the author's own approaches for power distribution strategies of future lunar bases. In addition to proposing a framework for system design, further implications of system-of-systems engineering principles are briefly explored, specifically as they relate to producing more robust cross-cultural system-of-systems architecture solutions.
Oh, Sungyoung; Cha, Jieun; Ji, Myungkyu; Kang, Hyekyung; Kim, Seok; Heo, Eunyoung; Han, Jong Soo; Kang, Hyunggoo; Chae, Hoseok; Hwang, Hee; Yoo, Sooyoung
2015-04-01
To design a cloud computing-based Healthcare Software-as-a-Service (SaaS) Platform (HSP) for delivering healthcare information services with low cost, high clinical value, and high usability. We analyzed the architecture requirements of an HSP, including the interface, business services, cloud SaaS, quality attributes, privacy and security, and multi-lingual capacity. For cloud-based SaaS services, we focused on Clinical Decision Service (CDS) content services, basic functional services, and mobile services. Microsoft's Azure cloud computing for Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) was used. The functional and software views of an HSP were designed in a layered architecture. External systems can be interfaced with the HSP using SOAP and REST/JSON. The multi-tenancy model of the HSP was designed as a shared database, with a separate schema for each tenant through a single application, although healthcare data can be physically located on a cloud or in a hospital, depending on regulations. The CDS services were categorized into rule-based services for medications, alert registration services, and knowledge services. We expect that cloud-based HSPs will allow small and mid-sized hospitals, in addition to large-sized hospitals, to adopt information infrastructures and health information technology with low system operation and maintenance costs.
Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten
2018-01-01
Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
Atomic switch networks-nanoarchitectonic design of a complex system for natural computing.
Demis, E C; Aguilera, R; Sillin, H O; Scharnhorst, K; Sandouk, E J; Aono, M; Stieg, A Z; Gimzewski, J K
2015-05-22
Self-organized complex systems are ubiquitous in nature, and the structural complexity of these natural systems can be used as a model to design new classes of functional nanotechnology based on highly interconnected networks of interacting units. Conventional fabrication methods for electronic computing devices are subject to known scaling limits, confining the diversity of possible architectures. This work explores methods of fabricating a self-organized complex device known as an atomic switch network and discusses its potential utility in computing. Through a merger of top-down and bottom-up techniques guided by mathematical and nanoarchitectonic design principles, we have produced functional devices comprising nanoscale elements whose intrinsic nonlinear dynamics and memorization capabilities produce robust patterns of distributed activity and a capacity for nonlinear transformation of input signals when configured in the appropriate network architecture. Their operational characteristics represent a unique potential for hardware implementation of natural computation, specifically in the area of reservoir computing-a burgeoning field that investigates the computational aptitude of complex biologically inspired systems.
The GOES-R Product Generation Architecture
NASA Astrophysics Data System (ADS)
Dittberner, G. J.; Kalluri, S.; Hansen, D.; Weiner, A.; Tarpley, A.; Marley, S.
2011-12-01
The GOES-R system will substantially improve users' ability to succeed in their work by providing data with significantly enhanced instruments, higher resolution, much shorter relook times, and an increased number and diversity of products. The Product Generation architecture is designed to provide the computer and memory resources necessary to achieve the necessary latency and availability for these products. Over time, new and updated algorithms are expected to be added and old ones removed as science advances and new products are developed. The GOES-R GS architecture is being planned to maintain functionality so that when such changes are implemented, operational product generation will continue without interruption. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Data Fabric (DF). SBA is the middleware that encapsulates and manages science algorithms that generate products. It is divided into three parts, the Executive, which manages and configures the algorithm as a service, the Dispatcher, which provides data to the algorithm, and the Strategy, which determines when the algorithm can execute with the available data. SBA is a distributed architecture, with services connected to each other over a compute grid and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging is necessary. The SBA uses the DF to provide this data communication layer between algorithms. The DF provides an abstract interface over a distributed and persistent multi-layered storage system (e.g., memory based caching above disk-based storage) and an event management system that allows event-driven algorithm services to know when instrument data are available and where they reside. Together, the SBA and the DF provide a flexible, high performance architecture that can meet the needs of product processing now and as they grow in the future.
Particle Based Simulations of Complex Systems with MP2C : Hydrodynamics and Electrostatics
NASA Astrophysics Data System (ADS)
Sutmann, Godehard; Westphal, Lidia; Bolten, Matthias
2010-09-01
Particle based simulation methods are well established paths to explore system behavior on microscopic to mesoscopic time and length scales. With the development of new computer architectures it becomes more and more important to concentrate on local algorithms which do not need global data transfer or reorganisation of large arrays of data across processors. This requirement strongly addresses long-range interactions in particle systems, i.e. mainly hydrodynamic and electrostatic contributions. In this article, emphasis is given to the implementation and parallelization of the Multi-Particle Collision Dynamics method for hydrodynamic contributions and a splitting scheme based on Multigrid for electrostatic contributions. Implementations are done for massively parallel architectures and are demonstrated for the IBM Blue Gene/P architecture Jugene in Jülich.
Digital optical computers at the optoelectronic computing systems center
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The Digital Optical Computing Program within the National Science Foundation Engineering Research Center for Opto-electronic Computing Systems has as its specific goal research on optical computing architectures suitable for use at the highest possible speeds. The program can be targeted toward exploiting the time domain because other programs in the Center are pursuing research on parallel optical systems, exploiting optical interconnection and optical devices and materials. Using a general purpose computing architecture as the focus, we are developing design techniques, tools and architecture for operation at the speed of light limit. Experimental work is being done with the somewhat low speed components currently available but with architectures which will scale up in speed as faster devices are developed. The design algorithms and tools developed for a general purpose, stored program computer are being applied to other systems such as optimally controlled optical communication networks.
Graphics processing unit based computation for NDE applications
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Rajagopal, Prabhu; Balasubramaniam, Krishnan; Krishnamurthy, C. V.
2012-05-01
Advances in parallel processing in recent years are helping to improve the cost of numerical simulation. Breakthroughs in Graphical Processing Unit (GPU) based computation now offer the prospect of further drastic improvements. The introduction of 'compute unified device architecture' (CUDA) by NVIDIA (the global technology company based in Santa Clara, California, USA) has made programming GPUs for general purpose computing accessible to the average programmer. Here we use CUDA to develop parallel finite difference schemes as applicable to two problems of interest to NDE community, namely heat diffusion and elastic wave propagation. The implementations are for two-dimensions. Performance improvement of the GPU implementation against serial CPU implementation is then discussed.
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 Risc for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada- implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Multi-threaded ATLAS simulation on Intel Knights Landing processors
NASA Astrophysics Data System (ADS)
Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration
2017-10-01
The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
The Workstation Approach to Laboratory Computing
Crosby, P.A.; Malachowski, G.C.; Hall, B.R.; Stevens, V.; Gunn, B.J.; Hudson, S.; Schlosser, D.
1985-01-01
There is a need for a Laboratory Workstation which specifically addresses the problems associated with computing in the scientific laboratory. A workstation based on the IBM PC architecture and including a front end data acquisition system which communicates with a host computer via a high speed communications link; a new graphics display controller with hardware window management and window scrolling; and an integrated software package is described.
Compact VLSI neural computer integrated with active pixel sensor for real-time ATR applications
NASA Astrophysics Data System (ADS)
Fang, Wai-Chi; Udomkesmalee, Gabriel; Alkalai, Leon
1997-04-01
A compact VLSI neural computer integrated with an active pixel sensor has been under development to mimic what is inherent in biological vision systems. This electronic eye- brain computer is targeted for real-time machine vision applications which require both high-bandwidth communication and high-performance computing for data sensing, synergy of multiple types of sensory information, feature extraction, target detection, target recognition, and control functions. The neural computer is based on a composite structure which combines Annealing Cellular Neural Network (ACNN) and Hierarchical Self-Organization Neural Network (HSONN). The ACNN architecture is a programmable and scalable multi- dimensional array of annealing neurons which are locally connected with their local neurons. Meanwhile, the HSONN adopts a hierarchical structure with nonlinear basis functions. The ACNN+HSONN neural computer is effectively designed to perform programmable functions for machine vision processing in all levels with its embedded host processor. It provides a two order-of-magnitude increase in computation power over the state-of-the-art microcomputer and DSP microelectronics. A compact current-mode VLSI design feasibility of the ACNN+HSONN neural computer is demonstrated by a 3D 16X8X9-cube neural processor chip design in a 2-micrometers CMOS technology. Integration of this neural computer as one slice of a 4'X4' multichip module into the 3D MCM based avionics architecture for NASA's New Millennium Program is also described.
Crowd Sensing-Enabling Security Service Recommendation for Social Fog Computing Systems
Wu, Jun; Su, Zhou; Li, Jianhua
2017-01-01
Fog computing, shifting intelligence and resources from the remote cloud to edge networks, has the potential of providing low-latency for the communication from sensing data sources to users. For the objects from the Internet of Things (IoT) to the cloud, it is a new trend that the objects establish social-like relationships with each other, which efficiently brings the benefits of developed sociality to a complex environment. As fog service become more sophisticated, it will become more convenient for fog users to share their own services, resources, and data via social networks. Meanwhile, the efficient social organization can enable more flexible, secure, and collaborative networking. Aforementioned advantages make the social network a potential architecture for fog computing systems. In this paper, we design an architecture for social fog computing, in which the services of fog are provisioned based on “friend” relationships. To the best of our knowledge, this is the first attempt at an organized fog computing system-based social model. Meanwhile, social networking enhances the complexity and security risks of fog computing services, creating difficulties of security service recommendations in social fog computing. To address this, we propose a novel crowd sensing-enabling security service provisioning method to recommend security services accurately in social fog computing systems. Simulation results show the feasibilities and efficiency of the crowd sensing-enabling security service recommendation method for social fog computing systems. PMID:28758943
Crowd Sensing-Enabling Security Service Recommendation for Social Fog Computing Systems.
Wu, Jun; Su, Zhou; Wang, Shen; Li, Jianhua
2017-07-30
Fog computing, shifting intelligence and resources from the remote cloud to edge networks, has the potential of providing low-latency for the communication from sensing data sources to users. For the objects from the Internet of Things (IoT) to the cloud, it is a new trend that the objects establish social-like relationships with each other, which efficiently brings the benefits of developed sociality to a complex environment. As fog service become more sophisticated, it will become more convenient for fog users to share their own services, resources, and data via social networks. Meanwhile, the efficient social organization can enable more flexible, secure, and collaborative networking. Aforementioned advantages make the social network a potential architecture for fog computing systems. In this paper, we design an architecture for social fog computing, in which the services of fog are provisioned based on "friend" relationships. To the best of our knowledge, this is the first attempt at an organized fog computing system-based social model. Meanwhile, social networking enhances the complexity and security risks of fog computing services, creating difficulties of security service recommendations in social fog computing. To address this, we propose a novel crowd sensing-enabling security service provisioning method to recommend security services accurately in social fog computing systems. Simulation results show the feasibilities and efficiency of the crowd sensing-enabling security service recommendation method for social fog computing systems.
Basu, Protonu; Williams, Samuel; Van Straalen, Brian; ...
2017-04-05
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found inmore » many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for a multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Basu, Protonu; Williams, Samuel; Van Straalen, Brian
GPUs, with their high bandwidths and computational capabilities are an increasingly popular target for scientific computing. Unfortunately, to date, harnessing the power of the GPU has required use of a GPU-specific programming model like CUDA, OpenCL, or OpenACC. Thus, in order to deliver portability across CPU-based and GPU-accelerated supercomputers, programmers are forced to write and maintain two versions of their applications or frameworks. In this paper, we explore the use of a compiler-based autotuning framework based on CUDA-CHiLL to deliver not only portability, but also performance portability across CPU- and GPU-accelerated platforms for the geometric multigrid linear solvers found inmore » many scientific applications. We also show that with autotuning we can attain near Roofline (a performance bound for a computation and target architecture) performance across the key operations in the miniGMG benchmark for both CPU- and GPU-based architectures as well as for a multiple stencil discretizations and smoothers. We show that our technology is readily interoperable with MPI resulting in performance at scale equal to that obtained via hand-optimized MPI+CUDA implementation.« less
Study of a unified hardware and software fault-tolerant architecture
NASA Technical Reports Server (NTRS)
Lala, Jaynarayan; Alger, Linda; Friend, Steven; Greeley, Gregory; Sacco, Stephen; Adams, Stuart
1989-01-01
A unified architectural concept, called the Fault Tolerant Processor Attached Processor (FTP-AP), that can tolerate hardware as well as software faults is proposed for applications requiring ultrareliable computation capability. An emulation of the FTP-AP architecture, consisting of a breadboard Motorola 68010-based quadruply redundant Fault Tolerant Processor, four VAX 750s as attached processors, and four versions of a transport aircraft yaw damper control law, is used as a testbed in the AIRLAB to examine a number of critical issues. Solutions of several basic problems associated with N-Version software are proposed and implemented on the testbed. This includes a confidence voter to resolve coincident errors in N-Version software. A reliability model of N-Version software that is based upon the recent understanding of software failure mechanisms is also developed. The basic FTP-AP architectural concept appears suitable for hosting N-Version application software while at the same time tolerating hardware failures. Architectural enhancements for greater efficiency, software reliability modeling, and N-Version issues that merit further research are identified.
SCA Waveform Development for Space Telemetry
NASA Technical Reports Server (NTRS)
Mortensen, Dale J.; Kifle, Multi; Hall, C. Steve; Quinn, Todd M.
2004-01-01
The NASA Glenn Research Center is investigating and developing suitable reconfigurable radio architectures for future NASA missions. This effort is examining software-based open-architectures for space based transceivers, as well as common hardware platform architectures. The Joint Tactical Radio System's (JTRS) Software Communications Architecture (SCA) is a candidate for the software approach, but may need modifications or adaptations for use in space. An in-house SCA compliant waveform development focuses on increasing understanding of software defined radio architectures and more specifically the JTRS SCA. Space requirements put a premium on size, mass, and power. This waveform development effort is key to evaluating tradeoffs with the SCA for space applications. Existing NASA telemetry links, as well as Space Exploration Initiative scenarios, are the basis for defining the waveform requirements. Modeling and simulations are being developed to determine signal processing requirements associated with a waveform and a mission-specific computational burden. Implementation of the waveform on a laboratory software defined radio platform is proceeding in an iterative fashion. Parallel top-down and bottom-up design approaches are employed.
Modular, Cost-Effective, Extensible Avionics Architecture for Secure, Mobile Communications
NASA Technical Reports Server (NTRS)
Ivancic, William D.
2006-01-01
Current onboard communication architectures are based upon an all-in-one communications management unit. This unit and associated radio systems has regularly been designed as a one-off, proprietary system. As such, it lacks flexibility and cannot adapt easily to new technology, new communication protocols, and new communication links. This paper describes the current avionics communication architecture and provides a historical perspective of the evolution of this system. A new onboard architecture is proposed that allows full use of commercial-off-the-shelf technologies to be integrated in a modular approach thereby enabling a flexible, cost-effective and fully deployable design that can take advantage of ongoing advances in the computer, cryptography, and telecommunications industries.
Modular, Cost-Effective, Extensible Avionics Architecture for Secure, Mobile Communications
NASA Technical Reports Server (NTRS)
Ivancic, William D.
2007-01-01
Current onboard communication architectures are based upon an all-in-one communications management unit. This unit and associated radio systems has regularly been designed as a one-off, proprietary system. As such, it lacks flexibility and cannot adapt easily to new technology, new communication protocols, and new communication links. This paper describes the current avionics communication architecture and provides a historical perspective of the evolution of this system. A new onboard architecture is proposed that allows full use of commercial-off-the-shelf technologies to be integrated in a modular approach thereby enabling a flexible, cost-effective and fully deployable design that can take advantage of ongoing advances in the computer, cryptography, and telecommunications industries.
An ontology-based telemedicine tasks management system architecture.
Nageba, Ebrahim; Fayn, Jocelyne; Rubel, Paul
2008-01-01
The recent developments in ambient intelligence and ubiquitous computing offer new opportunities for the design of advanced Telemedicine systems providing high quality services, anywhere, anytime. In this paper we present an approach for building an ontology-based task-driven telemedicine system. The architecture is composed of a task management server, a communication server and a knowledge base for enabling decision makings taking account of different telemedical concepts such as actors, resources, services and the Electronic Health Record. The final objective is to provide an intelligent management of the different types of available human, material and communication resources.
Streamlining ICAT development through off-the shelf hypermedia systems
NASA Technical Reports Server (NTRS)
Orey, Michael; Trent, Ann; Young, James; Sanders, Michael
1993-01-01
This project examined the efficacy of building intelligent computer assisted training using an off-the-shelf hypermedia package. In addition, we compared this package to an architecture that had been developed in a previous contract which was based in the C programming language. One person developed a tutor in LinkWay (an off-the-shelf hypermedia system) and another developed the same tutor using the ALM C-based architecture. Development times, ease of use, learner preferences, learner options, and learning effectiveness were compared. In all cases, the off-the-shelf package was shown to be superior to the C-based system.
Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis
1998-02-01
The objective of the Electro - Optic Computing Architecture (EOCA) program was to develop multi-function electro - optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro - optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro - Optic Interface (EOI), an Optical Interconnection Unit
Battlefield Object Control via Internet Architecture
2002-01-01
superiority is the best way to reach the goal of competition superiority. Using information technology (IT) in data processing, including computer hardware... technologies : Global Positioning System (GPS), Geographic Information System (GIS), Battlefield Information Transmission System (BITS), and Intelligent...operational environment. Keywords: C4ISR Systems, Information Superiority, Battlefield Objects, Computer - Aided Prototyping System (CAPS), IP-based
An Architectural Design System Based on Computer Graphics.
ERIC Educational Resources Information Center
MacDonald, Stephen L.; Wehrli, Robert
The recent developments in computer hardware and software are presented to inform architects of this design tool. Technical advancements in equipment include--(1) cathode ray tube displays, (2) light pens, (3) print-out and photo copying attachments, (4) controls for comparison and selection of images, (5) chording keyboards, (6) plotters, and (7)…
A Low Cost Microcomputer Laboratory for Investigating Computer Architecture.
ERIC Educational Resources Information Center
Mitchell, Eugene E., Ed.
1980-01-01
Described is a microcomputer laboratory at the United States Military Academy at West Point, New York, which provides easy access to non-volatile memory and a single input/output file system for 16 microcomputer laboratory positions. A microcomputer network that has a centralized data base is implemented using the concepts of computer network…
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be inplemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
NASA Astrophysics Data System (ADS)
Radziszewski, Kacper
2017-10-01
The following paper presents the results of the research in the field of the machine learning, investigating the scope of application of the artificial neural networks algorithms as a tool in architectural design. The computational experiment was held using the backward propagation of errors method of training the artificial neural network, which was trained based on the geometry of the details of the Roman Corinthian order capital. During the experiment, as an input training data set, five local geometry parameters combined has given the best results: Theta, Pi, Rho in spherical coordinate system based on the capital volume centroid, followed by Z value of the Cartesian coordinate system and a distance from vertical planes created based on the capital symmetry. Additionally during the experiment, artificial neural network hidden layers optimal count and structure was found, giving results of the error below 0.2% for the mentioned before input parameters. Once successfully trained artificial network, was able to mimic the details composition on any other geometry type given. Despite of calculating the transformed geometry locally and separately for each of the thousands of surface points, system could create visually attractive and diverse, complex patterns. Designed tool, based on the supervised learning method of machine learning, gives possibility of generating new architectural forms- free of the designer’s imagination bounds. Implementing the infinitely broad computational methods of machine learning, or Artificial Intelligence in general, not only could accelerate and simplify the design process, but give an opportunity to explore never seen before, unpredictable forms or everyday architectural practice solutions.
Architectural requirements for the Red Storm computing system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Camp, William J.; Tomkins, James Lee
This report is based on the Statement of Work (SOW) describing the various requirements for delivering 3 new supercomputer system to Sandia National Laboratories (Sandia) as part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative (ASCI) program. This system is named Red Storm and will be a distributed memory, massively parallel processor (MPP) machine built primarily out of commodity parts. The requirements presented here distill extensive architectural and design experience accumulated over a decade and a half of research, development and production operation of similar machines at Sandia. Red Storm will have an unusually high bandwidth, low latencymore » interconnect, specially designed hardware and software reliability features, a light weight kernel compute node operating system and the ability to rapidly switch major sections of the machine between classified and unclassified computing environments. Particular attention has been paid to architectural balance in the design of Red Storm, and it is therefore expected to achieve an atypically high fraction of its peak speed of 41 TeraOPS on real scientific computing applications. In addition, Red Storm is designed to be upgradeable to many times this initial peak capability while still retaining appropriate balance in key design dimensions. Installation of the Red Storm computer system at Sandia's New Mexico site is planned for 2004, and it is expected that the system will be operated for a minimum of five years following installation.« less
Li-ion synaptic transistor for low power analog computing
Fuller, Elliot J.; Gabaly, Farid El; Leonard, Francois; ...
2016-11-22
Nonvolatile redox transistors (NVRTs) based upon Li-ion battery materials are demonstrated as memory elements for neuromorphic computer architectures with multi-level analog states, “write” linearity, low-voltage switching, and low power dissipation. Simulations of back propagation using the device properties reach ideal classification accuracy. Finally, physics-based simulations predict energy costs per “write” operation of <10 aJ when scaled to 200 nm × 200 nm.
Hybrid Architectures for Evolutionary Computing Algorithms
2008-01-01
other EC algorithms to FPGA Core Burns P1026/MAPLD 200532 Genetic Algorithm Hardware References S. Scott, A. Samal , and S. Seth, “HGA: A Hardware Based...on Parallel and Distributed Processing (IPPS/SPDP ), pp. 316-320, Proceedings. IEEE Computer Society 1998. [12] Scott, S. D. , Samal , A., and...Algorithm Hardware References S. Scott, A. Samal , and S. Seth, “HGA: A Hardware Based Genetic Algorithm”, Proceedings of the 1995 ACM Third
Software architecture and engineering for patient records: current and future.
Weng, Chunhua; Levine, Betty A; Mun, Seong K
2009-05-01
During the "The National Forum on the Future of the Defense Health Information System," a track focusing on "Systems Architecture and Software Engineering" included eight presenters. These presenters identified three key areas of interest in this field, which include the need for open enterprise architecture and a federated database design, net centrality based on service-oriented architecture, and the need for focus on software usability and reusability. The eight panelists provided recommendations related to the suitability of service-oriented architecture and the enabling technologies of grid computing and Web 2.0 for building health services research centers and federated data warehouses to facilitate large-scale collaborative health care and research. Finally, they discussed the need to leverage industry best practices for software engineering to facilitate rapid software development, testing, and deployment.
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second. PMID:24891848
High-Speed GPU-Based Fully Three-Dimensional Diffuse Optical Tomographic System.
Saikia, Manob Jyoti; Kanhirodan, Rajan; Mohan Vasu, Ram
2014-01-01
We have developed a graphics processor unit (GPU-) based high-speed fully 3D system for diffuse optical tomography (DOT). The reduction in execution time of 3D DOT algorithm, a severely ill-posed problem, is made possible through the use of (1) an algorithmic improvement that uses Broyden approach for updating the Jacobian matrix and thereby updating the parameter matrix and (2) the multinode multithreaded GPU and CUDA (Compute Unified Device Architecture) software architecture. Two different GPU implementations of DOT programs are developed in this study: (1) conventional C language program augmented by GPU CUDA and CULA routines (C GPU), (2) MATLAB program supported by MATLAB parallel computing toolkit for GPU (MATLAB GPU). The computation time of the algorithm on host CPU and the GPU system is presented for C and Matlab implementations. The forward computation uses finite element method (FEM) and the problem domain is discretized into 14610, 30823, and 66514 tetrahedral elements. The reconstruction time, so achieved for one iteration of the DOT reconstruction for 14610 elements, is 0.52 seconds for a C based GPU program for 2-plane measurements. The corresponding MATLAB based GPU program took 0.86 seconds. The maximum number of reconstructed frames so achieved is 2 frames per second.
Memory Reconsolidation and Computational Learning
2010-03-01
Cooper and H.T. Siegelmann, "Memory Reconsolidation for Natural Language Processing," Cognitive Neurodynamics , 3, 2009: 365-372. M.M. Olsen, N...computerized memories and other state of the art cognitive architectures, our memory system has the ability to process on-line and in real-time as...on both continuous and binary inputs, unlike state of the art methods in case based reasoning and in cognitive architectures, which are bound to
A Pipeline Software Architecture for NMR Spectrum Data Translation
Ellis, Heidi J.C.; Weatherby, Gerard; Nowling, Ronald J.; Vyas, Jay; Fenwick, Matthew; Gryk, Michael R.
2012-01-01
The problem of formatting data so that it conforms to the required input for scientific data processing tools pervades scientific computing. The CONNecticut Joint University Research Group (CONNJUR) has developed a data translation tool based on a pipeline architecture that partially solves this problem. The CONNJUR Spectrum Translator supports data format translation for experiments that use Nuclear Magnetic Resonance to determine the structure of large protein molecules. PMID:24634607
VLSI architectures for computing multiplications and inverses in GF(2m)
NASA Technical Reports Server (NTRS)
Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.
1985-01-01
Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
VLSI architectures for computing multiplications and inverses in GF(2-m)
NASA Technical Reports Server (NTRS)
Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.; Reed, I. S.
1983-01-01
Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
VLSI architectures for computing multiplications and inverses in GF(2m).
Wang, C C; Truong, T K; Shao, H M; Deutsch, L J; Omura, J K; Reed, I S
1985-08-01
Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal basis representation used together with this multiplier, a pipeline architecture is developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable, and therefore, naturally suitable for VLSI implementation.
The monitoring and managing application of cloud computing based on Internet of Things.
Luo, Shiliang; Ren, Bin
2016-07-01
Cloud computing and the Internet of Things are the two hot points in the Internet application field. The application of the two new technologies is in hot discussion and research, but quite less on the field of medical monitoring and managing application. Thus, in this paper, we study and analyze the application of cloud computing and the Internet of Things on the medical field. And we manage to make a combination of the two techniques in the medical monitoring and managing field. The model architecture for remote monitoring cloud platform of healthcare information (RMCPHI) was established firstly. Then the RMCPHI architecture was analyzed. Finally an efficient PSOSAA algorithm was proposed for the medical monitoring and managing application of cloud computing. Simulation results showed that our proposed scheme can improve the efficiency about 50%. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Job Scheduling in a Heterogeneous Grid Environment
NASA Technical Reports Server (NTRS)
Shan, Hong-Zhang; Smith, Warren; Oliker, Leonid; Biswas, Rupak
2004-01-01
Computational grids have the potential for solving large-scale scientific problems using heterogeneous and geographically distributed resources. However, a number of major technical hurdles must be overcome before this potential can be realized. One problem that is critical to effective utilization of computational grids is the efficient scheduling of jobs. This work addresses this problem by describing and evaluating a grid scheduling architecture and three job migration algorithms. The architecture is scalable and does not assume control of local site resources. The job migration policies use the availability and performance of computer systems, the network bandwidth available between systems, and the volume of input and output data associated with each job. An extensive performance comparison is presented using real workloads from leading computational centers. The results, based on several key metrics, demonstrate that the performance of our distributed migration algorithms is significantly greater than that of a local scheduling framework and comparable to a non-scalable global scheduling approach.
NASA Technical Reports Server (NTRS)
Walker, Carrie K.
1991-01-01
A technique has been developed for combining features of a systems architecture design and assessment tool and a software development tool. This technique reduces simulation development time and expands simulation detail. The Architecture Design and Assessment System (ADAS), developed at the Research Triangle Institute, is a set of computer-assisted engineering tools for the design and analysis of computer systems. The ADAS system is based on directed graph concepts and supports the synthesis and analysis of software algorithms mapped to candidate hardware implementations. Greater simulation detail is provided by the ADAS functional simulator. With the functional simulator, programs written in either Ada or C can be used to provide a detailed description of graph nodes. A Computer-Aided Software Engineering tool developed at the Charles Stark Draper Laboratory (CSDL CASE) automatically generates Ada or C code from engineering block diagram specifications designed with an interactive graphical interface. A technique to use the tools together has been developed, which further automates the design process.
NASA Technical Reports Server (NTRS)
1989-01-01
The results of the refined conceptual design phase (task 5) of the Simulation Computer System (SCS) study are reported. The SCS is the computational portion of the Payload Training Complex (PTC) providing simulation based training on payload operations of the Space Station Freedom (SSF). In task 4 of the SCS study, the range of architectures suitable for the SCS was explored. Identified system architectures, along with their relative advantages and disadvantages for SCS, were presented in the Conceptual Design Report. Six integrated designs-combining the most promising features from the architectural formulations-were additionally identified in the report. The six integrated designs were evaluated further to distinguish the more viable designs to be refined as conceptual designs. The three designs that were selected represent distinct approaches to achieving a capable and cost effective SCS configuration for the PTC. Here, the results of task 4 (input to this task) are briefly reviewed. Then, prior to describing individual conceptual designs, the PTC facility configuration and the SSF systems architecture that must be supported by the SCS are reviewed. Next, basic features of SCS implementation that have been incorporated into all selected SCS designs are considered. The details of the individual SCS designs are then presented before making a final comparison of the three designs.
The AI Bus architecture for distributed knowledge-based systems
NASA Technical Reports Server (NTRS)
Schultz, Roger D.; Stobie, Iain
1991-01-01
The AI Bus architecture is layered, distributed object oriented framework developed to support the requirements of advanced technology programs for an order of magnitude improvement in software costs. The consequent need for highly autonomous computer systems, adaptable to new technology advances over a long lifespan, led to the design of an open architecture and toolbox for building large scale, robust, production quality systems. The AI Bus accommodates a mix of knowledge based and conventional components, running on heterogeneous, distributed real world and testbed environment. The concepts and design is described of the AI Bus architecture and its current implementation status as a Unix C++ library or reusable objects. Each high level semiautonomous agent process consists of a number of knowledge sources together with interagent communication mechanisms based on shared blackboards and message passing acquaintances. Standard interfaces and protocols are followed for combining and validating subsystems. Dynamic probes or demons provide an event driven means for providing active objects with shared access to resources, and each other, while not violating their security.
Hardware implementation of hierarchical volume subdivision-based elastic registration.
Dandekar, Omkar; Walimbe, Vivek; Shekhar, Raj
2006-01-01
Real-time, elastic and fully automated 3D image registration is critical to the efficiency and effectiveness of many image-guided diagnostic and treatment procedures relying on multimodality image fusion or serial image comparison. True, real-time performance will make many 3D image registration-based techniques clinically viable. Hierarchical volume subdivision-based image registration techniques are inherently faster than most elastic registration techniques, e.g. free-form deformation (FFD)-based techniques, and are more amenable for achieving real-time performance through hardware acceleration. Our group has previously reported an FPGA-based architecture for accelerating FFD-based image registration. In this article we show how our existing architecture can be adapted to support hierarchical volume subdivision-based image registration. A proof-of-concept implementation of the architecture achieved speedups of 100 for elastic registration against an optimized software implementation on a 3.2 GHz Pentium III Xeon workstation. Due to inherent parallel nature of the hierarchical volume subdivision-based image registration techniques further speedup can be achieved by using several computing modules in parallel.
NASA Astrophysics Data System (ADS)
Yue, S. S.; Wen, Y. N.; Lv, G. N.; Hu, D.
2013-10-01
In recent years, the increasing development of cloud computing technologies laid critical foundation for efficiently solving complicated geographic issues. However, it is still difficult to realize the cooperative operation of massive heterogeneous geographical models. Traditional cloud architecture is apt to provide centralized solution to end users, while all the required resources are often offered by large enterprises or special agencies. Thus, it's a closed framework from the perspective of resource utilization. Solving comprehensive geographic issues requires integrating multifarious heterogeneous geographical models and data. In this case, an open computing platform is in need, with which the model owners can package and deploy their models into cloud conveniently, while model users can search, access and utilize those models with cloud facility. Based on this concept, the open cloud service strategies for the sharing of heterogeneous geographic analysis models is studied in this article. The key technology: unified cloud interface strategy, sharing platform based on cloud service, and computing platform based on cloud service are discussed in detail, and related experiments are conducted for further verification.
Analysis of a Smartphone-Based Architecture with Multiple Mobility Sensors for Fall Detection
Santoyo-Ramón, Jose Antonio; Cano-García, Jose Manuel
2016-01-01
During the last years, many research efforts have been devoted to the definition of Fall Detection Systems (FDSs) that benefit from the inherent computing, communication and sensing capabilities of smartphones. However, employing a smartphone as the unique sensor in a FDS application entails several disadvantages as long as an accurate characterization of the patient’s mobility may force to transport this personal device on an unnatural position. This paper presents a smartphone-based architecture for the automatic detection of falls. The system incorporates a set of small sensing motes that can communicate with the smartphone to help in the fall detection decision. The deployed architecture is systematically evaluated in a testbed with experimental users in order to determine the number and positions of the sensors that optimize the effectiveness of the FDS, as well as to assess the most convenient role of the smartphone in the architecture. PMID:27930736
Analysis of a Smartphone-Based Architecture with Multiple Mobility Sensors for Fall Detection.
Casilari, Eduardo; Santoyo-Ramón, Jose Antonio; Cano-García, Jose Manuel
2016-01-01
During the last years, many research efforts have been devoted to the definition of Fall Detection Systems (FDSs) that benefit from the inherent computing, communication and sensing capabilities of smartphones. However, employing a smartphone as the unique sensor in a FDS application entails several disadvantages as long as an accurate characterization of the patient's mobility may force to transport this personal device on an unnatural position. This paper presents a smartphone-based architecture for the automatic detection of falls. The system incorporates a set of small sensing motes that can communicate with the smartphone to help in the fall detection decision. The deployed architecture is systematically evaluated in a testbed with experimental users in order to determine the number and positions of the sensors that optimize the effectiveness of the FDS, as well as to assess the most convenient role of the smartphone in the architecture.
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Comparison of Classifier Architectures for Online Neural Spike Sorting.
Saeed, Maryam; Khan, Amir Ali; Kamboh, Awais Mehmood
2017-04-01
High-density, intracranial recordings from micro-electrode arrays need to undergo Spike Sorting in order to associate the recorded neuronal spikes to particular neurons. This involves spike detection, feature extraction, and classification. To reduce the data transmission and power requirements, on-chip real-time processing is becoming very popular. However, high computational resources are required for classifiers in on-chip spike-sorters, making scalability a great challenge. In this review paper, we analyze several popular classifiers to propose five new hardware architectures using the off-chip training with on-chip classification approach. These include support vector classification, fuzzy C-means classification, self-organizing maps classification, moving-centroid K-means classification, and Cosine distance classification. The performance of these architectures is analyzed in terms of accuracy and resource requirement. We establish that the neural networks based Self-Organizing Maps classifier offers the most viable solution. A spike sorter based on the Self-Organizing Maps classifier, requires only 7.83% of computational resources of the best-reported spike sorter, hierarchical adaptive means, while offering a 3% better accuracy at 7 dB SNR.
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
NASA Astrophysics Data System (ADS)
Saponara, M.; Tramutola, A.; Creten, P.; Hardy, J.; Philippe, C.
2013-08-01
Optimization-based control techniques such as Model Predictive Control (MPC) are considered extremely attractive for space rendezvous, proximity operations and capture applications that require high level of autonomy, optimal path planning and dynamic safety margins. Such control techniques require high-performance computational needs for solving large optimization problems. The development and implementation in a flight representative avionic architecture of a MPC based Guidance, Navigation and Control system has been investigated in the ESA R&T study “On-line Reconfiguration Control System and Avionics Architecture” (ORCSAT) of the Aurora programme. The paper presents the baseline HW and SW avionic architectures, and verification test results obtained with a customised RASTA spacecraft avionics development platform from Aeroflex Gaisler.
Augmented reality & gesture-based architecture in games for the elderly.
McCallum, Simon; Boletsis, Costas
2013-01-01
Serious games for health and, more specifically, for elderly people have developed rapidly in recent years. The recent popularization of novel interaction methods of consoles, such as the Nintendo Wii and Microsoft Kinect, has provided an opportunity for the elderly to engage in computer and video games. These interaction methods, however, still present various challenges for elderly users. To address these challenges, we propose an architecture consisted of Augmented Reality (as an output mechanism) combined with gestured-based devices (as an input method). The intention of this work is to provide a theoretical justification for using these technologies and to integrate them into an architecture, acting as a basis for potentially creating suitable interaction techniques for the elderly players.
Efficient computation of PDF-based characteristics from diffusion MR signal.
Assemlal, Haz-Edine; Tschumperlé, David; Brun, Luc
2008-01-01
We present a general method for the computation of PDF-based characteristics of the tissue micro-architecture in MR imaging. The approach relies on the approximation of the MR signal by a series expansion based on Spherical Harmonics and Laguerre-Gaussian functions, followed by a simple projection step that is efficiently done in a finite dimensional space. The resulting algorithm is generic, flexible and is able to compute a large set of useful characteristics of the local tissues structure. We illustrate the effectiveness of this approach by showing results on synthetic and real MR datasets acquired in a clinical time-frame.
NASA Technical Reports Server (NTRS)
1985-01-01
The second task in the Space Station Data System (SSDS) Analysis/Architecture Study is the development of an information base that will support the conduct of trade studies and provide sufficient data to make key design/programmatic decisions. This volume identifies the preferred options in the technology category and characterizes these options with respect to performance attributes, constraints, cost, and risk. The technology category includes advanced materials, processes, and techniques that can be used to enhance the implementation of SSDS design structures. The specific areas discussed are mass storage, including space and round on-line storage and off-line storage; man/machine interface; data processing hardware, including flight computers and advanced/fault tolerant computer architectures; and software, including data compression algorithms, on-board high level languages, and software tools. Also discussed are artificial intelligence applications and hard-wire communications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner formore » scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.« less
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scienti c computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Programming a hillslope water movement model on the MPP
NASA Technical Reports Server (NTRS)
Devaney, J. E.; Irving, A. R.; Camillo, P. J.; Gurney, R. J.
1987-01-01
A physically based numerical model was developed of heat and moisture flow within a hillslope on a parallel architecture computer, as a precursor to a model of a complete catchment. Moisture flow within a catchment includes evaporation, overland flow, flow in unsaturated soil, and flow in saturated soil. Because of the empirical evidence that moisture flow in unsaturated soil is mainly in the vertical direction, flow in the unsaturated zone can be modeled as a series of one dimensional columns. This initial version of the hillslope model includes evaporation and a single column of one dimensional unsaturated zone flow. This case has already been solved on an IBM 3081 computer and is now being applied to the massively parallel processor architecture so as to make the extension to the one dimensional case easier and to check the problems and benefits of using a parallel architecture machine.
System architecture for asynchronous multi-processor robotic control system
NASA Technical Reports Server (NTRS)
Steele, Robert D.; Long, Mark; Backes, Paul
1993-01-01
The architecture for the Modular Telerobot Task Execution System (MOTES) as implemented in the Supervisory Telerobotics (STELER) Laboratory is described. MOTES is the software component of the remote site of a local-remote telerobotic system which is being developed for NASA for space applications, in particular Space Station Freedom applications. The system is being developed to provide control and supervised autonomous control to support both space based operation and ground-remote control with time delay. The local-remote architecture places task planning responsibilities at the local site and task execution responsibilities at the remote site. This separation allows the remote site to be designed to optimize task execution capability within a limited computational environment such as is expected in flight systems. The local site task planning system could be placed on the ground where few computational limitations are expected. MOTES is written in the Ada programming language for a multiprocessor environment.
Parallel mutual information estimation for inferring gene regulatory networks on GPUs
2011-01-01
Background Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram based mutual information estimators lack the precision in quality compared to kernel based methods. The recently introduced B-spline function based mutual information estimation method is competitive to the kernel based methods in terms of quality but at a lower computational complexity. Results We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264
Optimizing Engineering Tools Using Modern Ground Architectures
2017-12-01
Considerations,” International Journal of Computer Science & Engineering Survey , vol. 5, no. 4, 2014. [10] R. Bell. (n.d). A beginner’s guide to big O notation...scientific community. Traditional computing architectures were not capable of processing the data efficiently, or in some cases, could not process the...thesis investigates how these modern computing architectures could be leveraged by industry and academia to improve the performance and capabilities of
Ag2S atomic switch-based `tug of war' for decision making
NASA Astrophysics Data System (ADS)
Lutz, C.; Hasegawa, T.; Chikyow, T.
2016-07-01
For a computing process such as making a decision, a software controlled chip of several transistors is necessary. Inspired by how a single cell amoeba decides its movements, the theoretical `tug of war' computing model was proposed but not yet implemented in an analogue device suitable for integrated circuits. Based on this model, we now developed a new electronic element for decision making processes, which will have no need for prior programming. The devices are based on the growth and shrinkage of Ag filaments in α-Ag2+δS gap-type atomic switches. Here we present the adapted device design and the new materials. We demonstrate the basic `tug of war' operation by IV-measurements and Scanning Electron Microscopy (SEM) observation. These devices could be the base for a CMOS-free new computer architecture.For a computing process such as making a decision, a software controlled chip of several transistors is necessary. Inspired by how a single cell amoeba decides its movements, the theoretical `tug of war' computing model was proposed but not yet implemented in an analogue device suitable for integrated circuits. Based on this model, we now developed a new electronic element for decision making processes, which will have no need for prior programming. The devices are based on the growth and shrinkage of Ag filaments in α-Ag2+δS gap-type atomic switches. Here we present the adapted device design and the new materials. We demonstrate the basic `tug of war' operation by IV-measurements and Scanning Electron Microscopy (SEM) observation. These devices could be the base for a CMOS-free new computer architecture. Electronic supplementary information (ESI) available. See DOI: 10.1039/c6nr00690f
Integrated Distributed Directory Service for KSC
NASA Technical Reports Server (NTRS)
Ghansah, Isaac
1997-01-01
This paper describes an integrated distributed directory services (DDS) architecture as a fundamental component of KSC distributed computing systems. Specifically, an architecture for an integrated directory service based on DNS and X.500/LDAP has been suggested. The architecture supports using DNS in its traditional role as a name service and X.500 for other services. Specific designs were made in the integration of X.500 DDS for Public Key Certificates, Kerberos Security Services, Network-wide Login, Electronic Mail, WWW URLS, Servers, and other diverse network objects. Issues involved in incorporating the emerging Microsoft Active Directory Service MADS in KSC's X.500 were discussed.
A Grid Infrastructure for Supporting Space-based Science Operations
NASA Technical Reports Server (NTRS)
Bradford, Robert N.; Redman, Sandra H.; McNair, Ann R. (Technical Monitor)
2002-01-01
Emerging technologies for computational grid infrastructures have the potential for revolutionizing the way computers are used in all aspects of our lives. Computational grids are currently being implemented to provide a large-scale, dynamic, and secure research and engineering environments based on standards and next-generation reusable software, enabling greater science and engineering productivity through shared resources and distributed computing for less cost than traditional architectures. Combined with the emerging technologies of high-performance networks, grids provide researchers, scientists and engineers the first real opportunity for an effective distributed collaborative environment with access to resources such as computational and storage systems, instruments, and software tools and services for the most computationally challenging applications.
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
A WPS Based Architecture for Climate Data Analytic Services (CDAS) at NASA
NASA Astrophysics Data System (ADS)
Maxwell, T. P.; McInerney, M.; Duffy, D.; Carriere, L.; Potter, G. L.; Doutriaux, C.
2015-12-01
Faced with unprecedented growth in the Big Data domain of climate science, NASA has developed the Climate Data Analytic Services (CDAS) framework. This framework enables scientists to execute trusted and tested analysis operations in a high performance environment close to the massive data stores at NASA. The data is accessed in standard (NetCDF, HDF, etc.) formats in a POSIX file system and processed using trusted climate data analysis tools (ESMF, CDAT, NCO, etc.). The framework is structured as a set of interacting modules allowing maximal flexibility in deployment choices. The current set of module managers include: Staging Manager: Runs the computation locally on the WPS server or remotely using tools such as celery or SLURM. Compute Engine Manager: Runs the computation serially or distributed over nodes using a parallelization framework such as celery or spark. Decomposition Manger: Manages strategies for distributing the data over nodes. Data Manager: Handles the import of domain data from long term storage and manages the in-memory and disk-based caching architectures. Kernel manager: A kernel is an encapsulated computational unit which executes a processor's compute task. Each kernel is implemented in python exploiting existing analysis packages (e.g. CDAT) and is compatible with all CDAS compute engines and decompositions. CDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be executed using either direct web service calls, a python script or application, or a javascript-based web application. Client packages in python or javascript contain everything needed to make CDAS requests. The CDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets, where the data resides, to ultimately produce societal benefits. It is is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service permits decision makers to investigate climate changes around the globe, inspect model trends, compare multiple reanalysis datasets, and variability.
Atomic switch networks—nanoarchitectonic design of a complex system for natural computing
NASA Astrophysics Data System (ADS)
Demis, E. C.; Aguilera, R.; Sillin, H. O.; Scharnhorst, K.; Sandouk, E. J.; Aono, M.; Stieg, A. Z.; Gimzewski, J. K.
2015-05-01
Self-organized complex systems are ubiquitous in nature, and the structural complexity of these natural systems can be used as a model to design new classes of functional nanotechnology based on highly interconnected networks of interacting units. Conventional fabrication methods for electronic computing devices are subject to known scaling limits, confining the diversity of possible architectures. This work explores methods of fabricating a self-organized complex device known as an atomic switch network and discusses its potential utility in computing. Through a merger of top-down and bottom-up techniques guided by mathematical and nanoarchitectonic design principles, we have produced functional devices comprising nanoscale elements whose intrinsic nonlinear dynamics and memorization capabilities produce robust patterns of distributed activity and a capacity for nonlinear transformation of input signals when configured in the appropriate network architecture. Their operational characteristics represent a unique potential for hardware implementation of natural computation, specifically in the area of reservoir computing—a burgeoning field that investigates the computational aptitude of complex biologically inspired systems.
NASA Technical Reports Server (NTRS)
Montag, Bruce C.; Bishop, Alfred M.; Redfield, Joe B.
1989-01-01
The findings of a preliminary investigation by Southwest Research Institute (SwRI) in simulation host computer concepts is presented. It is designed to aid NASA in evaluating simulation technologies for use in spaceflight training. The focus of the investigation is on the next generation of space simulation systems that will be utilized in training personnel for Space Station Freedom operations. SwRI concludes that NASA should pursue a distributed simulation host computer system architecture for the Space Station Training Facility (SSTF) rather than a centralized mainframe based arrangement. A distributed system offers many advantages and is seen by SwRI as the only architecture that will allow NASA to achieve established functional goals and operational objectives over the life of the Space Station Freedom program. Several distributed, parallel computing systems are available today that offer real-time capabilities for time critical, man-in-the-loop simulation. These systems are flexible in terms of connectivity and configurability, and are easily scaled to meet increasing demands for more computing power.
Compensated Crystal Assemblies for Type-II Entangled Photon Generation in Quantum Cluster States
2010-03-01
in quantum computational architectures that operate by principles entirely distinct from any based on classical physics. In contrast with other...of the SPDC spectral function, to enable applications in regions that have not been accessible with other methods. Quantum Information and Computation ...Eliminating frequency and space-time correlations in multi-photon states, PRA 64, 063815, 2001 [2]A. Zeilinger et.al. Experimental One-way computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muller, U.A.; Baumle, B.; Kohler, P.
1992-10-01
Music, a DSP-based system with a parallel distributed-memory architecture, provides enormous computing power yet retains the flexibility of a general-purpose computer. Reaching a peak performance of 2.7 Gflops at a significantly lower cost, power consumption, and space requirement than conventional supercomputers, Music is well suited to computationally intensive applications such as neural network simulation. 12 refs., 9 figs., 2 tabs.
A parallel computing engine for a class of time critical processes.
Nabhan, T M; Zomaya, A Y
1997-01-01
This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
State-of-the-art in Heterogeneous Computing
Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...
2010-01-01
Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, availablemore » software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.« less
Advanced computer architecture for large-scale real-time applications.
DOT National Transportation Integrated Search
1973-04-01
Air traffic control automation is identified as a crucial problem which provides a complex, real-time computer application environment. A novel computer architecture in the form of a pipeline associative processor is conceived to achieve greater perf...
Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.
ERIC Educational Resources Information Center
Beltrametti, Monica; English, Will
1994-01-01
Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…
NASA Astrophysics Data System (ADS)
Iacobucci, Joseph V.
The research objective for this manuscript is to develop a Rapid Architecture Alternative Modeling (RAAM) methodology to enable traceable Pre-Milestone A decision making during the conceptual phase of design of a system of systems. Rather than following current trends that place an emphasis on adding more analysis which tends to increase the complexity of the decision making problem, RAAM improves on current methods by reducing both runtime and model creation complexity. RAAM draws upon principles from computer science, system architecting, and domain specific languages to enable the automatic generation and evaluation of architecture alternatives. For example, both mission dependent and mission independent metrics are considered. Mission dependent metrics are determined by the performance of systems accomplishing a task, such as Probability of Success. In contrast, mission independent metrics, such as acquisition cost, are solely determined and influenced by the other systems in the portfolio. RAAM also leverages advances in parallel computing to significantly reduce runtime by defining executable models that are readily amendable to parallelization. This allows the use of cloud computing infrastructures such as Amazon's Elastic Compute Cloud and the PASTEC cluster operated by the Georgia Institute of Technology Research Institute (GTRI). Also, the amount of data that can be generated when fully exploring the design space can quickly exceed the typical capacity of computational resources at the analyst's disposal. To counter this, specific algorithms and techniques are employed. Streaming algorithms and recursive architecture alternative evaluation algorithms are used that reduce computer memory requirements. Lastly, a domain specific language is created to provide a reduction in the computational time of executing the system of systems models. A domain specific language is a small, usually declarative language that offers expressive power focused on a particular problem domain by establishing an effective means to communicate the semantics from the RAAM framework. These techniques make it possible to include diverse multi-metric models within the RAAM framework in addition to system and operational level trades. A canonical example was used to explore the uses of the methodology. The canonical example contains all of the features of a full system of systems architecture analysis study but uses fewer tasks and systems. Using RAAM with the canonical example it was possible to consider both system and operational level trades in the same analysis. Once the methodology had been tested with the canonical example, a Suppression of Enemy Air Defenses (SEAD) capability model was developed. Due to the sensitive nature of analyses on that subject, notional data was developed. The notional data has similar trends and properties to realistic Suppression of Enemy Air Defenses data. RAAM was shown to be traceable and provided a mechanism for a unified treatment of a variety of metrics. The SEAD capability model demonstrated lower computer runtimes and reduced model creation complexity as compared to methods currently in use. To determine the usefulness of the implementation of the methodology on current computing hardware, RAAM was tested with system of system architecture studies of different sizes. This was necessary since system of systems may be called upon to accomplish thousands of tasks. It has been clearly demonstrated that RAAM is able to enumerate and evaluate the types of large, complex design spaces usually encountered in capability based design, oftentimes providing the ability to efficiently search the entire decision space. The core algorithms for generation and evaluation of alternatives scale linearly with expected problem sizes. The SEAD capability model outputs prompted the discovery a new issue, the data storage and manipulation requirements for an analysis. Two strategies were developed to counter large data sizes, the use of portfolio views and top 'n' analysis. This proved the usefulness of the RAAM framework and methodology during Pre-Milestone A capability based analysis. (Abstract shortened by UMI.).
ERIC Educational Resources Information Center
Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.
2014-01-01
The great shift in sustainability and computer aided design in the field of architecture caused a remarkable change in the architecture philosophy, new aspects of beauty and aesthetic values are being introduced, and traditional definitions for beauty cannot fully cover this aspects, which causes a gap between; new architecture works criticism and…
Big data mining analysis method based on cloud computing
NASA Astrophysics Data System (ADS)
Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao
2017-08-01
Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.
Correlation-based pattern recognition for implantable defibrillators.
Wilkins, J.
1996-01-01
An estimated 300,000 Americans die each year from cardiac arrhythmias. Historically, drug therapy or surgery were the only treatment options available for patients suffering from arrhythmias. Recently, implantable arrhythmia management devices have been developed. These devices allow abnormal cardiac rhythms to be sensed and corrected in vivo. Proper arrhythmia classification is critical to selecting the appropriate therapeutic intervention. The classification problem is made more challenging by the power/computation constraints imposed by the short battery life of implantable devices. Current devices utilize heart rate-based classification algorithms. Although easy to implement, rate-based approaches have unacceptably high error rates in distinguishing supraventricular tachycardia (SVT) from ventricular tachycardia (VT). Conventional morphology assessment techniques used in ECG analysis often require too much computation to be practical for implantable devices. In this paper, a computationally-efficient, arrhythmia classification architecture using correlation-based morphology assessment is presented. The architecture classifies individuals heart beats by assessing similarity between an incoming cardiac signal vector and a series of prestored class templates. A series of these beat classifications are used to make an overall rhythm assessment. The system makes use of several new results in the field of pattern recognition. The resulting system achieved excellent accuracy in discriminating SVT and VT. PMID:8947674
2015-06-01
system accuracy. The AnRAD system was also generalized for the additional application of network intrusion detection . A self-structuring technique...to Host- based Intrusion Detection Systems using Contiguous and Discontiguous System Call Patterns,” IEEE Transactions on Computer, 63(4), pp. 807...square kilometer areas. The anomaly recognition and detection (AnRAD) system was built as a cogent confabulation network . It represented road
A Comparison of Computational Cognitive Models: Agent-Based Systems Versus Rule-Based Architectures
2003-03-01
Java™ How To Program , Prentice Hall, 1999. Friedman-Hill, E., Jess, The Expert System Shell for the Java Platform, Sandia National Laboratories, 2001...transition from the descriptive NDM theory to a computational model raises several questions: Who is an experienced decision maker? How do you model the...progression from being a novice to an experienced decision maker? How does the model account for previous experiences? Are there situations where
Design Tools for Accelerating Development and Usage of Multi-Core Computing Platforms
2014-04-01
Government formulated or supplied the drawings, specifications, or other data does not license the holder or any other person or corporation ; or convey...multicore PDSP platforms. The GPU- based capabilities of TDIF are currently oriented towards NVIDIA GPUs, based on the Compute Unified Device Architecture...CUDA) programming language [ NVIDIA 2007], which can be viewed as an extension of C. The multicore PDSP capabilities currently in TDIF are oriented
Execution environment for intelligent real-time control systems
NASA Technical Reports Server (NTRS)
Sztipanovits, Janos
1987-01-01
Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computations, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems is described. The layered architecture includes specific computational models, integrated execution environment and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.
Design and Implementation of a Tool for Teaching Programming.
ERIC Educational Resources Information Center
Goktepe, Mesut; And Others
1989-01-01
Discussion of the use of computers in education focuses on a graphics-based system for teaching the Pascal programing language for problem solving. Topics discussed include user interface; notification based systems; communication processes; object oriented programing; workstations; graphics architecture; and flowcharts. (18 references) (LRW)
Cross stratum resources protection in fog-computing-based radio over fiber networks for 5G services
NASA Astrophysics Data System (ADS)
Guo, Shaoyong; Shao, Sujie; Wang, Yao; Yang, Hui
2017-09-01
In order to meet the requirement of internet of things (IoT) and 5G, the cloud radio access network is a paradigm which converges all base stations computational resources into a cloud baseband unit (BBU) pool, while the distributed radio frequency signals are collected by remote radio head (RRH). A precondition for centralized processing in the BBU pool is an interconnection fronthaul network with high capacity and low delay. However, it has become more complex and frequent in the interaction between RRH and BBU and resource scheduling among BBUs in cloud. Cloud radio over fiber network has been proposed in our previous work already. In order to overcome the complexity and latency, in this paper, we first present a novel cross stratum resources protection (CSRP) architecture in fog-computing-based radio over fiber networks (F-RoFN) for 5G services. Additionally, a cross stratum protection (CSP) scheme considering the network survivability is introduced in the proposed architecture. The CSRP with CSP scheme can effectively pull the remote processing resource locally to implement the cooperative radio resource management, enhance the responsiveness and resilience to the dynamic end-to-end 5G service demands, and globally optimize optical network, wireless and fog resources. The feasibility and efficiency of the proposed architecture with CSP scheme are verified on our software defined networking testbed in terms of service latency, transmission success rate, resource occupation rate and blocking probability.
Trust Management in Swarm-Based Autonomic Computing Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maiden, Wendy M.; Haack, Jereme N.; Fink, Glenn A.
2009-07-07
Reputation-based trust management techniques can address issues such as insider threat as well as quality of service issues that may be malicious in nature. However, trust management techniques must be adapted to the unique needs of the architectures and problem domains to which they are applied. Certain characteristics of swarms such as their lightweight ephemeral nature and indirect communication make this adaptation especially challenging. In this paper we look at the trust issues and opportunities in mobile agent swarm-based autonomic systems and find that by monitoring the trustworthiness of the autonomic managers rather than the swarming sensors, the trust managementmore » problem becomes much more scalable and still serves to protect the swarms. We also analyze the applicability of trust management research as it has been applied to architectures with similar characteristics. Finally, we specify required characteristics for trust management mechanisms to be used to monitor the trustworthiness of the entities in a swarm-based autonomic computing system.« less
NASA Astrophysics Data System (ADS)
Jeffery, Keith; Harrison, Matt; Bailo, Daniele
2016-04-01
The EPOS-PP Project 2010-2014 proposed an architecture and demonstrated feasibility with a prototype. Requirements based on use cases were collected and an inventory of assets (e.g. datasets, software, users, computing resources, equipment/detectors, laboratory services) (RIDE) was developed. The architecture evolved through three stages of refinement with much consultation both with the EPOS community representing EPOS users and participants in geoscience and with the overall ICT community especially those working on research such as the RDA (Research Data Alliance) community. The architecture consists of a central ICS (Integrated Core Services) consisting of a portal and catalog, the latter providing to end-users a 'map' of all EPOS resources (datasets, software, users, computing, equipment/detectors etc.). ICS is extended to ICS-d (distributed ICS) for certain services (such as visualisation software services or Cloud computing resources) and CES (Computational Earth Science) for specific simulation or analytical processing. ICS also communicates with TCS (Thematic Core Services) which represent European-wide portals to national and local assets, resources and services in the various specific domains (e.g. seismology, volcanology, geodesy) of EPOS. The EPOS-IP project 2015-2019 started October 2015. Two work-packages cover the ICT aspects; WP6 involves interaction with the TCS while WP7 concentrates on ICS including interoperation with ICS-d and CES offerings: in short the ICT architecture. Based on the experience and results of EPOS-PP the ICT team held a pre-meeting in July 2015 and set out a project plan. The first major activity involved requirements (re-)collection with use cases and also updating the inventory of assets held by the various TCS in EPOS. The RIDE database of assets is currently being converted to CERIF (Common European Research Information Format - an EU Recommendation to Member States) to provide the basis for the EPOS-IP ICS Catalog. In parallel the ICT team is tracking developments in ICT for relevance to EPOS-IP. In particular, the potential utilisation of e-Is (e-Infrastructures) such as GEANT(network), AARC (security), EGI (GRID computing), EUDAT (data curation), PRACE (High Performance Computing), HELIX-Nebula / Open Science Cloud (Cloud computing) are being assessed. Similarly relationships to other e-RIs (e-Research Infrastructures) such as ENVRI+, EXCELERATE and other ESFRI (European Strategic Forum for Research Infrastructures) projects are developed to share experience and technology and to promote interoperability. EPOS ICT team members are also involved in VRE4EIC, a project developing a reference architecture and component software services for a Virtual Research Environment to be superimposed on EPOS-ICS. The challenge which is being tackled now is therefore to keep consistency and interoperability among the different modules, initiatives and actors which participate to the process of running the EPOS platform. It implies both a continuous update about IT aspects of mentioned initiatives and a refinement of the e-architecture designed so far. One major aspect of EPOS-IP is the ICT support for legalistic, financial and governance aspects of the EPOS ERIC to be initiated during EPOS-IP. This implies a sophisticated AAAI (Authentication, authorization, accounting infrastructure) with consistency throughout the software, communications and data stack.
Category-theoretic models of algebraic computer systems
NASA Astrophysics Data System (ADS)
Kovalyov, S. P.
2016-01-01
A computer system is said to be algebraic if it contains nodes that implement unconventional computation paradigms based on universal algebra. A category-based approach to modeling such systems that provides a theoretical basis for mapping tasks to these systems' architecture is proposed. The construction of algebraic models of general-purpose computations involving conditional statements and overflow control is formally described by a reflector in an appropriate category of algebras. It is proved that this reflector takes the modulo ring whose operations are implemented in the conventional arithmetic processors to the Łukasiewicz logic matrix. Enrichments of the set of ring operations that form bases in the Łukasiewicz logic matrix are found.
Aguilar, Mario; Peot, Mark A; Zhou, Jiangying; Simons, Stephen; Liao, Yuwei; Metwalli, Nader; Anderson, Mark B
2012-03-01
The mammalian visual system is still the gold standard for recognition accuracy, flexibility, efficiency, and speed. Ongoing advances in our understanding of function and mechanisms in the visual system can now be leveraged to pursue the design of computer vision architectures that will revolutionize the state of the art in computer vision.
Persistent Memory in Single Node Delay-Coupled Reservoir Computing.
Kovac, André David; Koall, Maximilian; Pipa, Gordon; Toutounji, Hazem
2016-01-01
Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/pray population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality.
Persistent Memory in Single Node Delay-Coupled Reservoir Computing
Pipa, Gordon; Toutounji, Hazem
2016-01-01
Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/pray population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality. PMID:27783690
High-throughput sample adaptive offset hardware architecture for high-efficiency video coding
NASA Astrophysics Data System (ADS)
Zhou, Wei; Yan, Chang; Zhang, Jingzhi; Zhou, Xin
2018-03-01
A high-throughput hardware architecture for a sample adaptive offset (SAO) filter in the high-efficiency video coding video coding standard is presented. First, an implementation-friendly and simplified bitrate estimation method of rate-distortion cost calculation is proposed to reduce the computational complexity in the mode decision of SAO. Then, a high-throughput VLSI architecture for SAO is presented based on the proposed bitrate estimation method. Furthermore, multiparallel VLSI architecture for in-loop filters, which integrates both deblocking filter and SAO filter, is proposed. Six parallel strategies are applied in the proposed in-loop filters architecture to improve the system throughput and filtering speed. Experimental results show that the proposed in-loop filters architecture can achieve up to 48% higher throughput in comparison with prior work. The proposed architecture can reach a high-operating clock frequency of 297 MHz with TSMC 65-nm library and meet the real-time requirement of the in-loop filters for 8 K × 4 K video format at 132 fps.
ERIC Educational Resources Information Center
Betts, Janelle Lyon
2001-01-01
Describes a high school art assignment in which students utilize Appleworks or Claris Works to design their own house, after learning about architectural styles and how to use the computer program. States that the project develops student computer skills and increases student knowledge about architecture. (CMK)
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
Strategies for concurrent processing of complex algorithms in data driven architectures
NASA Technical Reports Server (NTRS)
Stoughton, John W.; Mielke, Roland R.
1988-01-01
Research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a special distributed computer environment is presented. This model is identified by the acronym ATAMM which represents Algorithms To Architecture Mapping Model. The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
Oh, Sungyoung; Cha, Jieun; Ji, Myungkyu; Kang, Hyekyung; Kim, Seok; Heo, Eunyoung; Han, Jong Soo; Kang, Hyunggoo; Chae, Hoseok; Hwang, Hee
2015-01-01
Objectives To design a cloud computing-based Healthcare Software-as-a-Service (SaaS) Platform (HSP) for delivering healthcare information services with low cost, high clinical value, and high usability. Methods We analyzed the architecture requirements of an HSP, including the interface, business services, cloud SaaS, quality attributes, privacy and security, and multi-lingual capacity. For cloud-based SaaS services, we focused on Clinical Decision Service (CDS) content services, basic functional services, and mobile services. Microsoft's Azure cloud computing for Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) was used. Results The functional and software views of an HSP were designed in a layered architecture. External systems can be interfaced with the HSP using SOAP and REST/JSON. The multi-tenancy model of the HSP was designed as a shared database, with a separate schema for each tenant through a single application, although healthcare data can be physically located on a cloud or in a hospital, depending on regulations. The CDS services were categorized into rule-based services for medications, alert registration services, and knowledge services. Conclusions We expect that cloud-based HSPs will allow small and mid-sized hospitals, in addition to large-sized hospitals, to adopt information infrastructures and health information technology with low system operation and maintenance costs. PMID:25995962
Geospace simulations using modern accelerator processor technology
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.
2009-12-01
OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
Hardware implementation of CMAC neural network with reduced storage requirement.
Ker, J S; Kuo, Y H; Wen, R C; Liu, B D
1997-01-01
The cerebellar model articulation controller (CMAC) neural network has the advantages of fast convergence speed and low computation complexity. However, it suffers from a low storage space utilization rate on weight memory. In this paper, we propose a direct weight address mapping approach, which can reduce the required weight memory size with a utilization rate near 100%. Based on such an address mapping approach, we developed a pipeline architecture to efficiently perform the addressing operations. The proposed direct weight address mapping approach also speeds up the computation for the generation of weight addresses. Besides, a CMAC hardware prototype used for color calibration has been implemented to confirm the proposed approach and architecture.
Evaluation of Visual Computer Simulator for Computer Architecture Education
ERIC Educational Resources Information Center
Imai, Yoshiro; Imai, Masatoshi; Moritoh, Yoshio
2013-01-01
This paper presents trial evaluation of a visual computer simulator in 2009-2011, which has been developed to play some roles of both instruction facility and learning tool simultaneously. And it illustrates an example of Computer Architecture education for University students and usage of e-Learning tool for Assembly Programming in order to…
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
A heterogeneous hierarchical architecture for real-time computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Skroch, D.A.; Fornaro, R.J.
The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
All-optical reservoir computing.
Duport, François; Schneider, Bendix; Smerieri, Anteo; Haelterman, Marc; Massar, Serge
2012-09-24
Reservoir Computing is a novel computing paradigm that uses a nonlinear recurrent dynamical system to carry out information processing. Recent electronic and optoelectronic Reservoir Computers based on an architecture with a single nonlinear node and a delay loop have shown performance on standardized tasks comparable to state-of-the-art digital implementations. Here we report an all-optical implementation of a Reservoir Computer, made of off-the-shelf components for optical telecommunications. It uses the saturation of a semiconductor optical amplifier as nonlinearity. The present work shows that, within the Reservoir Computing paradigm, all-optical computing with state-of-the-art performance is possible.
NASA Astrophysics Data System (ADS)
Tolba, Khaled Ibrahim; Morgenthal, Guido
2018-01-01
This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied for the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method being applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to efficiently utilise the method within the computational resources available for a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming type computer.
NASA Astrophysics Data System (ADS)
Yager, Kevin; Albert, Thomas; Brower, Bernard V.; Pellechia, Matthew F.
2015-06-01
The domain of Geospatial Intelligence Analysis is rapidly shifting toward a new paradigm of Activity Based Intelligence (ABI) and information-based Tipping and Cueing. General requirements for an advanced ABIAA system present significant challenges in architectural design, computing resources, data volumes, workflow efficiency, data mining and analysis algorithms, and database structures. These sophisticated ABI software systems must include advanced algorithms that automatically flag activities of interest in less time and within larger data volumes than can be processed by human analysts. In doing this, they must also maintain the geospatial accuracy necessary for cross-correlation of multi-intelligence data sources. Historically, serial architectural workflows have been employed in ABIAA system design for tasking, collection, processing, exploitation, and dissemination. These simpler architectures may produce implementations that solve short term requirements; however, they have serious limitations that preclude them from being used effectively in an automated ABIAA system with multiple data sources. This paper discusses modern ABIAA architectural considerations providing an overview of an advanced ABIAA system and comparisons to legacy systems. It concludes with a recommended strategy and incremental approach to the research, development, and construction of a fully automated ABIAA system.
Metal oxide resistive random access memory based synaptic devices for brain-inspired computing
NASA Astrophysics Data System (ADS)
Gao, Bin; Kang, Jinfeng; Zhou, Zheng; Chen, Zhe; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan
2016-04-01
The traditional Boolean computing paradigm based on the von Neumann architecture is facing great challenges for future information technology applications such as big data, the Internet of Things (IoT), and wearable devices, due to the limited processing capability issues such as binary data storage and computing, non-parallel data processing, and the buses requirement between memory units and logic units. The brain-inspired neuromorphic computing paradigm is believed to be one of the promising solutions for realizing more complex functions with a lower cost. To perform such brain-inspired computing with a low cost and low power consumption, novel devices for use as electronic synapses are needed. Metal oxide resistive random access memory (ReRAM) devices have emerged as the leading candidate for electronic synapses. This paper comprehensively addresses the recent work on the design and optimization of metal oxide ReRAM-based synaptic devices. A performance enhancement methodology and optimized operation scheme to achieve analog resistive switching and low-energy training behavior are provided. A three-dimensional vertical synapse network architecture is proposed for high-density integration and low-cost fabrication. The impacts of the ReRAM synaptic device features on the performances of neuromorphic systems are also discussed on the basis of a constructed neuromorphic visual system with a pattern recognition function. Possible solutions to achieve the high recognition accuracy and efficiency of neuromorphic systems are presented.
Cyberpsychology: a human-interaction perspective based on cognitive modeling.
Emond, Bruno; West, Robert L
2003-10-01
This paper argues for the relevance of cognitive modeling and cognitive architectures to cyberpsychology. From a human-computer interaction point of view, cognitive modeling can have benefits both for theory and model building, and for the design and evaluation of sociotechnical systems usability. Cognitive modeling research applied to human-computer interaction has two complimentary objectives: (1) to develop theories and computational models of human interactive behavior with information and collaborative technologies, and (2) to use the computational models as building blocks for the design, implementation, and evaluation of interactive technologies. From the perspective of building theories and models, cognitive modeling offers the possibility to anchor cyberpsychology theories and models into cognitive architectures. From the perspective of the design and evaluation of socio-technical systems, cognitive models can provide the basis for simulated users, which can play an important role in usability testing. As an example of application of cognitive modeling to technology design, the paper presents a simulation of interactive behavior with five different adaptive menu algorithms: random, fixed, stacked, frequency based, and activation based. Results of the simulation indicate that fixed menu positions seem to offer the best support for classification like tasks such as filing e-mails. This research is part of the Human-Computer Interaction, and the Broadband Visual Communication research programs at the National Research Council of Canada, in collaboration with the Carleton Cognitive Modeling Lab at Carleton University.
The GOES-R Product Generation Architecture - Post CDR Update
NASA Astrophysics Data System (ADS)
Dittberner, G.; Kalluri, S.; Weiner, A.
2012-12-01
The GOES-R system will substantially improve the accuracy of information available to users by providing data from significantly enhanced instruments, which will generate an increased number and diversity of products with higher resolution, and much shorter relook times. Considerably greater compute and memory resources are necessary to achieve the necessary latency and availability for these products. Over time, new and updated algorithms are expected to be added and old ones removed as science advances and new products are developed. The GOES-R GS architecture is being planned to maintain functionality so that when such changes are implemented, operational product generation will continue without interruption. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Data Fabric (DF). SBA is the middleware that encapsulates and manages science algorithms that generate products. It is divided into three parts, the Executive, which manages and configures the algorithm as a service, the Dispatcher, which provides data to the algorithm, and the Strategy, which determines when the algorithm can execute with the available data. SBA is a distributed architecture, with services connected to each other over a compute grid and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so a scalable and reliable messaging is necessary. The SBA uses the DF to provide this data communication layer between algorithms. The DF provides an abstract interface over a distributed and persistent multi-layered storage system (e.g., memory based caching above disk-based storage) and an event management system that allows event-driven algorithm services to know when instrument data are available and where they reside. Together, the SBA and the DF provide a flexible, high performance architecture that can meet the needs of product processing now and as they grow in the future.
Problems Related to Parallelization of CFD Algorithms on GPU, Multi-GPU and Hybrid Architectures
NASA Astrophysics Data System (ADS)
Biazewicz, Marek; Kurowski, Krzysztof; Ludwiczak, Bogdan; Napieraia, Krystyna
2010-09-01
Computational Fluid Dynamics (CFD) is one of the branches of fluid mechanics, which uses numerical methods and algorithms to solve and analyze fluid flows. CFD is used in various domains, such as oil and gas reservoir uncertainty analysis, aerodynamic body shapes optimization (e.g. planes, cars, ships, sport helmets, skis), natural phenomena analysis, numerical simulation for weather forecasting or realistic visualizations. CFD problem is very complex and needs a lot of computational power to obtain the results in a reasonable time. We have implemented a parallel application for two-dimensional CFD simulation with a free surface approximation (MAC method) using new hardware architectures, in particular multi-GPU and hybrid computing environments. For this purpose we decided to use NVIDIA graphic cards with CUDA environment due to its simplicity of programming and good computations performance. We used finite difference discretization of Navier-Stokes equations, where fluid is propagated over an Eulerian Grid. In this model, the behavior of the fluid inside the cell depends only on the properties of local, surrounding cells, therefore it is well suited for the GPU-based architecture. In this paper we demonstrate how to use efficiently the computing power of GPUs for CFD. Additionally, we present some best practices to help users analyze and improve the performance of CFD applications executed on GPU. Finally, we discuss various challenges around the multi-GPU implementation on the example of matrix multiplication.
NASA Astrophysics Data System (ADS)
Neff, John A.
1989-12-01
Experiments originating from Gestalt psychology have shown that representing information in a symbolic form provides a more effective means to understanding. Computer scientists have been struggling for the last two decades to determine how best to create, manipulate, and store collections of symbolic structures. In the past, much of this struggling led to software innovations because that was the path of least resistance. For example, the development of heuristics for organizing the searching through knowledge bases was much less expensive than building massively parallel machines that could search in parallel. That is now beginning to change with the emergence of parallel architectures which are showing the potential for handling symbolic structures. This paper will review the relationships between symbolic computing and parallel computing architectures, and will identify opportunities for optics to significantly impact the performance of such computing machines. Although neural networks are an exciting subset of massively parallel computing structures, this paper will not touch on this area since it is receiving a great deal of attention in the literature. That is, the concepts presented herein do not consider the distributed representation of knowledge.
FPGA-Based Laboratory Assignments for NoC-Based Manycore Systems
ERIC Educational Resources Information Center
Ttofis, C.; Theocharides, T.; Michael, M. K.
2012-01-01
Manycore systems have emerged as being one of the dominant architectural trends in next-generation computer systems. These highly parallel systems are expected to be interconnected via packet-based networks-on-chip (NoC). The complexity of such systems poses novel and exciting challenges in academia, as teaching their design requires the students…
Demonstration of optical computing logics based on binary decision diagram.
Lin, Shiyun; Ishikawa, Yasuhiko; Wada, Kazumi
2012-01-16
Optical circuits are low power consumption and fast speed alternatives for the current information processing based on transistor circuits. However, because of no transistor function available in optics, the architecture for optical computing should be chosen that optics prefers. One of which is Binary Decision Diagram (BDD), where signal is processed by sending an optical signal from the root through a serial of switching nodes to the leaf (terminal). Speed of optical computing is limited by either transmission time of optical signals from the root to the leaf or switching time of a node. We have designed and experimentally demonstrated 1-bit and 2-bit adders based on the BDD architecture. The switching nodes are silicon ring resonators with a modulation depth of 10 dB and the states are changed by the plasma dispersion effect. The quality, Q of the rings designed is 1500, which allows fast transmission of signal, e.g., 1.3 ps calculated by a photon escaping time. A total processing time is thus analyzed to be ~9 ps for a 2-bit adder and would scales linearly with the number of bit. It is two orders of magnitude faster than the conventional CMOS circuitry, ~ns scale of delay. The presented results show the potential of fast speed optical computing circuits.
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
NASA Astrophysics Data System (ADS)
Hadade, Ioan; di Mare, Luca
2016-08-01
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
EON: a component-based approach to automation of protocol-directed therapy.
Musen, M A; Tu, S W; Das, A K; Shahar, Y
1996-01-01
Provision of automated support for planning protocol-directed therapy requires a computer program to take as input clinical data stored in an electronic patient-record system and to generate as output recommendations for therapeutic interventions and laboratory testing that are defined by applicable protocols. This paper presents a synthesis of research carried out at Stanford University to model the therapy-planning task and to demonstrate a component-based architecture for building protocol-based decision-support systems. We have constructed general-purpose software components that (1) interpret abstract protocol specifications to construct appropriate patient-specific treatment plans; (2) infer from time-stamped patient data higher-level, interval-based, abstract concepts; (3) perform time-oriented queries on a time-oriented patient database; and (4) allow acquisition and maintenance of protocol knowledge in a manner that facilitates efficient processing both by humans and by computers. We have implemented these components in a computer system known as EON. Each of the components has been developed, evaluated, and reported independently. We have evaluated the integration of the components as a composite architecture by implementing T-HELPER, a computer-based patient-record system that uses EON to offer advice regarding the management of patients who are following clinical trial protocols for AIDS or HIV infection. A test of the reuse of the software components in a different clinical domain demonstrated rapid development of a prototype application to support protocol-based care of patients who have breast cancer. PMID:8930854
CE-ACCE: The Cloud Enabled Advanced sCience Compute Environment
NASA Astrophysics Data System (ADS)
Cinquini, L.; Freeborn, D. J.; Hardman, S. H.; Wong, C.
2017-12-01
Traditionally, Earth Science data from NASA remote sensing instruments has been processed by building custom data processing pipelines (often based on a common workflow engine or framework) which are typically deployed and run on an internal cluster of computing resources. This approach has some intrinsic limitations: it requires each mission to develop and deploy a custom software package on top of the adopted framework; it makes use of dedicated hardware, network and storage resources, which must be specifically purchased, maintained and re-purposed at mission completion; and computing services cannot be scaled on demand beyond the capability of the available servers.More recently, the rise of Cloud computing, coupled with other advances in containerization technology (most prominently, Docker) and micro-services architecture, has enabled a new paradigm, whereby space mission data can be processed through standard system architectures, which can be seamlessly deployed and scaled on demand on either on-premise clusters, or commercial Cloud providers. In this talk, we will present one such architecture named CE-ACCE ("Cloud Enabled Advanced sCience Compute Environment"), which we have been developing at the NASA Jet Propulsion Laboratory over the past year. CE-ACCE is based on the Apache OODT ("Object Oriented Data Technology") suite of services for full data lifecycle management, which are turned into a composable array of Docker images, and complemented by a plug-in model for mission-specific customization. We have applied this infrastructure to both flying and upcoming NASA missions, such as ECOSTRESS and SMAP, and demonstrated deployment on the Amazon Cloud, either using simple EC2 instances, or advanced AWS services such as Amazon Lambda and ECS (EC2 Container Services).
Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.
Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios
2016-01-01
The discrete periodic radon transform (DPRT) has extensively been used in applications that involve image reconstructions from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoids the use of floating-point arithmetic associated with the use of the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N(2) additions per clock cycle. Compared with the previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251×251 image, for approximately 25% fewer flip-flops than required for a systolic implementation, we have that the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized just 2N + ⌈log(2) N⌉ + 1 and 2N + 3 ⌈log(2) N⌉ + B + 2 cycles, architectures that can compute the DPRT and its inverse in respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-b additions than for the systolic implementation and provides a tradeoff between speed and additional 1-b additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using an Field-Programmable Gate Array (FPGA) implementation.
NASA Technical Reports Server (NTRS)
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Semivariogram Analysis of Bone Images Implemented on FPGA Architectures.
Shirvaikar, Mukul; Lagadapati, Yamuna; Dong, Xuanliang
2017-03-01
Osteoporotic fractures are a major concern for the healthcare of elderly and female populations. Early diagnosis of patients with a high risk of osteoporotic fractures can be enhanced by introducing second-order statistical analysis of bone image data using techniques such as variogram analysis. Such analysis is computationally intensive thereby creating an impediment for introduction into imaging machines found in common clinical settings. This paper investigates the fast implementation of the semivariogram algorithm, which has been proven to be effective in modeling bone strength, and should be of interest to readers in the areas of computer-aided diagnosis and quantitative image analysis. The semivariogram is a statistical measure of the spatial distribution of data, and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. A semi-variance, γ ( h ), is defined as the half of the expected squared differences of pixel values between any two data locations with a lag distance of h . Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O ( n 2 ) Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development Kit, which utilizes the Virtex5 FPGA. Medical image data from DXA scans are utilized for the experiments. Implementation results show that a significant advantage in computational speed is attained by the architectures with respect to implementation on a personal computer with an Intel i7 multi-core processor.
Semivariogram Analysis of Bone Images Implemented on FPGA Architectures
Shirvaikar, Mukul; Lagadapati, Yamuna; Dong, Xuanliang
2016-01-01
Osteoporotic fractures are a major concern for the healthcare of elderly and female populations. Early diagnosis of patients with a high risk of osteoporotic fractures can be enhanced by introducing second-order statistical analysis of bone image data using techniques such as variogram analysis. Such analysis is computationally intensive thereby creating an impediment for introduction into imaging machines found in common clinical settings. This paper investigates the fast implementation of the semivariogram algorithm, which has been proven to be effective in modeling bone strength, and should be of interest to readers in the areas of computer-aided diagnosis and quantitative image analysis. The semivariogram is a statistical measure of the spatial distribution of data, and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. A semi-variance, γ(h), is defined as the half of the expected squared differences of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O (n2) Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development Kit, which utilizes the Virtex5 FPGA. Medical image data from DXA scans are utilized for the experiments. Implementation results show that a significant advantage in computational speed is attained by the architectures with respect to implementation on a personal computer with an Intel i7 multi-core processor. PMID:28428829
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nomura, K; Seymour, R; Wang, W
2009-02-17
A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based onmore » hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops {center_dot} day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).« less
Magnetic tunnel junction based spintronic logic devices
NASA Astrophysics Data System (ADS)
Lyle, Andrew Paul
The International Technology Roadmap for Semiconductors (ITRS) predicts that complimentary metal oxide semiconductor (CMOS) based technologies will hit their last generation on or near the 16 nm node, which we expect to reach by the year 2025. Thus future advances in computational power will not be realized from ever-shrinking device sizes, but rather by 'outside the box' designs and new physics, including molecular or DNA based computation, organics, magnonics, or spintronic. This dissertation investigates magnetic logic devices for post-CMOS computation. Three different architectures were studied, each relying on a different magnetic mechanism to compute logic functions. Each design has it benefits and challenges that must be overcome. This dissertation focuses on pushing each design from the drawing board to a realistic logic technology. The first logic architecture is based on electrically connected magnetic tunnel junctions (MTJs) that allow direct communication between elements without intermediate sensing amplifiers. Two and three input logic gates, which consist of two and three MTJs connected in parallel, respectively were fabricated and are compared. The direct communication is realized by electrically connecting the output in series with the input and applying voltage across the series connections. The logic gates rely on the fact that a change in resistance at the input modulates the voltage that is needed to supply the critical current for spin transfer torque switching the output. The change in resistance at the input resulted in a voltage margin of 50--200 mV and 250--300 mV for the closest input states for the three and two input designs, respectively. The two input logic gate realizes the AND, NAND, NOR, and OR logic functions. The three input logic function realizes the Majority, AND, NAND, NOR, and OR logic operations. The second logic architecture utilizes magnetostatically coupled nanomagnets to compute logic functions, which is the basis of Magnetic Quantum Cellular Automata (MQCA). MQCA has the potential to be thousands of times more energy efficient than CMOS technology. While interesting, these systems are academic unless they can be interfaced into current technologies. This dissertation pushed past a major hurdle by experimentally demonstrating a spintronic input/output (I/O) interface for the magnetostatically coupled nanomagnets by incorporating MTJs. This spintronic interface allows individual nanomagnets to be programmed using spin transfer torque and read using magneto resistance structure. Additionally the spintronic interface allows statistical data on the reliability of the magnetic coupling utilized for data propagation to be easily measured. The integration of spintronics and MQCA for an electrical interface to achieve a magnetic logic device with low power creates a competitive post-CMOS logic device. The final logic architecture that was studied used MTJs to compute logic functions and magnetic domain walls to communicate between gates. Simulations were used to optimize the design of this architecture. Spin transfer torque was used to compute logic function at each MTJ gate and was used to drive the domain walls. The design demonstrated that multiple nanochannels could be connected to each MTJ to realize fan-out from the logic gates. As a result this logic scheme eliminates the need for intermediate reads and conversions to pass information from one logic gate to another.
A low-power and high-quality implementation of the discrete cosine transformation
NASA Astrophysics Data System (ADS)
Heyne, B.; Götze, J.
2007-06-01
In this paper a computationally efficient and high-quality preserving DCT architecture is presented. It is obtained by optimizing the Loeffler DCT based on the Cordic algorithm. The computational complexity is reduced from 11 multiply and 29 add operations (Loeffler DCT) to 38 add and 16 shift operations (which is similar to the complexity of the binDCT). The experimental results show that the proposed DCT algorithm not only reduces the computational complexity significantly, but also retains the good transformation quality of the Loeffler DCT. Therefore, the proposed Cordic based Loeffler DCT is especially suited for low-power and high-quality CODECs in battery-based systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas; Schuman, Catherine; Patton, Robert
The White House and Department of Energy have been instrumental in driving the development of a neuromorphic computing program to help the United States continue its lead in basic research into (1) Beyond Exascale—high performance computing beyond Moore’s Law and von Neumann architectures, (2) Scientific Discovery—new paradigms for understanding increasingly large and complex scientific data, and (3) Emerging Architectures—assessing the potential of neuromorphic and quantum architectures. Neuromorphic computing spans a broad range of scientific disciplines from materials science to devices, to computer science, to neuroscience, all of which are required to solve the neuromorphic computing grand challenge. In our workshopmore » we focus on the computer science aspects, specifically from a neuromorphic device through an application. Neuromorphic devices present a very different paradigm to the computer science community from traditional von Neumann architectures, which raises six major questions about building a neuromorphic application from the device level. We used these fundamental questions to organize the workshop program and to direct the workshop panels and discussions. From the white papers, presentations, panels, and discussions, there emerged several recommendations on how to proceed.« less
Peculiarities of Natural Technology Application in Architecture
NASA Astrophysics Data System (ADS)
Umorina, Z.
2017-11-01
Technical advancement of the modern world has made it possible to create unique artificial objects based on the natural technology principle. New engineering and design types, such as computational design, additive manufacturing, materials engineering, synthetic biology, etc. allow us to enter a new level of interaction between a human being and nature. This influences the formation of a new world view in the sphere of architecture and leads to the development of new methods and styles [1,2].
The architecture of a distributed medical dictionary.
Fowler, J; Buffone, G; Moreau, D
1995-01-01
Exploiting high-speed computer networks to provide a national medical information infrastructure is a goal for medical informatics. The Distributed Medical Dictionary under development at Baylor College of Medicine is a model for an architecture that supports collaborative development of a distributed online medical terminology knowledge-base. A prototype is described that illustrates the concept. Issues that must be addressed by such a system include high availability, acceptable response time, support for local idiom, and control of vocabulary.
Three-dimensional integration of nanotechnologies for computing and data storage on a single chip
NASA Astrophysics Data System (ADS)
Shulaker, Max M.; Hills, Gage; Park, Rebecca S.; Howe, Roger T.; Saraswat, Krishna; Wong, H.-S. Philip; Mitra, Subhasish
2017-07-01
The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors—promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage—fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce ‘highly processed’ information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.
NASA Astrophysics Data System (ADS)
King, Nelson E.; Liu, Brent; Zhou, Zheng; Documet, Jorge; Huang, H. K.
2005-04-01
Grid Computing represents the latest and most exciting technology to evolve from the familiar realm of parallel, peer-to-peer and client-server models that can address the problem of fault-tolerant storage for backup and recovery of clinical images. We have researched and developed a novel Data Grid testbed involving several federated PAC systems based on grid architecture. By integrating a grid computing architecture to the DICOM environment, a failed PACS archive can recover its image data from others in the federation in a timely and seamless fashion. The design reflects the five-layer architecture of grid computing: Fabric, Resource, Connectivity, Collective, and Application Layers. The testbed Data Grid architecture representing three federated PAC systems, the Fault-Tolerant PACS archive server at the Image Processing and Informatics Laboratory, Marina del Rey, the clinical PACS at Saint John's Health Center, Santa Monica, and the clinical PACS at the Healthcare Consultation Center II, USC Health Science Campus, will be presented. The successful demonstration of the Data Grid in the testbed will provide an understanding of the Data Grid concept in clinical image data backup as well as establishment of benchmarks for performance from future grid technology improvements and serve as a road map for expanded research into large enterprise and federation level data grids to guarantee 99.999 % up time.
Three-dimensional integration of nanotechnologies for computing and data storage on a single chip.
Shulaker, Max M; Hills, Gage; Park, Rebecca S; Howe, Roger T; Saraswat, Krishna; Wong, H-S Philip; Mitra, Subhasish
2017-07-05
The computing demands of future data-intensive applications will greatly exceed the capabilities of current electronics, and are unlikely to be met by isolated improvements in transistors, data storage technologies or integrated circuit architectures alone. Instead, transformative nanosystems, which use new nanotechnologies to simultaneously realize improved devices and new integrated circuit architectures, are required. Here we present a prototype of such a transformative nanosystem. It consists of more than one million resistive random-access memory cells and more than two million carbon-nanotube field-effect transistors-promising new nanotechnologies for use in energy-efficient digital logic circuits and for dense data storage-fabricated on vertically stacked layers in a single chip. Unlike conventional integrated circuit architectures, the layered fabrication realizes a three-dimensional integrated circuit architecture with fine-grained and dense vertical connectivity between layers of computing, data storage, and input and output (in this instance, sensing). As a result, our nanosystem can capture massive amounts of data every second, store it directly on-chip, perform in situ processing of the captured data, and produce 'highly processed' information. As a working prototype, our nanosystem senses and classifies ambient gases. Furthermore, because the layers are fabricated on top of silicon logic circuitry, our nanosystem is compatible with existing infrastructure for silicon-based technologies. Such complex nano-electronic systems will be essential for future high-performance and highly energy-efficient electronic systems.
ERIC Educational Resources Information Center
Amenyo, John-Thones
2012-01-01
Carefully engineered playable games can serve as vehicles for students and practitioners to learn and explore the programming of advanced computer architectures to execute applications, such as high performance computing (HPC) and complex, inter-networked, distributed systems. The article presents families of playable games that are grounded in…
PathCase-SB architecture and database design
2011-01-01
Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889
Automatic control of a negative ion source
NASA Astrophysics Data System (ADS)
Saadatmand, K.; Sredniawski, J.; Solensten, L.
1989-04-01
A CAMAC based control architecture is devised for a Berkeley-type H - volume ion source [1]. The architecture employs three 80386 TM PCs. One PC is dedicated to control and monitoring of source operation. The other PC functions with digitizers to provide data acquisition of waveforms. The third PC is used for off-line analysis. Initially, operation of the source was put under remote computer control (supervisory). This was followed by development of an automated startup procedure. Finally, a study of the physics of operation is now underway to establish a data base from which automatic beam optimization can be derived.
MonALISA, an agent-based monitoring and control system for the LHC experiments
NASA Astrophysics Data System (ADS)
Balcas, J.; Kcira, D.; Mughal, A.; Newman, H.; Spiropulu, M.; Vlimant, J. R.
2017-10-01
MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed over the last fifteen years by California Insitute of Technology (Caltech) and its partners with the support of the software and computing program of the CMS and ALICE experiments at the Large Hadron Collider (LHC). The framework is based on Dynamic Distributed Service Architecture and is able to provide complete system monitoring, performance metrics of applications, Jobs or services, system control and global optimization services for complex systems. A short overview and status of MonALISA is given in this paper.
NASA Astrophysics Data System (ADS)
Maiti, Anup Kumar; Nath Roy, Jitendra; Mukhopadhyay, Sourangshu
2007-08-01
In the field of optical computing and parallel information processing, several number systems have been used for different arithmetic and algebraic operations. Therefore an efficient conversion scheme from one number system to another is very important. Modified trinary number (MTN) has already taken a significant role towards carry and borrow free arithmetic operations. In this communication, we propose a tree-net architecture based all optical conversion scheme from binary number to its MTN form. Optical switch using nonlinear material (NLM) plays an important role.
New Trends in Robotics for Agriculture: Integration and Assessment of a Real Fleet of Robots
Gonzalez-de-Soto, Mariano; Pajares, Gonzalo
2014-01-01
Computer-based sensors and actuators such as global positioning systems, machine vision, and laser-based sensors have progressively been incorporated into mobile robots with the aim of configuring autonomous systems capable of shifting operator activities in agricultural tasks. However, the incorporation of many electronic systems into a robot impairs its reliability and increases its cost. Hardware minimization, as well as software minimization and ease of integration, is essential to obtain feasible robotic systems. A step forward in the application of automatic equipment in agriculture is the use of fleets of robots, in which a number of specialized robots collaborate to accomplish one or several agricultural tasks. This paper strives to develop a system architecture for both individual robots and robots working in fleets to improve reliability, decrease complexity and costs, and permit the integration of software from different developers. Several solutions are studied, from a fully distributed to a whole integrated architecture in which a central computer runs all processes. This work also studies diverse topologies for controlling fleets of robots and advances other prospective topologies. The architecture presented in this paper is being successfully applied in the RHEA fleet, which comprises three ground mobile units based on a commercial tractor chassis. PMID:25143976
New trends in robotics for agriculture: integration and assessment of a real fleet of robots.
Emmi, Luis; Gonzalez-de-Soto, Mariano; Pajares, Gonzalo; Gonzalez-de-Santos, Pablo
2014-01-01
Computer-based sensors and actuators such as global positioning systems, machine vision, and laser-based sensors have progressively been incorporated into mobile robots with the aim of configuring autonomous systems capable of shifting operator activities in agricultural tasks. However, the incorporation of many electronic systems into a robot impairs its reliability and increases its cost. Hardware minimization, as well as software minimization and ease of integration, is essential to obtain feasible robotic systems. A step forward in the application of automatic equipment in agriculture is the use of fleets of robots, in which a number of specialized robots collaborate to accomplish one or several agricultural tasks. This paper strives to develop a system architecture for both individual robots and robots working in fleets to improve reliability, decrease complexity and costs, and permit the integration of software from different developers. Several solutions are studied, from a fully distributed to a whole integrated architecture in which a central computer runs all processes. This work also studies diverse topologies for controlling fleets of robots and advances other prospective topologies. The architecture presented in this paper is being successfully applied in the RHEA fleet, which comprises three ground mobile units based on a commercial tractor chassis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Y.; Cameron, K.W.
1998-11-24
Workload characterization has been proven an essential tool to architecture design and performance evaluation in both scientific and commercial computing areas. Traditional workload characterization techniques include FLOPS rate, cache miss ratios, CPI (cycles per instruction or IPC, instructions per cycle) etc. With the complexity of sophisticated modern superscalar microprocessors, these traditional characterization techniques are not powerful enough to pinpoint the performance bottleneck of an application on a specific microprocessor. They are also incapable of immediately demonstrating the potential performance benefit of any architectural or functional improvement in a new processor design. To solve these problems, many people rely on simulators,more » which have substantial constraints especially on large-scale scientific computing applications. This paper presents a new technique of characterizing applications at the instruction level using hardware performance counters. It has the advantage of collecting instruction-level characteristics in a few runs virtually without overhead or slowdown. A variety of instruction counts can be utilized to calculate some average abstract workload parameters corresponding to microprocessor pipelines or functional units. Based on the microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. In particular, the analysis results can provide some insight to the problem that only a small percentage of processor peak performance can be achieved even for many very cache-friendly codes. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvement for certain workloads. Eventually, these abstract parameters can lead to the creation of an analytical microprocessor pipeline model and memory hierarchy model.« less
NASA Technical Reports Server (NTRS)
Denning, Peter J.; Tichy, Walter F.
1990-01-01
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
Switching from computer to microcomputer architecture education
NASA Astrophysics Data System (ADS)
Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore
2010-03-01
In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to microcomputer architecture. The authors present their strategies towards a successful crossing of boundaries between engineering disciplines. This communication aims at providing a different aspect on professional courses that are, nowadays, addressed at the expense of traditional courses.
A heterogeneous system based on GPU and multi-core CPU for real-time fluid and rigid body simulation
NASA Astrophysics Data System (ADS)
da Silva Junior, José Ricardo; Gonzalez Clua, Esteban W.; Montenegro, Anselmo; Lage, Marcos; Dreux, Marcelo de Andrade; Joselli, Mark; Pagliosa, Paulo A.; Kuryla, Christine Lucille
2012-03-01
Computational fluid dynamics in simulation has become an important field not only for physics and engineering areas but also for simulation, computer graphics, virtual reality and even video game development. Many efficient models have been developed over the years, but when many contact interactions must be processed, most models present difficulties or cannot achieve real-time results when executed. The advent of parallel computing has enabled the development of many strategies for accelerating the simulations. Our work proposes a new system which uses some successful algorithms already proposed, as well as a data structure organisation based on a heterogeneous architecture using CPUs and GPUs, in order to process the simulation of the interaction of fluids and rigid bodies. This successfully results in a two-way interaction between them and their surrounding objects. As far as we know, this is the first work that presents a computational collaborative environment which makes use of two different paradigms of hardware architecture for this specific kind of problem. Since our method achieves real-time results, it is suitable for virtual reality, simulation and video game fluid simulation problems.
Switching from Computer to Microcomputer Architecture Education
ERIC Educational Resources Information Center
Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore
2010-01-01
In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to…
An Architecture for Cross-Cloud System Management
NASA Astrophysics Data System (ADS)
Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad
The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in an homogenous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-01-01
Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness. PMID:23867746
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-07-17
Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness.
Porting marine ecosystem model spin-up using transport matrices to GPUs
NASA Astrophysics Data System (ADS)
Siewertsen, E.; Piwonski, J.; Slawig, T.
2013-01-01
We have ported an implementation of the spin-up for marine ecosystem models based on transport matrices to graphics processing units (GPUs). The original implementation was designed for distributed-memory architectures and uses the Portable, Extensible Toolkit for Scientific Computation (PETSc) library that is based on the Message Passing Interface (MPI) standard. The spin-up computes a steady seasonal cycle of ecosystem tracers with climatological ocean circulation data as forcing. Since the transport is linear with respect to the tracers, the resulting operator is represented by matrices. Each iteration of the spin-up involves two matrix-vector multiplications and the evaluation of the used biogeochemical model. The original code was written in C and Fortran. On the GPU, we use the Compute Unified Device Architecture (CUDA) standard, a customized version of PETSc and a commercial CUDA Fortran compiler. We describe the extensions to PETSc and the modifications of the original C and Fortran codes that had to be done. Here we make use of freely available libraries for the GPU. We analyze the computational effort of the main parts of the spin-up for two exemplar ecosystem models and compare the overall computational time to those necessary on different CPUs. The results show that a consumer GPU can compete with a significant number of cluster CPUs without further code optimization.
Advanced Computing Architectures for Cognitive Processing
2009-07-01
Evolution ................................................................................. 20 Figure 9: Logic diagram smart block-based neuron...48 Figure 21: Naive Grid Potential Kernel...processing would be helpful for Air Force systems acquisition. Specific cognitive processing approaches addressed herein include global information grid
NASA Technical Reports Server (NTRS)
Erickson, W. K.; Hofman, L. B.; Donovan, W. E.
1984-01-01
Difficulties regarding the digital image analysis of remotely sensed imagery can arise in connection with the extensive calculations required. In the past, an expensive large to medium mainframe computer system was needed for performing these calculations. For image-processing applications smaller minicomputer-based systems are now used by many organizations. The costs for such systems are still in the range from $100K to $300K. Recently, as a result of new developments, the use of low-cost microcomputers for image processing and display systems appeared to have become feasible. These developments are related to the advent of the 16-bit microprocessor and the concept of the microcomputer workstation. Earlier 8-bit microcomputer-based image processing systems are briefly examined, and a computer workstation architecture is discussed. Attention is given to a microcomputer workstation developed by Stanford University, and the design and implementation of a workstation network.
Evaluation of Service Level Agreement Approaches for Portfolio Management in the Financial Industry
NASA Astrophysics Data System (ADS)
Pontz, Tobias; Grauer, Manfred; Kuebert, Roland; Tenschert, Axel; Koller, Bastian
The idea of service-oriented Grid computing seems to have the potential for fundamental paradigm change and a new architectural alignment concerning the design of IT infrastructures. There is a wide range of technical approaches from scientific communities which describe basic infrastructures and middlewares for integrating Grid resources in order that by now Grid applications are technically realizable. Hence, Grid computing needs viable business models and enhanced infrastructures to move from academic application right up to commercial application. For a commercial usage of these evolutions service level agreements are needed. The developed approaches are primary of academic interest and mostly have not been put into practice. Based on a business use case of the financial industry, five service level agreement approaches have been evaluated in this paper. Based on the evaluation, a management architecture has been designed and implemented as a prototype.
NASA Astrophysics Data System (ADS)
Li, Ming; Yin, Hongxi; Xing, Fangyuan; Wang, Jingchao; Wang, Honghuan
2016-02-01
With the features of network virtualization and resource programming, Software Defined Optical Network (SDON) is considered as the future development trend of optical network, provisioning a more flexible, efficient and open network function, supporting intraconnection and interconnection of data centers. Meanwhile cloud platform can provide powerful computing, storage and management capabilities. In this paper, with the coordination of SDON and cloud platform, a multi-domain SDON architecture based on cloud control plane has been proposed, which is composed of data centers with database (DB), path computation element (PCE), SDON controller and orchestrator. In addition, the structure of the multidomain SDON orchestrator and OpenFlow-enabled optical node are proposed to realize the combination of centralized and distributed effective management and control platform. Finally, the functional verification and demonstration are performed through our optical experiment network.
NASA Astrophysics Data System (ADS)
Zhang, Yunju; Chen, Zhongyi; Guo, Ming; Lin, Shunsheng; Yan, Yinyang
2018-01-01
With the large capacity of the power system, the development trend of the large unit and the high voltage, the scheduling operation is becoming more frequent and complicated, and the probability of operation error increases. This paper aims at the problem of the lack of anti-error function, single scheduling function and low working efficiency for technical support system in regional regulation and integration, the integrated construction of the error prevention of the integrated architecture of the system of dispatching anti - error of dispatching anti - error of power network based on cloud computing has been proposed. Integrated system of error prevention of Energy Management System, EMS, and Operation Management System, OMS have been constructed either. The system architecture has good scalability and adaptability, which can improve the computational efficiency, reduce the cost of system operation and maintenance, enhance the ability of regional regulation and anti-error checking with broad development prospects.
Distributive, Non-destructive Real-time System and Method for Snowpack Monitoring
NASA Technical Reports Server (NTRS)
Frolik, Jeff (Inventor); Skalka, Christian (Inventor)
2013-01-01
A ground-based system that provides quasi real-time measurement and collection of snow-water equivalent (SWE) data in remote settings is provided. The disclosed invention is significantly less expensive and easier to deploy than current methods and less susceptible to terrain and snow bridging effects. Embodiments of the invention include remote data recovery solutions. Compared to current infrastructure using existing SWE technology, the disclosed invention allows more SWE sites to be installed for similar cost and effort, in a greater variety of terrain; thus, enabling data collection at improved spatial resolutions. The invention integrates a novel computational architecture with new sensor technologies. The invention's computational architecture is based on wireless sensor networks, comprised of programmable, low-cost, low-powered nodes capable of sophisticated sensor control and remote data communication. The invention also includes measuring attenuation of electromagnetic radiation, an approach that is immune to snow bridging and significantly reduces sensor footprints.
Toe, Kyaw Kyar; Huang, Weimin; Yang, Tao; Duan, Yuping; Zhou, Jiayin; Su, Yi; Teo, Soo-Kng; Kumar, Selvaraj Senthil; Lim, Calvin Chi-Wan; Chui, Chee Kong; Chang, Stephen
2015-08-01
This work presents a surgical training system that incorporates cutting operation of soft tissue simulated based on a modified pre-computed linear elastic model in the Simulation Open Framework Architecture (SOFA) environment. A precomputed linear elastic model used for the simulation of soft tissue deformation involves computing the compliance matrix a priori based on the topological information of the mesh. While this process may require a few minutes to several hours, based on the number of vertices in the mesh, it needs only to be computed once and allows real-time computation of the subsequent soft tissue deformation. However, as the compliance matrix is based on the initial topology of the mesh, it does not allow any topological changes during simulation, such as cutting or tearing of the mesh. This work proposes a way to modify the pre-computed data by correcting the topological connectivity in the compliance matrix, without re-computing the compliance matrix which is computationally expensive.
Martinez-Espronceda, Miguel; Martinez, Ignacio; Serrano, Luis; Led, Santiago; Trigo, Jesús Daniel; Marzo, Asier; Escayola, Javier; Garcia, José
2011-05-01
Traditionally, e-Health solutions were located at the point of care (PoC), while the new ubiquitous user-centered paradigm draws on standard-based personal health devices (PHDs). Such devices place strict constraints on computation and battery efficiency that encouraged the International Organization for Standardization/IEEE11073 (X73) standard for medical devices to evolve from X73PoC to X73PHD. In this context, low-voltage low-power (LV-LP) technologies meet the restrictions of X73PHD-compliant devices. Since X73PHD does not approach the software architecture, the accomplishment of an efficient design falls directly on the software developer. Therefore, computational and battery performance of such LV-LP-constrained devices can even be outperformed through an efficient X73PHD implementation design. In this context, this paper proposes a new methodology to implement X73PHD into microcontroller-based platforms with LV-LP constraints. Such implementation methodology has been developed through a patterns-based approach and applied to a number of X73PHD-compliant agents (including weighing scale, blood pressure monitor, and thermometer specializations) and microprocessor architectures (8, 16, and 32 bits) as a proof of concept. As a reference, the results obtained in the weighing scale guarantee all features of X73PHD running over a microcontroller architecture based on ARM7TDMI requiring only 168 B of RAM and 2546 B of flash memory.
An event-based architecture for solving constraint satisfaction problems
Mostafa, Hesham; Müller, Lorenz K.; Indiveri, Giacomo
2015-01-01
Constraint satisfaction problems are ubiquitous in many domains. They are typically solved using conventional digital computing architectures that do not reflect the distributed nature of many of these problems, and are thus ill-suited for solving them. Here we present a parallel analogue/digital hardware architecture specifically designed to solve such problems. We cast constraint satisfaction problems as networks of stereotyped nodes that communicate using digital pulses, or events. Each node contains an oscillator implemented using analogue circuits. The non-repeating phase relations among the oscillators drive the exploration of the solution space. We show that this hardware architecture can yield state-of-the-art performance on random SAT problems under reasonable assumptions on the implementation. We present measurements from a prototype electronic chip to demonstrate that a physical implementation of the proposed architecture is robust to practical non-idealities and to validate the theory proposed. PMID:26642827
Convolutional networks for fast, energy-efficient neuromorphic computing
Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.
2016-01-01
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489
Convolutional networks for fast, energy-efficient neuromorphic computing.
Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S
2016-10-11
Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
Exploring Asynchronous Many-Task Runtime Systems toward Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Knight, Samuel; Baker, Gavin Matthew; Gamell, Marc
2015-10-01
Major exascale computing reports indicate a number of software challenges to meet the dramatic change of system architectures in near future. While several-orders-of-magnitude increase in parallelism is the most commonly cited of those, hurdles also include performance heterogeneity of compute nodes across the system, increased imbalance between computational capacity and I/O capabilities, frequent system interrupts, and complex hardware architectures. Asynchronous task-parallel programming models show a great promise in addressing these issues, but are not yet fully understood nor developed su ciently for computational science and engineering application codes. We address these knowledge gaps through quantitative and qualitative exploration of leadingmore » candidate solutions in the context of engineering applications at Sandia. In this poster, we evaluate MiniAero code ported to three leading candidate programming models (Charm++, Legion and UINTAH) to examine the feasibility of these models that permits insertion of new programming model elements into an existing code base.« less
Approximation algorithms for planning and control
NASA Technical Reports Server (NTRS)
Boddy, Mark; Dean, Thomas
1989-01-01
A control system operating in a complex environment will encounter a variety of different situations, with varying amounts of time available to respond to critical events. Ideally, such a control system will do the best possible with the time available. In other words, its responses should approximate those that would result from having unlimited time for computation, where the degree of the approximation depends on the amount of time it actually has. There exist approximation algorithms for a wide variety of problems. Unfortunately, the solution to any reasonably complex control problem will require solving several computationally intensive problems. Algorithms for successive approximation are a subclass of the class of anytime algorithms, algorithms that return answers for any amount of computation time, where the answers improve as more time is allotted. An architecture is described for allocating computation time to a set of anytime algorithms, based on expectations regarding the value of the answers they return. The architecture described is quite general, producing optimal schedules for a set of algorithms under widely varying conditions.
Automated Generation of Message-Passing Programs: An Evaluation Using CAPTools
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Jin, Haoqiang; Yan, Jerry C.; Saini, Subhash (Technical Monitor)
1998-01-01
Scientists at NASA Ames Research Center have been developing computational aeroscience applications on highly parallel architectures over the past ten years. During that same time period, a steady transition of hardware and system software also occurred, forcing us to expend great efforts into migrating and re-coding our applications. As applications and machine architectures become increasingly complex, the cost and time required for this process will become prohibitive. In this paper, we present the first set of results in our evaluation of interactive parallelization tools. In particular, we evaluate CAPTool's ability to parallelize computational aeroscience applications. CAPTools was tested on serial versions of the NAS Parallel Benchmarks and ARC3D, a computational fluid dynamics application, on two platforms: the SGI Origin 2000 and the Cray T3E. This evaluation includes performance, amount of user interaction required, limitations and portability. Based on these results, a discussion on the feasibility of computer aided parallelization of aerospace applications is presented along with suggestions for future work.
THE COMPUTER AND THE ARCHITECTURAL PROFESSION.
ERIC Educational Resources Information Center
HAVILAND, DAVID S.
THE ROLE OF ADVANCING TECHNOLOGY IN THE FIELD OF ARCHITECTURE IS DISCUSSED IN THIS REPORT. PROBLEMS IN COMMUNICATION AND THE DESIGN PROCESS ARE IDENTIFIED. ADVANTAGES AND DISADVANTAGES OF COMPUTERS ARE MENTIONED IN RELATION TO MAN AND MACHINE INTERACTION. PRESENT AND FUTURE IMPLICATIONS OF COMPUTER USAGE ARE IDENTIFIED AND DISCUSSED WITH RESPECT…
The Contribution of Visualization to Learning Computer Architecture
ERIC Educational Resources Information Center
Yehezkel, Cecile; Ben-Ari, Mordechai; Dreyfus, Tommy
2007-01-01
This paper describes a visualization environment and associated learning activities designed to improve learning of computer architecture. The environment, EasyCPU, displays a model of the components of a computer and the dynamic processes involved in program execution. We present the results of a research program that analysed the contribution of…
Technology advances and market forces: Their impact on high performance architectures
NASA Technical Reports Server (NTRS)
Best, D. R.
1978-01-01
Reasonable projections into future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture and application with greater attention to testability, maintainability, reliability, and usability than supercomputer development programs of the past.
Behavioral Reference Model for Pervasive Healthcare Systems.
Tahmasbi, Arezoo; Adabi, Sahar; Rezaee, Ali
2016-12-01
The emergence of mobile healthcare systems is an important outcome of application of pervasive computing concepts for medical care purposes. These systems provide the facilities and infrastructure required for automatic and ubiquitous sharing of medical information. Healthcare systems have a dynamic structure and configuration, therefore having an architecture is essential for future development of these systems. The need for increased response rate, problem limited storage, accelerated processing and etc. the tendency toward creating a new generation of healthcare system architecture highlight the need for further focus on cloud-based solutions for transfer data and data processing challenges. Integrity and reliability of healthcare systems are of critical importance, as even the slightest error may put the patients' lives in danger; therefore acquiring a behavioral model for these systems and developing the tools required to model their behaviors are of significant importance. The high-level designs may contain some flaws, therefor the system must be fully examined for different scenarios and conditions. This paper presents a software architecture for development of healthcare systems based on pervasive computing concepts, and then models the behavior of described system. A set of solutions are then proposed to improve the design's qualitative characteristics including, availability, interoperability and performance.
Analyzing Resiliency of the Smart Grid Communication Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anas AlMajali, Anas; Viswanathan, Arun; Neuman, Clifford
Smart grids are susceptible to cyber-attack as a result of new communication, control and computation techniques employed in the grid. In this paper, we characterize and analyze the resiliency of smart grid communication architecture, specifically an RF mesh based architecture, under cyber attacks. We analyze the resiliency of the communication architecture by studying the performance of high-level smart grid functions such as metering, and demand response which depend on communication. Disrupting the operation of these functions impacts the operational resiliency of the smart grid. Our analysis shows that it takes an attacker only a small fraction of meters to compromisemore » the communication resiliency of the smart grid. We discuss the implications of our result to critical smart grid functions and to the overall security of the smart grid.« less
Neural architecture design based on extreme learning machine.
Bueno-Crespo, Andrés; García-Laencina, Pedro J; Sancho-Gómez, José-Luis
2013-12-01
Selection of the optimal neural architecture to solve a pattern classification problem entails to choose the relevant input units, the number of hidden neurons and its corresponding interconnection weights. This problem has been widely studied in many research works but their solutions usually involve excessive computational cost in most of the problems and they do not provide a unique solution. This paper proposes a new technique to efficiently design the MultiLayer Perceptron (MLP) architecture for classification using the Extreme Learning Machine (ELM) algorithm. The proposed method provides a high generalization capability and a unique solution for the architecture design. Moreover, the selected final network only retains those input connections that are relevant for the classification task. Experimental results show these advantages. Copyright © 2013 Elsevier Ltd. All rights reserved.
Provable Transient Recovery for Frame-Based, Fault-Tolerant Computing Systems
NASA Technical Reports Server (NTRS)
DiVito, Ben L.; Butler, Ricky W.
1992-01-01
We present a formal verification of the transient fault recovery aspects of the Reliable Computing Platform (RCP), a fault-tolerant computing system architecture for digital flight control applications. The RCP uses NMR-style redundancy to mask faults and internal majority voting to purge the effects of transient faults. The system design has been formally specified and verified using the EHDM verification system. Our formalization accommodates a wide variety of voting schemes for purging the effects of transients.
A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vineyard, Craig Michael; Verzi, Stephen Joseph
As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired re- source allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilizemore » memory.« less
Jin, Miaomiao; Cheng, Long; Li, Yi; Hu, Siyu; Lu, Ke; Chen, Jia; Duan, Nian; Wang, Zhuorui; Zhou, Yaxiong; Chang, Ting-Chang; Miao, Xiangshui
2018-06-27
Owing to the capability of integrating the information storage and computing in the same physical location, in-memory computing with memristors has become a research hotspot as a promising route for non von Neumann architecture. However, it is still a challenge to develop high performance devices as well as optimized logic methodologies to realize energy-efficient computing. Herein, filamentary Cu/GeTe/TiN memristor is reported to show satisfactory properties with nanosecond switching speed (< 60 ns), low voltage operation (< 2 V), high endurance (>104 cycles) and good retention (>104 s @85℃). It is revealed that the charge carrier conduction mechanisms in high resistance and low resistance states are Schottky emission and hopping transport between the adjacent Cu clusters, respectively, based on the analysis of current-voltage behaviors and resistance-temperature characteristics. An intuitive picture is given to describe the dynamic processes of resistive switching. Moreover, based on the basic material implication (IMP) logic circuit, we proposed a reconfigurable logic method and experimentally implemented IMP, NOT, OR, and COPY logic functions. Design of a one-bit full adder with reduction in computational sequences and its validation in simulation further demonstrate the potential practical application. The results provide important progress towards understanding of resistive switching mechanism and realization of energy-efficient in-memory computing architecture. © 2018 IOP Publishing Ltd.
Computing architecture for autonomous microgrids
Goldsmith, Steven Y.
2015-09-29
A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the .
NASA Astrophysics Data System (ADS)
Singh, Surya P. N.; Thayer, Scott M.
2002-02-01
This paper presents a novel algorithmic architecture for the coordination and control of large scale distributed robot teams derived from the constructs found within the human immune system. Using this as a guide, the Immunology-derived Distributed Autonomous Robotics Architecture (IDARA) distributes tasks so that broad, all-purpose actions are refined and followed by specific and mediated responses based on each unit's utility and capability to timely address the system's perceived need(s). This method improves on initial developments in this area by including often overlooked interactions of the innate immune system resulting in a stronger first-order, general response mechanism. This allows for rapid reactions in dynamic environments, especially those lacking significant a priori information. As characterized via computer simulation of a of a self-healing mobile minefield having up to 7,500 mines and 2,750 robots, IDARA provides an efficient, communications light, and scalable architecture that yields significant operation and performance improvements for large-scale multi-robot coordination and control.
Cavity-based architecture to preserve quantum coherence and entanglement
NASA Astrophysics Data System (ADS)
Man, Zhong-Xiao; Xia, Yun-Jie; Lo Franco, Rosario
2015-09-01
Quantum technology relies on the utilization of resources, like quantum coherence and entanglement, which allow quantum information and computation processing. This achievement is however jeopardized by the detrimental effects of the environment surrounding any quantum system, so that finding strategies to protect quantum resources is essential. Non-Markovian and structured environments are useful tools to this aim. Here we show how a simple environmental architecture made of two coupled lossy cavities enables a switch between Markovian and non-Markovian regimes for the dynamics of a qubit embedded in one of the cavity. Furthermore, qubit coherence can be indefinitely preserved if the cavity without qubit is perfect. We then focus on entanglement control of two independent qubits locally subject to such an engineered environment and discuss its feasibility in the framework of circuit quantum electrodynamics. With up-to-date experimental parameters, we show that our architecture allows entanglement lifetimes orders of magnitude longer than the spontaneous lifetime without local cavity couplings. This cavity-based architecture is straightforwardly extendable to many qubits for scalability.
Cavity-based architecture to preserve quantum coherence and entanglement.
Man, Zhong-Xiao; Xia, Yun-Jie; Lo Franco, Rosario
2015-09-09
Quantum technology relies on the utilization of resources, like quantum coherence and entanglement, which allow quantum information and computation processing. This achievement is however jeopardized by the detrimental effects of the environment surrounding any quantum system, so that finding strategies to protect quantum resources is essential. Non-Markovian and structured environments are useful tools to this aim. Here we show how a simple environmental architecture made of two coupled lossy cavities enables a switch between Markovian and non-Markovian regimes for the dynamics of a qubit embedded in one of the cavity. Furthermore, qubit coherence can be indefinitely preserved if the cavity without qubit is perfect. We then focus on entanglement control of two independent qubits locally subject to such an engineered environment and discuss its feasibility in the framework of circuit quantum electrodynamics. With up-to-date experimental parameters, we show that our architecture allows entanglement lifetimes orders of magnitude longer than the spontaneous lifetime without local cavity couplings. This cavity-based architecture is straightforwardly extendable to many qubits for scalability.
Cavity-based architecture to preserve quantum coherence and entanglement
Man, Zhong-Xiao; Xia, Yun-Jie; Lo Franco, Rosario
2015-01-01
Quantum technology relies on the utilization of resources, like quantum coherence and entanglement, which allow quantum information and computation processing. This achievement is however jeopardized by the detrimental effects of the environment surrounding any quantum system, so that finding strategies to protect quantum resources is essential. Non-Markovian and structured environments are useful tools to this aim. Here we show how a simple environmental architecture made of two coupled lossy cavities enables a switch between Markovian and non-Markovian regimes for the dynamics of a qubit embedded in one of the cavity. Furthermore, qubit coherence can be indefinitely preserved if the cavity without qubit is perfect. We then focus on entanglement control of two independent qubits locally subject to such an engineered environment and discuss its feasibility in the framework of circuit quantum electrodynamics. With up-to-date experimental parameters, we show that our architecture allows entanglement lifetimes orders of magnitude longer than the spontaneous lifetime without local cavity couplings. This cavity-based architecture is straightforwardly extendable to many qubits for scalability. PMID:26351004
An Object-Oriented Architecture for a Web-Based CAI System.
ERIC Educational Resources Information Center
Nakabayashi, Kiyoshi; Hoshide, Takahide; Seshimo, Hitoshi; Fukuhara, Yoshimi
This paper describes the design and implementation of an object-oriented World Wide Web-based CAI (Computer-Assisted Instruction) system. The goal of the design is to provide a flexible CAI/ITS (Intelligent Tutoring System) framework with full extendibility and reusability, as well as to exploit Web-based software technologies such as JAVA, ASP (a…
Raingauge-Based Rainfall Nowcasting with Artificial Neural Network
NASA Astrophysics Data System (ADS)
Liong, Shie-Yui; He, Shan
2010-05-01
Rainfall forecasting and nowcasting are of great importance, for instance, in real-time flood early warning systems. Long term rainfall forecasting demands global climate, land, and sea data, thus, large computing power and storage capacity are required. Rainfall nowcasting's computing requirement, on the other hand, is much less. Rainfall nowcasting may use data captured by radar and/or weather stations. This paper presents the application of Artificial Neural Network (ANN) on rainfall nowcasting using data observed at weather and/or rainfall stations. The study focuses on the North-East monsoon period (December, January and February) in Singapore. Rainfall and weather data from ten stations, between 2000 and 2006, were selected and divided into three groups for training, over-fitting test and validation of the ANN. Several neural network architectures were tried in the study. Two architectures, Backpropagation ANN and Group Method of Data Handling ANN, yielded better rainfall nowcasting, up to two hours, than the other architectures. The obtained rainfall nowcasts were then used by a catchment model to forecast catchment runoff. The results of runoff forecast are encouraging and promising.With ANN's high computational speed, the proposed approach may be deliverable for creating the real-time flood early warning system.
Majorana-Based Fermionic Quantum Computation.
O'Brien, T E; Rożek, P; Akhmerov, A R
2018-06-01
Because Majorana zero modes store quantum information nonlocally, they are protected from noise, and have been proposed as a building block for a quantum computer. We show how to use the same protection from noise to implement universal fermionic quantum computation. Our architecture requires only two Majorana modes to encode a fermionic quantum degree of freedom, compared to alternative implementations which require a minimum of four Majorana modes for a spin quantum degree of freedom. The fermionic degrees of freedom support both unitary coupled cluster variational quantum eigensolver and quantum phase estimation algorithms, proposed for quantum chemistry simulations. Because we avoid the Jordan-Wigner transformation, our scheme has a lower overhead for implementing both of these algorithms, allowing for simulation of the Trotterized Hubbard Hamiltonian in O(1) time per unitary step. We finally demonstrate magic state distillation in our fermionic architecture, giving a universal set of topologically protected fermionic quantum gates.
Majorana-Based Fermionic Quantum Computation
NASA Astrophysics Data System (ADS)
O'Brien, T. E.; RoŻek, P.; Akhmerov, A. R.
2018-06-01
Because Majorana zero modes store quantum information nonlocally, they are protected from noise, and have been proposed as a building block for a quantum computer. We show how to use the same protection from noise to implement universal fermionic quantum computation. Our architecture requires only two Majorana modes to encode a fermionic quantum degree of freedom, compared to alternative implementations which require a minimum of four Majorana modes for a spin quantum degree of freedom. The fermionic degrees of freedom support both unitary coupled cluster variational quantum eigensolver and quantum phase estimation algorithms, proposed for quantum chemistry simulations. Because we avoid the Jordan-Wigner transformation, our scheme has a lower overhead for implementing both of these algorithms, allowing for simulation of the Trotterized Hubbard Hamiltonian in O (1 ) time per unitary step. We finally demonstrate magic state distillation in our fermionic architecture, giving a universal set of topologically protected fermionic quantum gates.
SU (2) lattice gauge theory simulations on Fermi GPUs
NASA Astrophysics Data System (ADS)
Cardoso, Nuno; Bicudo, Pedro
2011-05-01
In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations.
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists. PMID:25742012
NASA Astrophysics Data System (ADS)
Cao, Jian; Li, Qi; Cheng, Jicheng
2005-10-01
This paper discusses the concept, key technologies and main application of Spatial Services Grid. The technologies of Grid computing and Webservice is playing a revolutionary role in studying the spatial information services. The concept of the SSG (Spatial Services Grid) is put forward based on the SIG (Spatial Information Grid) and OGSA (open grid service architecture). Firstly, the grid computing is reviewed and the key technologies of SIG and their main applications are reviewed. Secondly, the grid computing and three kinds of SIG (in broad sense)--SDG (spatial data grid), SIG (spatial information grid) and SSG (spatial services grid) and their relationships are proposed. Thirdly, the key technologies of the SSG (spatial services grid) is put forward. Finally, three representative applications of SSG (spatial services grid) are discussed. The first application is urban location based services gird, which is a typical spatial services grid and can be constructed on OGSA (Open Grid Services Architecture) and digital city platform. The second application is region sustainable development grid which is the key to the urban development. The third application is Region disaster and emergency management services grid.
Li, Zhenlong; Yang, Chaowei; Jin, Baoxuan; Yu, Manzhu; Liu, Kai; Sun, Min; Zhan, Matthew
2015-01-01
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data are essential for geoscience studies. However, the tasks are challenging for geoscientists because processing the massive amount of data is both computing and data intensive in that data analytics requires complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. In this framework techniques are proposed by leveraging cloud computing, MapReduce, and Service Oriented Architecture (SOA). Specifically, HBase is adopted for storing and managing big geoscience data across distributed computers. MapReduce-based algorithm framework is developed to support parallel processing of geoscience data. And service-oriented workflow architecture is built for supporting on-demand complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing the data processing time as well as simplifying data analytical procedures for geoscientists.
Rapid Human-Computer Interactive Conceptual Design of Mobile and Manipulative Robot Systems
2015-05-19
algorithm based on Age-Fitness Pareto Optimization (AFPO) ([9]) with an additional user prefer- ence objective and a neural network-based user model, we...greater than 40, which is about 5 times further than any robot traveled in our experiments. 6 3.3 Methods The algorithm uses a client -server computational...architecture. The client here is an interactive pro- gram which takes a pair of controllers as input, simulates4 two copies of the robot with
NASA Astrophysics Data System (ADS)
Prasad, Guru; Jayaram, Sanjay; Ward, Jami; Gupta, Pankaj
2004-09-01
In this paper, Aximetric proposes a decentralized Command and Control (C2) architecture for a distributed control of a cluster of on-board health monitoring and software enabled control systems called
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures
NASA Technical Reports Server (NTRS)
Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.
2012-01-01
In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saad, Tony; Sutherland, James C.
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less
Saad, Tony; Sutherland, James C.
2016-05-04
To address the coding and software challenges of modern hybrid architectures, we propose an approach to multiphysics code development for high-performance computing. This approach is based on using a Domain Specific Language (DSL) in tandem with a directed acyclic graph (DAG) representation of the problem to be solved that allows runtime algorithm generation. When coupled with a large-scale parallel framework, the result is a portable development framework capable of executing on hybrid platforms and handling the challenges of multiphysics applications. In addition, we share our experience developing a code in such an environment – an effort that spans an interdisciplinarymore » team of engineers and computer scientists.« less
Vehicle Maneuver Detection with Accelerometer-Based Classification.
Cervantes-Villanueva, Javier; Carrillo-Zapata, Daniel; Terroso-Saenz, Fernando; Valdes-Vela, Mercedes; Skarmeta, Antonio F
2016-09-29
In the mobile computing era, smartphones have become instrumental tools to develop innovative mobile context-aware systems. In that sense, their usage in the vehicular domain eases the development of novel and personal transportation solutions. In this frame, the present work introduces an innovative mechanism to perceive the current kinematic state of a vehicle on the basis of the accelerometer data from a smartphone mounted in the vehicle. Unlike previous proposals, the introduced architecture targets the computational limitations of such devices to carry out the detection process following an incremental approach. For its realization, we have evaluated different classification algorithms to act as agents within the architecture. Finally, our approach has been tested with a real-world dataset collected by means of the ad hoc mobile application developed.
Model of a programmable quantum processing unit based on a quantum transistor effect
NASA Astrophysics Data System (ADS)
Ablayev, Farid; Andrianov, Sergey; Fetisov, Danila; Moiseev, Sergey; Terentyev, Alexandr; Urmanchev, Andrey; Vasiliev, Alexander
2018-02-01
In this paper we propose a model of a programmable quantum processing device realizable with existing nano-photonic technologies. It can be viewed as a basis for new high performance hardware architectures. Protocols for physical implementation of device on the controlled photon transfer and atomic transitions are presented. These protocols are designed for executing basic single-qubit and multi-qubit gates forming a universal set. We analyze the possible operation of this quantum computer scheme. Then we formalize the physical architecture by a mathematical model of a Quantum Processing Unit (QPU), which we use as a basis for the Quantum Programming Framework. This framework makes it possible to perform universal quantum computations in a multitasking environment.
Optimized planning methodologies of ASON implementation
NASA Astrophysics Data System (ADS)
Zhou, Michael M.; Tamil, Lakshman S.
2005-02-01
Advanced network planning concerns effective network-resource allocation for dynamic and open business environment. Planning methodologies of ASON implementation based on qualitative analysis and mathematical modeling are presented in this paper. The methodology includes method of rationalizing technology and architecture, building network and nodal models, and developing dynamic programming for multi-period deployment. The multi-layered nodal architecture proposed here can accommodate various nodal configurations for a multi-plane optical network and the network modeling presented here computes the required network elements for optimizing resource allocation.