Sample records for underlying computer architecture

  1. Tutorial: Computer architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gajski, D.D.; Milutinovic, V.M.; Siegel, H.J.

    1986-01-01

    This book presents the state-of-the-art in advanced computer architecture. It deals with the concepts underlying current architectures and covers approaches and techniques being used in the design of advanced computer systems.

  2. A computer architecture for intelligent machines

    NASA Technical Reports Server (NTRS)

    Lefebvre, D. R.; Saridis, G. N.

    1992-01-01

    The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.

  3. Progress in a novel architecture for high performance processing

    NASA Astrophysics Data System (ADS)

    Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

    2018-04-01

    High performance processing (HPP) is an innovative architecture that targets high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully in the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed in the TSMC 16 nm field effect transistor (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).

  4. A brick-architecture-based mobile under-vehicle inspection system

    NASA Astrophysics Data System (ADS)

    Qian, Cheng; Page, David; Koschan, Andreas; Abidi, Mongi

    2005-05-01

    In this paper, a mobile scanning system for real-time under-vehicle inspection is presented, which is founded on a "Brick" architecture. In this "Brick" architecture, the inspection system is basically decomposed into bricks of three kinds: sensing, mobility, and computing. These bricks are physically and logically independent and communicate with each other by wireless communication. Each brick is mainly composed of five modules: data acquisition, data processing, data transmission, power, and self-management. These five modules can be further decomposed into submodules whose functions and interfaces are well defined. Based on this architecture, the system is built from four bricks: two sensing bricks consisting of a range scanner and a line CCD, one mobility brick, and one computing brick. The sensing bricks capture geometric data and texture data of the under-vehicle scene, while the mobility brick provides positioning data along the motion path. Data from these three modalities are transmitted to the computing brick, where they are fused to reconstruct a 3D under-vehicle model for visualization and threat inspection. This system has been successfully used in several military applications and has proved to be an effective and safer method for national security.

  5. Sigint Application for Polymorphous Computing Architecture (PCA): Wideband DF

    DTIC Science & Technology

    2006-08-01

    Polymorphous Computing Architecture (PCA) program as stated by Robert Graybill is to Develop the computing foundation for agile systems by establishing...ubiquitous MUSIC algorithm rely upon an underlying narrowband signal model [8]. In this case, narrowband means that the signal bandwidth is less than...a wideband DF algorithm is needed to compensate for this model inadequacy. Among the various wideband DF techniques available, the coherent signal

  6. A computer architecture for intelligent machines

    NASA Technical Reports Server (NTRS)

    Lefebvre, D. R.; Saridis, G. N.

    1991-01-01

    The Theory of Intelligent Machines proposes a hierarchical organization for the functions of an autonomous robot based on the Principle of Increasing Precision With Decreasing Intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed in recent years. A computer architecture that implements the lower two levels of the intelligent machine is presented. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Details of Execution Level controllers for motion and vision systems are addressed, as well as the Petri net transducer software used to implement Coordination Level functions. Extensions to UNIX and VxWorks operating systems which enable the development of a heterogeneous, distributed application are described. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.

  7. Sensing and perception: Connectionist approaches to subcognitive computing

    NASA Technical Reports Server (NTRS)

    Skrzypek, J.

    1987-01-01

    New approaches to machine sensing and perception are presented. The motivation for cross-disciplinary studies of perception in terms of AI and neurosciences is suggested. The question of computing architecture granularity as related to global/local computation underlying perceptual function is considered, and examples of two environments are given. Finally, examples of using one of the environments, UCLA PUNNS, to study neural architectures for visual function are presented.

  8. The Evaluation of Rekeying Protocols Within the Hubenko Architecture as Applied to Wireless Sensor Networks

    DTIC Science & Technology

    2009-03-01

    SENSOR NETWORKS THESIS Presented to the Faculty Department of Electrical and Computer Engineering Graduate School of Engineering and...hierarchical, and Secure Lock within a wireless sensor network (WSN) under the Hubenko architecture. Using a Matlab computer simulation, the impact of the...rekeying protocol should be applied given particular network parameters, such as WSN size. 10 1.3 Experimental Approach A computer simulation in

  9. An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J.; Lytle, John K. (Technical Monitor)

    2002-01-01

    Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT). This paper discusses the salient features of the NPSS Architecture including its interface layer, object layer, implementation for accessing legacy codes, numerical zooming infrastructure and its computing layer. The computing layer focuses on the use and deployment of these propulsion simulations on parallel and distributed computing platforms which has been the focus of NASA Ames. Additional features of the object oriented architecture that support MultiDisciplinary (MD) Coupling, computer aided design (CAD) access and MD coupling objects will be discussed. Included will be a discussion of the successes, challenges and benefits of implementing this architecture.

  10. Real World Cognitive Multi-Tasking and Problem Solving: A Large Scale Cognitive Architecture Simulation Through High Performance Computing-Project Casie

    DTIC Science & Technology

    2008-03-01

    computational version of the CASIE architecture serves to demonstrate the functionality of our primary theories. However, implementation of several other...following facts. First, based on Theorem 3 and Theorem 5, the objective function is non-increasing under updating rule (6); second, by the criteria for...reassignment in updating rule (7), it is trivial to show that the objective function is non-increasing under updating rule (7). A Unified View to Graph

  11. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  12. Evidence of common and separate eye and hand accumulators underlying flexible eye-hand coordination

    PubMed Central

    Jana, Sumitash; Gopal, Atul

    2016-01-01

    Eye and hand movements are initiated by anatomically separate regions in the brain, and yet these movements can be flexibly coupled and decoupled, depending on the need. The computational architecture that enables this flexible coupling of independent effectors is not understood. Here, we studied the computational architecture that enables flexible eye-hand coordination using a drift diffusion framework, which predicts that the variability of the reaction time (RT) distribution scales with its mean. We show that a common stochastic accumulator to threshold, followed by a noisy effector-dependent delay, explains eye-hand RT distributions and their correlation in a visual search task that required decision-making, while an interactive eye and hand accumulator model did not. In contrast, in an eye-hand dual task, an interactive model better predicted the observed correlations and RT distributions than a common accumulator model. Notably, these two models could only be distinguished on the basis of the variability and not the means of the predicted RT distributions. Additionally, signatures of separate initiation signals were also observed in a small fraction of trials in the visual search task, implying that these distinct computational architectures were not a manifestation of the task design per se. Taken together, our results suggest two unique computational architectures for eye-hand coordination, with task context biasing the brain toward instantiating one of the two architectures. NEW & NOTEWORTHY Previous studies on eye-hand coordination have considered mainly the means of eye and hand reaction time (RT) distributions. Here, we leverage the approximately linear relationship between the mean and standard deviation of RT distributions, as predicted by the drift-diffusion model, to propose the existence of two distinct computational architectures underlying coordinated eye-hand movements. These architectures, for the first time, provide a computational basis for the flexible coupling between eye and hand movements. PMID:27784809
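
    The scaling property the study leverages can be illustrated with a minimal simulation in Python (not the authors' code, and with arbitrary drift rates, threshold, and noise level): for a drift-diffusion accumulator, the standard deviation of first-passage times grows roughly in proportion to their mean as the drift rate varies.

    import numpy as np

    def simulate_rt(drift, threshold=1.0, noise=1.0, dt=0.001, n_trials=500, seed=0):
        # First-passage times of a single noisy accumulator rising to a fixed threshold.
        rng = np.random.default_rng(seed)
        rts = np.empty(n_trials)
        for i in range(n_trials):
            x, t = 0.0, 0.0
            while x < threshold:
                x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
                t += dt
            rts[i] = t
        return rts

    # As drift falls, the mean and the spread of the RTs grow together, giving the
    # roughly linear mean-SD relationship used to distinguish the two architectures.
    for drift in (2.0, 1.0, 0.5):
        rts = simulate_rt(drift)
        print(f"drift={drift:.1f}  mean RT={rts.mean():.3f} s  SD={rts.std():.3f} s")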

  13. An Object Oriented Extensible Architecture for Affordable Aerospace Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J.

    2003-01-01

    Driven by a need to explore and develop propulsion systems that exceeded current computing capabilities, NASA Glenn embarked on a novel strategy leading to the development of an architecture that enables propulsion simulations never thought possible before. Full engine 3 Dimensional Computational Fluid Dynamic propulsion system simulations were deemed impossible due to the impracticality of the hardware and software computing systems required. However, with a software paradigm shift and an embracing of parallel and distributed processing, an architecture was designed to meet the needs of future propulsion system modeling. The author suggests that the architecture designed at the NASA Glenn Research Center for propulsion system modeling has potential for impacting the direction of development of affordable weapons systems currently under consideration by the Applied Vehicle Technology Panel (AVT).

  14. Traffic information computing platform for big data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duan, Zongtao, E-mail: ztduan@chd.edu.cn; Li, Ying; Zheng, Xibin

    The big data environment creates the data conditions for improving the quality of traffic information services. The target of this article is to construct a traffic information computing platform for the big data environment. Through in-depth analysis of the connotation and technical characteristics of big data and traffic information services, a distributed traffic atomic information computing platform architecture is proposed. Under the big data environment, this type of traffic atomic information computing architecture helps to guarantee traffic safety and efficient operation, and more intelligent and personalized traffic information services can be provided to traffic information users.

  15. Programming for 1.6 Million cores: Early experiences with IBM's BG/Q SMP architecture

    NASA Astrophysics Data System (ADS)

    Glosli, James

    2013-03-01

    With the stall in clock cycle improvements a decade ago, the drive for computational performance has continued along a path of increasing core counts on a processor. The multi-core evolution has been expressed in both symmetric multi-processor (SMP) architectures and CPU/GPU architectures. Debates rage in the high performance computing (HPC) community over which architecture best serves HPC. In this talk I will not attempt to resolve that debate but perhaps fuel it. I will discuss the experience of exploiting Sequoia, a 98,304-node IBM Blue Gene/Q SMP at Lawrence Livermore National Laboratory. The advantages and challenges of leveraging the computational power of BG/Q will be detailed through the discussion of two applications. The first application is a Molecular Dynamics code called ddcMD. This is a code developed over the last decade at LLNL and ported to BG/Q. The second application is a cardiac modeling code called Cardioid. This is a code that was recently designed and developed at LLNL to exploit the fine scale parallelism of BG/Q's SMP architecture. Through the lenses of these efforts I'll illustrate the need to rethink how we express and implement our computational approaches. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  16. Modeling driver behavior in a cognitive architecture.

    PubMed

    Salvucci, Dario D

    2006-01-01

    This paper explores the development of a rigorous computational model of driver behavior in a cognitive architecture--a computational framework with underlying psychological theories that incorporate basic properties and limitations of the human system. Computational modeling has emerged as a powerful tool for studying the complex task of driving, allowing researchers to simulate driver behavior and explore the parameters and constraints of this behavior. An integrated driver model developed in the ACT-R (Adaptive Control of Thought-Rational) cognitive architecture is described that focuses on the component processes of control, monitoring, and decision making in a multilane highway environment. This model accounts for the steering profiles, lateral position profiles, and gaze distributions of human drivers during lane keeping, curve negotiation, and lane changing. The model demonstrates how cognitive architectures facilitate understanding of driver behavior in the context of general human abilities and constraints and how the driving domain benefits cognitive architectures by pushing model development toward more complex, realistic tasks. The model can also serve as a core computational engine for practical applications that predict and recognize driver behavior and distraction.

  17. Distributed chemical computing using ChemStar: an open source java remote method invocation architecture applied to large scale molecular data from PubChem.

    PubMed

    Karthikeyan, M; Krishnan, S; Pandey, Anil Kumar; Bender, Andreas; Tropsha, Alexander

    2008-04-01

    We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adapted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit as well as the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds, and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its "plug-in architecture". The complete source codes as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
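
    The per-compound property calculation is embarrassingly parallel, which is what the RMI-based distribution exploits. A minimal, local-machine sketch of the same idea in Python, assuming multiprocessing and the open-source RDKit toolkit as stand-ins for the Java RMI framework and the Marvin/JOELib packages used in the paper:

    from multiprocessing import Pool

    from rdkit import Chem
    from rdkit.Chem import Descriptors

    SMILES = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O",
              "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]

    def properties(smiles):
        # Each worker computes log P and TPSA for one structure, mirroring the
        # per-compound tasks that ChemStar farms out to remote clients.
        mol = Chem.MolFromSmiles(smiles)
        return smiles, Descriptors.MolLogP(mol), Descriptors.TPSA(mol)

    if __name__ == "__main__":
        with Pool(4) as pool:  # each process plays the role of a client node
            for smiles, logp, tpsa in pool.map(properties, SMILES):
                print(f"{smiles:35s} logP={logp:6.2f}  TPSA={tpsa:6.1f}")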

  18. Verification methodology for fault-tolerant, fail-safe computers applied to maglev control computer systems. Final report, July 1991-May 1993

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lala, J.H.; Nagle, G.A.; Harper, R.E.

    1993-05-01

    The Maglev control computer system should be designed to verifiably possess high reliability and safety as well as high availability to make Maglev a dependable and attractive transportation alternative to the public. A Maglev control computer system has been designed using a design-for-validation methodology developed earlier under NASA and SDIO sponsorship for real-time aerospace applications. The present study starts by defining the maglev mission scenario and ends with the definition of a maglev control computer architecture. Key intermediate steps included definitions of functional and dependability requirements, synthesis of two candidate architectures, development of qualitative and quantitative evaluation criteria, and analytical modeling of the dependability characteristics of the two architectures. Finally, the applicability of the design-for-validation methodology was also illustrated by applying it to the German Transrapid TR07 maglev control system.

  19. Network architecture test-beds as platforms for ubiquitous computing.

    PubMed

    Roscoe, Timothy

    2008-10-28

    Distributed systems research, and in particular ubiquitous computing, has traditionally assumed the Internet as a basic underlying communications substrate. Recently, however, the networking research community has come to question the fundamental design or 'architecture' of the Internet. This has been led by two observations: first, that the Internet as it stands is now almost impossible to evolve to support new functionality; and second, that modern applications of all kinds now use the Internet rather differently, and frequently implement their own 'overlay' networks above it to work around its perceived deficiencies. In this paper, I discuss recent academic projects to allow disruptive change to the Internet architecture, and also outline a radically different view of networking for ubiquitous computing that such proposals might facilitate.

  20. Benchmarking high performance computing architectures with CMS’ skeleton framework

    NASA Astrophysics Data System (ADS)

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-10-01

    In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures: machines such as Cori Phase 1&2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  1. Engineering Technology Education Bibliography, 1990.

    ERIC Educational Resources Information Center

    Dyrud, Marilyn A.

    1991-01-01

    Lists over 340 materials published in 1990 related to engineering technology education and grouped under the following headings: administration; architectural; computer-assisted design/management (CAD/CAM); civil; computers; curriculum; electrical/electronics; industrial; industry/government/employers; instructional technology; laboratories;…

  2. GPU and APU computations of Finite Time Lyapunov Exponent fields

    NASA Astrophysics Data System (ADS)

    Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros

    2012-03-01

    We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
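
    The bandwidth-bound resampling step can be seen in a minimal CPU sketch of an FTLE computation in Python/NumPy (not the authors' OpenCL implementation; the double-gyre parameters and forward-Euler integrator are illustrative simplifications): a dense grid of tracers is advected, and the FTLE is taken from the largest eigenvalue of the Cauchy-Green tensor built from the flow-map gradient.

    import numpy as np

    def velocity(x, y, t, A=0.1, eps=0.25, om=2 * np.pi / 10):
        # Time-dependent double-gyre flow, a standard FTLE test case.
        a = eps * np.sin(om * t)
        b = 1.0 - 2.0 * a
        f = a * x**2 + b * x
        dfdx = 2.0 * a * x + b
        u = -np.pi * A * np.sin(np.pi * f) * np.cos(np.pi * y)
        v = np.pi * A * np.cos(np.pi * f) * np.sin(np.pi * y) * dfdx
        return u, v

    def ftle_field(nx=200, ny=100, T=15.0, dt=0.05):
        # Advect a dense grid of tracers (the resampling whose memory bandwidth
        # dominates on GPU/APU hardware), then differentiate the flow map.
        x, y = np.meshgrid(np.linspace(0, 2, nx), np.linspace(0, 1, ny))
        px, py, t = x.copy(), y.copy(), 0.0
        while t < T:                 # forward Euler; RK4 would be used in practice
            u, v = velocity(px, py, t)
            px, py, t = px + u * dt, py + v * dt, t + dt
        dxdX = np.gradient(px, axis=1) / np.gradient(x, axis=1)
        dxdY = np.gradient(px, axis=0) / np.gradient(y, axis=0)
        dydX = np.gradient(py, axis=1) / np.gradient(x, axis=1)
        dydY = np.gradient(py, axis=0) / np.gradient(y, axis=0)
        # Largest eigenvalue of the 2x2 Cauchy-Green tensor at each grid point.
        c11, c12, c22 = dxdX**2 + dydX**2, dxdX * dxdY + dydX * dydY, dxdY**2 + dydY**2
        lam_max = 0.5 * (c11 + c22 + np.sqrt((c11 - c22) ** 2 + 4.0 * c12**2))
        return np.log(np.sqrt(lam_max)) / T

    print("max FTLE:", ftle_field().max())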

  3. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    With the advent of parallel hardware and software technologies, users are faced with the challenge of choosing a programming paradigm best suited for the underlying computer architecture. With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors (SMP), parallel programming techniques have evolved to support parallelism beyond a single level. Which programming paradigm is best will depend on the nature of the given problem, the hardware architecture, and the available software. In this study we compare different programming paradigms for the parallelization of a selected benchmark application on a cluster of SMP nodes. We compare the timings of different implementations of the same CFD benchmark application employing the same numerical algorithm on a cluster of Sun Fire SMP nodes. The rest of the paper is structured as follows: In section 2 we briefly discuss the programming models under consideration. We describe our compute platform in section 3. The different implementations of our benchmark code are described in section 4, and the performance results are presented in section 5. We conclude our study in section 6.
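
    The hybrid paradigm compared in the study amounts to two nested levels of parallelism: message passing across SMP nodes and shared-memory workers within a node. A minimal Python sketch follows, using mpi4py and a thread pool as stand-ins for MPI and OpenMP; the partial-sum workload, worker count, and script name are invented for illustration.

    # Run with, e.g.:  mpiexec -n 4 python hybrid_sketch.py
    from concurrent.futures import ThreadPoolExecutor

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    N = 10_000_000
    # Coarse-grained decomposition across MPI ranks (one rank per SMP node).
    chunk = np.arange(rank, N, size, dtype=np.float64)

    def partial(sub):
        # Fine-grained shared-memory work inside a rank (the "OpenMP-like" level).
        return float(np.sum(1.0 / (1.0 + sub)))

    with ThreadPoolExecutor(max_workers=4) as pool:
        local = sum(pool.map(partial, np.array_split(chunk, 4)))

    total = comm.reduce(local, op=MPI.SUM, root=0)   # combine across ranks
    if rank == 0:
        print("result:", total)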

  4. Benchmarking high performance computing architectures with CMS’ skeleton framework

    DOE PAGES

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-11-23

    Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures: machines such as Cori Phase 1&2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  5. A novel strategy for load balancing of distributed medical applications.

    PubMed

    Logeswaran, Rajasvaran; Chen, Li-Choo

    2012-04-01

    Current trends in medicine, specifically in the electronic handling of medical applications, ranging from digital imaging, paperless hospital administration and electronic medical records, telemedicine, to computer-aided diagnosis, create a burden on the network. Distributed Service Architectures, such as Intelligent Network (IN), Telecommunication Information Networking Architecture (TINA) and Open Service Access (OSA), are able to meet this new challenge. Distribution enables computational tasks to be spread among multiple processors; hence, performance is an important issue. This paper proposes a novel approach to load balancing, the Random Sender Initiated Algorithm, for distribution of tasks among several nodes sharing the same computational object (CO) instances in Distributed Service Architectures. Simulations illustrate that the proposed algorithm produces better network performance than the benchmark load balancing algorithms (the Random Node Selection Algorithm and the Shortest Queue Algorithm), especially under medium and heavily loaded conditions.
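
    The benchmark policies can be pictured with a toy dispatching simulation in Python (the proposed Random Sender Initiated Algorithm itself is not reproduced, since its details are not given in the abstract; all parameters are illustrative): tasks with random service demands are assigned to nodes either at random or to the least-loaded node, and the resulting load imbalance is compared.

    import random

    def simulate(policy, n_nodes=8, n_tasks=5000, seed=1):
        # Assign tasks with random service demands to nodes and report the load
        # imbalance (max node load / mean node load) produced by the policy.
        rng = random.Random(seed)
        load = [0.0] * n_nodes
        for _ in range(n_tasks):
            demand = rng.expovariate(1.0)
            if policy == "random":                      # Random Node Selection
                i = rng.randrange(n_nodes)
            else:                                       # Shortest Queue
                i = min(range(n_nodes), key=lambda k: load[k])
            load[i] += demand
        return max(load) / (sum(load) / n_nodes)

    for policy in ("random", "shortest_queue"):
        print(f"{policy:15s} imbalance = {simulate(policy):.3f}")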

  6. Benchmarking high performance computing architectures with CMS’ skeleton framework

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures: machines such as Cori Phase 1&2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  7. An Object-Oriented Network-Centric Software Architecture for Physical Computing

    NASA Astrophysics Data System (ADS)

    Palmer, Richard

    1997-08-01

    Recent developments in object-oriented computer languages and infrastructure such as the Internet, Web browsers, and the like provide an opportunity to define a more productive computational environment for scientific programming that is based more closely on the underlying mathematics describing physics than traditional programming languages such as FORTRAN or C++. In this talk I describe an object-oriented software architecture for representing physical problems that includes classes for such common mathematical objects as geometry, boundary conditions, partial differential and integral equations, discretization and numerical solution methods, etc. In practice, a scientific program written using this architecture looks remarkably like the mathematics used to understand the problem, is typically an order of magnitude smaller than traditional FORTRAN or C++ codes, and hence easier to understand, debug, describe, etc. All objects in this architecture are "network-enabled," which means that components of a software solution to a physical problem can be transparently loaded from anywhere on the Internet or other global network. The architecture is expressed as an "API," or application programmers interface specification, with reference embeddings in Java, Python, and C++. A C++ class library for an early version of this API has been implemented for machines ranging from PC's to the IBM SP2, meaning that identical codes run on all architectures.

  8. Architectural Aspects of Grid Computing and its Global Prospects for E-Science Community

    NASA Astrophysics Data System (ADS)

    Ahmad, Mushtaq

    2008-05-01

    The paper reviews the imminent architectural aspects of Grid Computing for the e-Science community, for scientific research and business/commercial collaboration beyond physical boundaries. Grid Computing provides all the needed facilities: hardware, software, communication interfaces, high speed internet, safe authentication and a secure environment for collaboration on research projects around the globe. It provides a highly fast compute engine for those scientific and engineering research projects and business/commercial applications which are heavily compute intensive and/or require humongous amounts of data. It also makes possible the use of very advanced methodologies, simulation models, expert systems and the treasure of knowledge available around the globe under the umbrella of knowledge sharing. Thus it makes possible the dream of a global village for the benefit of the e-Science community across the globe.

  9. Information Architecture for Quality Management Support in Hospitals.

    PubMed

    Rocha, Álvaro; Freixo, Jorge

    2015-10-01

    Quality Management occupies a strategic role in organizations, and the adoption of computer tools within an aligned information architecture facilitates the challenge of doing more with less, promoting the development of a competitive edge and sustainability. A formal Information Architecture (IA) lends organizations an enhanced knowledge but, above all, favours management. This simplifies the reinvention of processes, the reformulation of procedures, bridging and the cooperation amongst the multiple actors of an organization. In the present investigation we planned the IA for the Quality Management System (QMS) of a hospital, which allowed us to develop and implement the QUALITUS computer application (QUALITUS is the name of the computer application developed to support Quality Management in a Hospital Unit). This solution translated into significant gains for the Hospital Unit under study, accelerating the quality management process and reducing tasks, the number of documents, the information to be filled in and information errors, amongst others.

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Hsien-Hsin S

    The overall objective of this research project is to develop novel architectural techniques as well as system software to achieve a highly secure and intrusion-tolerant computing system. Such a system will be autonomous, self-adapting, introspective, with self-healing capability under the circumstances of improper operations, abnormal workloads, and malicious attacks. The scope of this research includes: (1) System-wide, unified introspection techniques for autonomic systems, (2) Secure information-flow microarchitecture, (3) Memory-centric security architecture, (4) Authentication control and its implication for security, (5) Digital rights management, (6) Microarchitectural denial-of-service attacks on shared resources. During the period of the project, we developed several architectural techniques and system software for achieving a robust, secure, and reliable computing system toward our goal.

  11. Simulation Accelerator

    NASA Technical Reports Server (NTRS)

    1998-01-01

    Under a NASA SBIR (Small Business Innovative Research) contract (NAS5-30905), EAI Simulation Associates, Inc., developed a new digital simulation computer, Starlight(tm). With an architecture based on the analog model of computation, Starlight(tm) outperforms all other computers on a wide range of continuous system simulations. This system is used in a variety of applications, including aerospace, automotive, electric power and chemical reactors.

  12. Authentication and Authorization of End User in Microservice Architecture

    NASA Astrophysics Data System (ADS)

    He, Xiuyu; Yang, Xudong

    2017-10-01

    As markets and business continue to expand, the traditional single monolithic architecture is facing more and more challenges. The development of cloud computing and container technology has made the microservice architecture more popular. While the low coupling, fine granularity, scalability, flexibility and independence of the microservice architecture bring convenience, the inherent complexity of the distributed system makes the security of the microservice architecture important and difficult. This paper aims to study the authentication and authorization of the end user under the microservice architecture. By comparing with traditional measures and researching existing technology, this paper puts forward a set of authentication and authorization strategies suitable for the microservice architecture, such as distributed sessions, SSO solutions, client-side JSON web tokens and JWT + API Gateway, and summarizes the advantages and disadvantages of each method.
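
    The client-side JSON web token strategy mentioned above is essentially a stateless handshake: the API gateway signs the end user's identity and roles once, and each microservice verifies the token locally instead of consulting a shared session store. A minimal Python sketch using the PyJWT library follows; the secret, claim names, and role check are illustrative assumptions, not taken from the paper.

    import time

    import jwt  # PyJWT

    SECRET = "demo-signing-key"   # an asymmetric key pair is preferable in practice

    def gateway_issue_token(user_id, roles):
        # The API gateway authenticates the end user once and signs their claims.
        now = int(time.time())
        claims = {"sub": user_id, "roles": roles, "iat": now, "exp": now + 900}
        return jwt.encode(claims, SECRET, algorithm="HS256")

    def microservice_authorize(token, required_role):
        # Each microservice verifies the token locally; no shared session state.
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])  # raises if invalid
        return required_role in claims["roles"]

    token = gateway_issue_token("alice", ["orders:read"])
    print(microservice_authorize(token, "orders:read"))   # True
    print(microservice_authorize(token, "orders:write"))  # False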

  13. Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

    NASA Technical Reports Server (NTRS)

    Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

    1992-01-01

    Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

  14. Why is a computational framework for motivational and metacognitive control needed?

    NASA Astrophysics Data System (ADS)

    Sun, Ron

    2018-01-01

    This paper discusses, in the context of computational modelling and simulation of cognition, the relevance of deeper structures in the control of behaviour. Such deeper structures include motivational control of behaviour, which provides underlying causes for actions, and also metacognitive control, which provides higher-order processes for monitoring and regulation. It is argued that such deeper structures are important and thus cannot be ignored in computational cognitive architectures. A general framework based on the Clarion cognitive architecture is outlined that emphasises the interaction amongst action selection, motivation, and metacognition. The upshot is that it is necessary to incorporate all essential processes; short of that, the understanding of cognition can only be incomplete.

  15. Understanding Evolutionary Potential in Virtual CPU Instruction Set Architectures

    PubMed Central

    Bryson, David M.; Ofria, Charles

    2013-01-01

    We investigate fundamental decisions in the design of instruction set architectures for linear genetic programs that are used as both model systems in evolutionary biology and underlying solution representations in evolutionary computation. We subjected digital organisms with each tested architecture to seven different computational environments designed to present a range of evolutionary challenges. Our goal was to engineer a general purpose architecture that would be effective under a broad range of evolutionary conditions. We evaluated six different types of architectural features for the virtual CPUs: (1) genetic flexibility: we allowed digital organisms to more precisely modify the function of genetic instructions, (2) memory: we provided an increased number of registers in the virtual CPUs, (3) decoupled sensors and actuators: we separated input and output operations to enable greater control over data flow. We also tested a variety of methods to regulate expression: (4) explicit labels that allow programs to dynamically refer to specific genome positions, (5) position-relative search instructions, and (6) multiple new flow control instructions, including conditionals and jumps. Each of these features also adds complication to the instruction set and risks slowing evolution due to epistatic interactions. Two features (multiple argument specification and separated I/O) demonstrated substantial improvements in the majority of test environments, along with versions of each of the remaining architecture modifications that show significant improvements in multiple environments. However, some tested modifications were detrimental, though most exhibit no systematic effects on evolutionary potential, highlighting the robustness of digital evolution. Combined, these observations enhance our understanding of how instruction architecture impacts evolutionary potential, enabling the creation of architectures that support more rapid evolution of complex solutions to a broad range of challenges. PMID:24376669
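
    The kind of architectural knob being evaluated can be made concrete with a toy linear-genetic-program interpreter in Python (far simpler than the Avida-style virtual CPUs studied): the register count and the set of opcodes, including flow control, are explicit parameters of the architecture. The instruction names and the example genome are invented.

    def run(genome, n_registers=3, max_steps=100, inputs=(7, 5)):
        # Execute a tiny register-machine genome; the register count and opcode set
        # stand in for the architectural features under evaluation.
        regs = [0] * n_registers
        regs[:len(inputs)] = inputs
        ip = steps = 0
        while ip < len(genome) and steps < max_steps:
            op, *args = genome[ip]
            if op == "add":
                regs[args[2]] = regs[args[0]] + regs[args[1]]
            elif op == "sub":
                regs[args[2]] = regs[args[0]] - regs[args[1]]
            elif op == "jump_if_zero" and regs[args[0]] == 0:
                ip, steps = args[1], steps + 1
                continue
            ip, steps = ip + 1, steps + 1
        return regs

    # A hand-written genome that adds the two inputs into register 2.
    print(run([("add", 0, 1, 2)]))   # -> [7, 5, 12]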

  16. Implementation of Helioseismic Data Reduction and Diagnostic Techniques on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    Korzennik, Sylvain

    1997-01-01

    Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction, using PVM, was successfully completed. It was successfully ported to SMP architectures (i.e., SUN) and to some MPP architectures (i.e., the CM5). Due to limitations of the PVM implementation on the Cray T3D, the port to that architecture was not completed at the time.

  17. Computers in Academic Architecture Libraries.

    ERIC Educational Resources Information Center

    Willis, Alfred; And Others

    1992-01-01

    Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…

  18. NETRA: A parallel architecture for integrated vision systems. 1: Architecture and organization

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is considered to be a system that uses vision algorithms from all levels of processing for a high level application (such as object recognition). A model of computation is presented for parallel processing for an IVS. Using the model, desired features and capabilities of a parallel architecture suitable for IVSs are derived. Then a multiprocessor architecture (called NETRA) is presented. This architecture is highly flexible without the use of complex interconnection schemes. The topology of NETRA is recursively defined and hence is easily scalable from small to large systems. Homogeneity of NETRA permits fault tolerance and graceful degradation under faults. It is a recursively defined tree-type hierarchical architecture where each of the leaf nodes consists of a cluster of processors connected with a programmable crossbar with selective broadcast capability to provide for desired flexibility. A qualitative evaluation of NETRA is presented. Then general schemes are described to map parallel algorithms onto NETRA. Algorithms are classified according to their communication requirements for parallel processing. An extensive analysis of inter-cluster communication strategies in NETRA is presented, and parameters affecting performance of parallel algorithms when mapped on NETRA are discussed. Finally, a methodology to evaluate performance of algorithms on NETRA is described.

  19. PISCES: An environment for parallel scientific computation

    NASA Technical Reports Server (NTRS)

    Pratt, T. W.

    1985-01-01

    The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.

  20. A supportive architecture for CFD-based design optimisation

    NASA Astrophysics Data System (ADS)

    Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong

    2014-03-01

    Multi-disciplinary design optimisation (MDO) is one of the critical methodologies for the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamics (CFD) techniques has caused a rise in their application in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools, comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating CFD analysis into an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which usually has high dimensionality and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review of existing work has found that very few researchers have studied assistive tools to facilitate CFD-based design optimisation. In this paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing techniques and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or the data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, case studies on the aerodynamic shape design of a hypersonic cruising vehicle are provided, and the results show that the proposed architecture and developed algorithms have performed successfully and efficiently in dealing with design optimisation involving over 200 design variables.

  1. On the impact of approximate computation in an analog DeSTIN architecture.

    PubMed

    Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar

    2014-05-01

    Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN--a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
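
    The robustness evaluation can be mimicked at toy scale in Python (this is not the DeSTIN system; the dataset, classifier, and noise levels are synthetic): multiplicative Gaussian errors are injected into a classifier's arithmetic, as a nonideal analog computational element would introduce, and the degradation in accuracy is observed.

    import numpy as np

    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
    X = np.concatenate([c + rng.standard_normal((200, 2)) for c in centers])
    y = np.repeat(np.arange(3), 200)

    def predict(X, centers, noise_std):
        # Nearest-centroid classification with every distance perturbed by a
        # multiplicative Gaussian error, mimicking nonideal analog arithmetic.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = d * (1.0 + noise_std * rng.standard_normal(d.shape))
        return d.argmin(axis=1)

    for noise_std in (0.0, 0.05, 0.2, 0.5):
        acc = np.mean(predict(X, centers, noise_std) == y)
        print(f"relative computation error sigma={noise_std:.2f}  accuracy={acc:.3f}")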

  2. Fast underdetermined BSS architecture design methodology for real time applications.

    PubMed

    Mopuri, Suresh; Reddy, P Sreenivasa; Acharyya, Amit; Naik, Ganesh R

    2015-01-01

    In this paper, we propose a high speed architecture design methodology for the Under-determined Blind Source Separation (UBSS) algorithm using our recently proposed high speed Discrete Hilbert Transform (DHT), targeting real time applications. In the UBSS algorithm, unlike typical BSS, the number of sensors is less than the number of sources, which is of more interest in real time applications. The DHT architecture has been implemented based on a sub-matrix multiplication method to compute the M-point DHT, which uses the N-point architecture recursively, where M is an integer multiple of N. The DHT architecture and the state-of-the-art architecture are coded in VHDL for a 16-bit word length, and ASIC implementation is carried out using UMC 90 nm technology at VDD = 1 V and a 1 MHz clock frequency. The proposed architecture implementation and experimental comparison results show that the DHT design is two times faster than the state-of-the-art architecture.
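
    As a functional reference only (not the hardware architecture proposed in the paper), the discrete Hilbert transform such a design computes can be checked in a few lines of Python via the FFT-based analytic signal; the test tone is illustrative.

    import numpy as np
    from scipy.signal import hilbert

    n = np.arange(256)
    x = np.cos(2 * np.pi * 8 * n / 256)        # test signal: a pure tone

    dht = np.imag(hilbert(x))                  # FFT-based discrete Hilbert transform

    # The Hilbert transform of a cosine is a sine, so the error should be tiny.
    print(np.max(np.abs(dht - np.sin(2 * np.pi * 8 * n / 256))))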

  3. Center for Aeronautics and Space Information Sciences

    NASA Technical Reports Server (NTRS)

    Flynn, Michael J.

    1992-01-01

    This report summarizes the research done during 1991/92 under the Center for Aeronautics and Space Information Science (CASIS) program. The topics covered are computer architecture, networking, and neural nets.

  4. Noise tolerant spatiotemporal chaos computing.

    PubMed

    Kia, Behnam; Kia, Sarvenaz; Lindner, John F; Sinha, Sudeshna; Ditto, William L

    2014-12-01

    We introduce and design a noise tolerant chaos computing system based on a coupled map lattice (CML) and the noise reduction capabilities inherent in coupled dynamical systems. The resulting spatiotemporal chaos computing system is more robust to noise than a single map chaos computing system. In this CML based approach to computing, under the coupled dynamics, the local noise from different nodes of the lattice diffuses across the lattice, and it attenuates each other's effects, resulting in a system with less noise content and a more robust chaos computing architecture.
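
    The noise-averaging mechanism can be seen in a small Python sketch (not the authors' system; the map, coupling strength, and noise level are illustrative): diffusive coupling mixes each node of a logistic-map lattice with its neighbours, so independent per-node perturbations partially cancel after an update, unlike in a single uncoupled map.

    import numpy as np

    rng = np.random.default_rng(0)
    L, eps, sigma, trials = 64, 0.6, 0.01, 2000

    def logistic(x, r=4.0):
        return r * x * (1.0 - x)

    def cml_step(x, eps):
        # Diffusively coupled logistic-map lattice with periodic boundaries.
        f = logistic(x)
        return (1.0 - eps) * f + 0.5 * eps * (np.roll(f, 1) + np.roll(f, -1))

    dev_single, dev_cml = [], []
    for _ in range(trials):
        x = rng.uniform(0.2, 0.8, L)
        noise = sigma * rng.standard_normal(L)
        # Deviation caused by the injected noise after one update, with and without coupling.
        dev_single.append(np.std(logistic(x + noise) - logistic(x)))
        dev_cml.append(np.std(cml_step(x + noise, eps) - cml_step(x, eps)))

    print("uncoupled deviation:", float(np.mean(dev_single)))
    print("coupled deviation:  ", float(np.mean(dev_cml)))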

  5. Noise tolerant spatiotemporal chaos computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kia, Behnam; Kia, Sarvenaz; Ditto, William L.

    We introduce and design a noise tolerant chaos computing system based on a coupled map lattice (CML) and the noise reduction capabilities inherent in coupled dynamical systems. The resulting spatiotemporal chaos computing system is more robust to noise than a single map chaos computing system. In this CML based approach to computing, under the coupled dynamics, the local noise from different nodes of the lattice diffuses across the lattice, and it attenuates each other's effects, resulting in a system with less noise content and a more robust chaos computing architecture.

  6. Emerging Neuromorphic Computing Architectures & Enabling Hardware for Cognitive Information Processing Applications

    DTIC Science & Technology

    2010-06-01

    The highly cross-disciplinary emerging field of neuromorphic computing architectures for cognitive information processing applications ... belief systems, software, computer engineering, etc. In our effort to develop cognitive systems atop a neuromorphic computing architecture, we explored ...

  7. High Performance GPU-Based Fourier Volume Rendering.

    PubMed

    Abdellah, Marwan; Eldeib, Ayman; Sharawi, Amr

    2015-01-01

    Fourier volume rendering (FVR) is a significant visualization technique that has been used widely in digital radiography. As a result of its O(N² log N) time complexity, it provides a faster alternative to spatial domain volume rendering algorithms that are O(N³) computationally complex. Relying on the Fourier projection-slice theorem, this technique operates on the spectral representation of a 3D volume instead of processing its spatial representation to generate attenuation-only projections that look like X-ray radiographs. Due to the rapid evolution of its underlying architecture, the graphics processing unit (GPU) became an attractive competent platform that can deliver giant computational raw power compared to the central processing unit (CPU) on a per-dollar basis. The introduction of the compute unified device architecture (CUDA) technology enables embarrassingly parallel algorithms to run efficiently on CUDA-capable GPU architectures. In this work, a high performance GPU-accelerated implementation of the FVR pipeline on CUDA-enabled GPUs is presented. By executing the rendering pipeline entirely on recent GPU architectures, the proposed implementation can achieve a speed-up of 117x compared to a single-threaded hybrid implementation that uses the CPU and GPU together.
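
    The projection-slice theorem underlying FVR can be verified numerically in a few lines; the check below is a CPU/NumPy sketch shown in 2D for brevity (FVR uses the 3D version, extracting one spectral slice per view direction), and the random test object is arbitrary.

    import numpy as np

    vol = np.random.default_rng(0).random((64, 64))    # toy 2D "volume"

    projection = vol.sum(axis=0)                        # X-ray-like projection along axis 0
    slice_ft = np.fft.fftshift(np.fft.fft2(vol))[32]    # central slice of the 2D spectrum
    proj_ft = np.fft.fftshift(np.fft.fft(projection))   # spectrum of the projection

    print(np.allclose(slice_ft, proj_ft))               # True, up to numerical precision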

  8. Memristor-Based Synapse Design and Training Scheme for Neuromorphic Computing Architecture

    DTIC Science & Technology

    2012-06-01

    ... system level built upon the conventional Von Neumann computer architecture [2][3]. Developing the neuromorphic architecture at chip level by ... creation of memristor-based neuromorphic computing architecture. Rather than the existing crossbar-based neuron network designs, we focus on memristor ...

  9. A Debugger for Computational Grid Applications

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2000-01-01

    The p2d2 project at NAS has built a debugger for applications running on heterogeneous computational grids. It employs a client-server architecture to simplify the implementation. Its user interface has been designed to provide process control and state examination functions on a computation containing a large number of processes. It can find processes participating in distributed computations even when those processes were not created under debugger control. These process identification techniques work both on conventional distributed executions as well as those on a computational grid.

  10. Connectionist Models for Intelligent Computation

    DTIC Science & Technology

    1989-07-26

    Personal authors: H.H. Chen and Y.C. Lee. Project title: Connectionist Models for Intelligent Computation. Contract/Grant No.: AFOSR-87-0388. ... underlying principles, architectures and applications of artificial neural networks for intelligent computations. Approach: We use both numerical ...

  11. VASSAR: Value assessment of system architectures using rules

    NASA Astrophysics Data System (ADS)

    Selva, D.; Crawley, E. F.

    A key step of the mission development process is the selection of a system architecture, i.e., the layout of the major high-level system design decisions. This step typically involves the identification of a set of candidate architectures and a cost-benefit analysis to compare them. Computational tools have been used in the past to bring rigor and consistency into this process. These tools can automatically generate architectures by enumerating different combinations of decisions and options. They can also evaluate these architectures by applying cost models and simplified performance models. Current performance models are purely quantitative tools that are best suited to evaluating the technical performance of a mission design. However, assessing the relative merit of a system architecture is a much more holistic task than evaluating the performance of a mission design. Indeed, the merit of a system architecture comes from satisfying a variety of stakeholder needs, some of which are easy to quantify, and some of which are harder to quantify (e.g., elegance, scientific value, political robustness, flexibility). Moreover, assessing the merit of a system architecture at these very early stages of design often requires dealing with a mix of quantitative and semi-qualitative data, and of objective and subjective information. Current computational tools are poorly suited for these purposes. In this paper, we propose a general methodology that can be used to assess the relative merit of several candidate system architectures in the presence of objective, subjective, quantitative, and qualitative stakeholder needs. The methodology is called VASSAR (Value ASsessment for System Architectures using Rules). The major underlying assumption of the VASSAR methodology is that the merit of a system architecture can be assessed by comparing the capabilities of the architecture with the stakeholder requirements. Hence, for example, a candidate architecture that fully satisfies all critical stakeholder requirements is a good architecture. The assessment process is thus fundamentally seen as a pattern-matching process in which capabilities match requirements, which motivates the use of rule-based expert systems (RBES). This paper describes the VASSAR methodology and shows how it can be applied to a large complex space system, namely an Earth observation satellite system. Companion papers show its applicability to the NASA space communications and navigation program and the joint NOAA-DoD NPOESS program.
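
    A toy Python sketch of the capability-versus-requirement pattern matching that VASSAR formalizes with a rule-based expert system; the requirements, weights, and capability fields below are hypothetical, not taken from the paper.

        requirements = [
            # (description, weight, satisfaction rule) -- all entries hypothetical
            ("global soil-moisture product", 0.4, lambda cap: "L-band radiometer" in cap["instruments"]),
            ("revisit time below 3 days",    0.3, lambda cap: cap["revisit_days"] <= 3),
            ("data latency below 6 hours",   0.3, lambda cap: cap["latency_hours"] <= 6),
        ]

        def merit(capabilities):
            """Score an architecture by the weighted fraction of stakeholder rules it satisfies."""
            satisfied = sum(w for _, w, rule in requirements if rule(capabilities))
            return satisfied / sum(w for _, w, _ in requirements)

        candidate = {"instruments": ["L-band radiometer", "C-band radar"],
                     "revisit_days": 2, "latency_hours": 12}
        print(merit(candidate))    # ~0.7: two of the three weighted requirements are matched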

  12. A fast, programmable hardware architecture for spaceborne SAR processing

    NASA Technical Reports Server (NTRS)

    Bennett, J. R.; Cumming, I. G.; Lim, J.; Wedding, R. M.

    1983-01-01

    The launch of spaceborne SARs during the 1980s is discussed. The satellite SARs require high-quality, high-throughput ground processors. Compression ratios in range and azimuth of greater than 500 and 150, respectively, lead to frequency-domain processing and data computation rates in excess of 2000 million real operations per second for the C-band SARs under consideration. Various hardware architectures are examined, two promising candidates are selected, and a fast, programmable hardware architecture for spaceborne SAR processing is recommended. Modularity and programmability are introduced as desirable attributes for the purpose of HTSP hardware selection.
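
    A short Python sketch of the frequency-domain matched filtering (range compression) that such SAR ground processors perform; the pulse parameters and target position are illustrative, not those of any actual C-band mission.

        import numpy as np

        fs, T, B = 100e6, 10e-6, 30e6                  # sample rate, pulse length, chirp bandwidth (illustrative)
        t = np.arange(int(T * fs)) / fs
        chirp = np.exp(1j * np.pi * (B / T) * t**2)    # linear FM reference pulse

        echo = np.zeros(4096, dtype=complex)
        echo[1200:1200 + chirp.size] = 0.5 * chirp     # one point target, delayed and attenuated

        H = np.conj(np.fft.fft(chirp, echo.size))      # matched filter applied in the frequency domain
        compressed = np.fft.ifft(np.fft.fft(echo) * H)
        print(int(np.argmax(np.abs(compressed))))      # 1200: the target's range bin after compression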

  13. CFD Research, Parallel Computation and Aerodynamic Optimization

    NASA Technical Reports Server (NTRS)

    Ryan, James S.

    1995-01-01

    During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and work which is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested, so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogeneous combinations of these to complete required computations efficiently.

  14. Developing a Distributed Computing Architecture at Arizona State University.

    ERIC Educational Resources Information Center

    Armann, Neil; And Others

    1994-01-01

    Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…

  15. Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

    DTIC Science & Technology

    2017-10-04

    Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures (… Chapel Hill). Abstract (fragments): … algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures. These…

  16. Automation of reliability evaluation procedures through CARE - The computer-aided reliability estimation program.

    NASA Technical Reports Server (NTRS)

    Mathur, F. P.

    1972-01-01

    Description of an on-line interactive computer program called CARE (Computer-Aided Reliability Estimation) which can model self-repair and fault-tolerant organizations and perform certain other functions. Essentially CARE consists of a repository of mathematical equations defining the various basic redundancy schemes. These equations, under program control, are then interrelated to generate the desired mathematical model to fit the architecture of the system under evaluation. The mathematical model is then supplied with ground instances of its variables and is then evaluated to generate values for the reliability-theoretic functions applied to the model.
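
    A small Python sketch in the spirit of CARE: evaluating closed-form reliability expressions for basic redundancy schemes from ground instances of the module reliability. It is not the CARE program itself, and the module reliability value is hypothetical.

        from math import comb

        def r_series(R, n):
            """Simplex chain: all n modules must survive."""
            return R ** n

        def r_m_of_n(R, m, n):
            """Threshold redundancy: at least m of n identical modules must survive."""
            return sum(comb(n, k) * R**k * (1 - R) ** (n - k) for k in range(m, n + 1))

        R = 0.95                      # assumed single-module reliability over the mission time
        print(r_series(R, 3))         # 0.857...: three modules in series
        print(r_m_of_n(R, 2, 3))      # 0.99275: triple modular redundancy with a perfect voter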

  17. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    PubMed Central

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P.; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the infrastructure they run on. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. Large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as the underlying hardware through a performance counter, and optimizes its computing configuration based on the analyzed data. PMID:22163811

  18. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    PubMed

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet, as well as the infrastructure they run on. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. Large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation, as well as the underlying hardware through a performance counter, and optimizes its computing configuration based on the analyzed data.
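
    A highly simplified Python sketch of the monitor-analyze-adapt loop such a run-time monitor implements; here psutil's CPU utilization stands in for hardware performance counters, and the worker-count policy is a hypothetical example, not the paper's resource optimizer.

        import psutil   # CPU utilization stands in here for a hardware performance counter

        workers = 4     # hypothetical tunable knob of the monitored service
        for _ in range(5):
            cpu = psutil.cpu_percent(interval=1.0)   # monitor
            if cpu > 85 and workers > 1:             # analyze + adapt: back off when cores saturate
                workers -= 1
            elif cpu < 40:                           # exploit idle cores
                workers += 1
            print(f"cpu={cpu:.0f}% -> workers={workers}")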

  19. Design and deployment of an elastic network test-bed in IHEP data center based on SDN

    NASA Astrophysics Data System (ADS)

    Zeng, Shan; Qi, Fazhi; Chen, Gang

    2017-10-01

    High energy physics experiments produce huge amounts of raw data, but because network resources are shared, there is no guarantee of available bandwidth for each experiment, which may cause link congestion problems. On the other hand, with the development of cloud computing technologies, IHEP has established a cloud platform based on OpenStack, which ensures the flexibility of computing and storage resources, and more and more computing applications have been deployed on virtual machines created by OpenStack. However, under the traditional network architecture, network capacity cannot be provisioned elastically, which becomes a bottleneck restricting the flexible use of cloud computing. To solve these problems, we propose an elastic cloud data center network architecture based on SDN, and we also design a high-performance controller cluster based on OpenDaylight. Finally, we present our current test results.

  20. Frances: A Tool for Understanding Computer Architecture and Assembly Language

    ERIC Educational Resources Information Center

    Sondag, Tyler; Pokorny, Kian L.; Rajan, Hridesh

    2012-01-01

    Students in all areas of computing require knowledge of the computing device including software implementation at the machine level. Several courses in computer science curricula address these low-level details such as computer architecture and assembly languages. For such courses, there are advantages to studying real architectures instead of…

  1. Efficient parallel architecture for highly coupled real-time linear system applications

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo

    1988-01-01

    A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.

  2. Outline of a novel architecture for cortical computation.

    PubMed

    Majumdar, Kaushik

    2008-03-01

    In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, keeping in conformity with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture have been proposed, some of which are without proof.

  3. Architecture Adaptive Computing Environment

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    2006-01-01

    Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.

  4. Hypercube matrix computation task

    NASA Technical Reports Server (NTRS)

    Calalo, Ruel H.; Imbriale, William A.; Jacobi, Nathan; Liewer, Paulett C.; Lockhart, Thomas G.; Lyzenga, Gregory A.; Lyons, James R.; Manshadi, Farzin; Patterson, Jean E.

    1988-01-01

    A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort is summarized. It includes both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).

  5. Parallel Computing Strategies for Irregular Algorithms

    NASA Technical Reports Server (NTRS)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
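
    A small Python sketch of one of the load-balancing techniques such irregular applications rely on: greedy longest-processing-time assignment of task costs onto processors. The task costs are hypothetical, and real partitioners also account for communication between tasks.

        import heapq

        def balance(costs, processors):
            """Greedy longest-processing-time assignment of task costs onto processors."""
            heap = [(0.0, p) for p in range(processors)]      # (current load, processor id)
            heapq.heapify(heap)
            assignment = {p: [] for p in range(processors)}
            for cost in sorted(costs, reverse=True):          # place the heaviest tasks first
                load, p = heapq.heappop(heap)                 # least-loaded processor so far
                assignment[p].append(cost)
                heapq.heappush(heap, (load + cost, p))
            return assignment

        tasks = [8, 7, 6, 5, 4, 3, 2, 2, 1]                   # hypothetical irregular task costs
        parts = balance(tasks, 3)
        print({p: sum(c) for p, c in parts.items()})          # loads come out as 13, 13 and 12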

  6. The architecture of a distributed medical dictionary.

    PubMed

    Fowler, J; Buffone, G; Moreau, D

    1995-01-01

    Exploiting high-speed computer networks to provide a national medical information infrastructure is a goal for medical informatics. The Distributed Medical Dictionary under development at Baylor College of Medicine is a model for an architecture that supports collaborative development of a distributed online medical terminology knowledge-base. A prototype is described that illustrates the concept. Issues that must be addressed by such a system include high availability, acceptable response time, support for local idiom, and control of vocabulary.

  7. A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network

    NASA Astrophysics Data System (ADS)

    Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien

    2017-03-01

    With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.
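
    A toy Python sketch, assuming a Kuramoto-style phase model rather than the authors' nanodevice model, of how weak coupling turns input similarity into a measurable degree of synchronization; the frequencies, coupling strength, and run length are all illustrative.

        import numpy as np

        def synchrony(freqs, coupling=0.05, dt=1e-3, steps=40000):
            """Integrate weakly coupled phase oscillators and return the time-averaged
            order parameter over the second half of the run (1 = fully synchronized)."""
            rng = np.random.default_rng(1)
            phase = rng.uniform(0, 2 * np.pi, len(freqs))
            order = []
            for step in range(steps):
                pull = np.sum(np.sin(phase[None, :] - phase[:, None]), axis=1)
                phase += dt * (freqs + coupling * pull)
                if step >= steps // 2:
                    order.append(abs(np.mean(np.exp(1j * phase))))
            return float(np.mean(order))

        print(synchrony(np.array([1.00, 1.01, 0.99])))   # similar inputs -> order parameter near 1
        print(synchrony(np.array([1.00, 1.60, 0.40])))   # dissimilar inputs -> markedly lower value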

  8. Stream Processors

    NASA Astrophysics Data System (ADS)

    Erez, Mattan; Dally, William J.

    Stream processors, like other multi-core architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control logic, consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.

  9. A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network.

    PubMed

    Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien

    2017-03-21

    With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices' non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing.

  10. A Nanotechnology-Ready Computing Scheme based on a Weakly Coupled Oscillator Network

    PubMed Central

    Vodenicarevic, Damir; Locatelli, Nicolas; Abreu Araujo, Flavio; Grollier, Julie; Querlioz, Damien

    2017-01-01

    With conventional transistor technologies reaching their limits, alternative computing schemes based on novel technologies are currently gaining considerable interest. Notably, promising computing approaches have proposed to leverage the complex dynamics emerging in networks of coupled oscillators based on nanotechnologies. The physical implementation of such architectures remains a true challenge, however, as most proposed ideas are not robust to nanotechnology devices’ non-idealities. In this work, we propose and investigate the implementation of an oscillator-based architecture, which can be used to carry out pattern recognition tasks, and which is tailored to the specificities of nanotechnologies. This scheme relies on a weak coupling between oscillators, and does not require a fine tuning of the coupling values. After evaluating its reliability under the severe constraints associated to nanotechnologies, we explore the scalability of such an architecture, suggesting its potential to realize pattern recognition tasks using limited resources. We show that it is robust to issues like noise, variability and oscillator non-linearity. Defining network optimization design rules, we show that nano-oscillator networks could be used for efficient cognitive processing. PMID:28322262

  11. A global distributed storage architecture

    NASA Technical Reports Server (NTRS)

    Lionikis, Nemo M.; Shields, Michael F.

    1996-01-01

    NSA architects and planners have come to realize that to gain the maximum benefit from, and keep pace with, emerging technologies, we must move to a radically different computing architecture. The compute complex of the future will be a distributed heterogeneous environment, where, to a much greater extent than today, network-based services are invoked to obtain resources. Among the rewards of implementing the services-based view are that it insulates the user from much of the complexity of our multi-platform, networked, computer and storage environment and hides its diverse underlying implementation details. In this paper, we will describe one of the fundamental services being built in our envisioned infrastructure; a global, distributed archive with near-real-time access characteristics. Our approach for adapting mass storage services to this infrastructure will become clear as the service is discussed.

  12. Lattice QCD Calculations in Nuclear Physics towards the Exascale

    NASA Astrophysics Data System (ADS)

    Joo, Balint

    2017-01-01

    The combination of algorithmic advances and new highly parallel computing architectures is enabling lattice QCD calculations to tackle ever more complex problems in nuclear physics. In this talk I will review some computational challenges that are encountered in large-scale cold nuclear physics campaigns such as those in hadron spectroscopy calculations. I will discuss progress in addressing these with algorithmic improvements such as multi-grid solvers and with software for recent hardware architectures such as GPUs and the Intel Xeon Phi Knights Landing. Finally, I will highlight some current topics for research and development as we head towards the Exascale era. This material is funded by the U.S. Department of Energy, Office of Science, Offices of Nuclear Physics, High Energy Physics and Advanced Scientific Computing Research, as well as the Office of Nuclear Physics under contract DE-AC05-06OR23177.

  13. Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator

    ERIC Educational Resources Information Center

    Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.

    2012-01-01

    The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…

  14. Memristor-Based Computing Architecture: Design Methodologies and Circuit Techniques

    DTIC Science & Technology

    2013-03-01

    Technical report, Polytechnic Institute of New York University; report period October 2010 – October 2012. Abstract (fragments): … schemes for a memristor-based reconfigurable architecture design have not been fully explored yet. Therefore, in this project, we investigated…

  15. Cooperating knowledge-based systems

    NASA Technical Reports Server (NTRS)

    Feigenbaum, Edward A.; Buchanan, Bruce G.

    1988-01-01

    This final report covers work performed under Contract NCC2-220 between NASA Ames Research Center and the Knowledge Systems Laboratory, Stanford University. The period of research was from March 1, 1987 to February 29, 1988. Topics covered were as follows: (1) concurrent architectures for knowledge-based systems; (2) methods for the solution of geometric constraint satisfaction problems, and (3) reasoning under uncertainty. The research in concurrent architectures was co-funded by DARPA, as part of that agency's Strategic Computing Program. The research has been in progress since 1985, under DARPA and NASA sponsorship. The research in geometric constraint satisfaction has been done in the context of a particular application, that of determining the 3-D structure of complex protein molecules, using the constraints inferred from NMR measurements.

  16. An Evaluation of an Ada Implementation of the Rete Algorithm for Embedded Flight Processors

    DTIC Science & Technology

    1990-12-01

    Abstract (fragments): … computers was desired. The VAX VMS operating system has many built-in methods for determining program performance (including VAX PCA), but these methods… overview of the target environment: the MIL-STD-1750A VHSIC Avionic Modular Processor, running under the Ada Avionics Real-Time Software (AARTS)… computers. MIL-STD-1750A, the Air Force's standard flight computer architecture, however, places severe constraints on applications software processing…

  17. PICSiP: new system-in-package technology using a high bandwidth photonic interconnection layer for converged microsystems

    NASA Astrophysics Data System (ADS)

    Tekin, Tolga; Töpper, Michael; Reichl, Herbert

    2009-05-01

    Technological frontiers between semiconductor technology, packaging, and system design are disappearing. Scaling down geometries [1] alone no longer delivers improved performance, lower power, smaller size, and lower cost; achieving these will require "More than Moore" [2] through the tighter integration of system-level components at the package level. System-in-Package (SiP) will deliver the efficient use of three dimensions (3D) through innovation in packaging and interconnect technology. A key bottleneck to the implementation of high-performance microelectronic systems, including SiP, is the lack of low-latency, high-bandwidth, and high-density off-chip interconnects. Some of the challenges in achieving high-bandwidth chip-to-chip communication using electrical interconnects include the high losses in the substrate dielectric, reflections and impedance discontinuities, and susceptibility to crosstalk [3]. Using photonics to overcome these challenges and provide low-latency, high-bandwidth communication will enable the vision of optical computing within next-generation architectures. Today's supercomputers offer sustained performance beyond a petaflops, which can be increased further by utilizing optical interconnects. Next-generation computing architectures are needed that combine ultra-low power consumption and ultra-high performance with novel interconnection technologies. In this paper we discuss a CMOS-compatible underlying technology that enables next-generation optical computing architectures. By introducing a new optical layer within the 3D SiP, we leverage the development of converged microsystems for deployment in next-generation optical computing architectures.

  18. Brain architecture: a design for natural computation.

    PubMed

    Kaiser, Marcus

    2007-12-15

    Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.

  19. System and Propagation Availability Analysis for NASA's Advanced Air Transportation Technologies

    NASA Technical Reports Server (NTRS)

    Ugweje, Okechukwu C.

    2000-01-01

    This report summarizes the research on the System and Propagation Availability Analysis for NASA's project on Advanced Air Transportation Technologies (AATT). The objectives of the project were to determine the communication systems requirements and architecture, and to investigate the effect of propagation on the transmission of space information. In this report, results from the first-year investigation are presented and limitations are highlighted. To study the propagation links, an understanding of the total system architecture is necessary since the links form the major component of the overall architecture. This study was conducted by way of analysis, modeling and simulation on the system communication links. The overall goal was to develop an understanding of the space communication requirements relevant to the AATT project, and then analyze the links taking into consideration system availability under adverse atmospheric weather conditions. This project began with a preliminary study of the end-to-end system architecture by modeling a representative communication system in MATLAB SIMULINK. Based on the defining concepts, the possibility of computer modeling was determined. The investigations continued with parametric studies of the communication system architecture. These studies were also carried out with SIMULINK modeling and simulation. After a series of modifications, two end-to-end communication links were identified as the most probable models for the communication architecture. Link budget calculations were then performed in MATHCAD and MATLAB for the identified communication scenarios. A remarkable outcome of this project is the development of a graphic user interface (GUI) program for the computation of the link budget parameters in real time. Using this program, one can interactively compute the link budget requirements after supplying a few necessary parameters. It provides a framework for the eventual automation of several computations required in many experimental NASA missions. For the first year of this project, most of the stated objectives were accomplished. We were able to identify probable communication systems architectures, model and analyze several communication links, perform numerous simulations on different system models, and then develop a program for the link budget analysis. However, most of the work is still unfinished. The effect of propagation on the transmission of information in the identified communication channels has not yet been studied. Propagation effects cannot be studied until the system under consideration is identified and characterized. To study the propagation links, an understanding of the total communications architecture is necessary. It is important to mention that the original project was intended for two years and the results presented here are only for the first year of research. It is prudent therefore that these efforts be continued in order to obtain a complete picture of the system and propagation availability requirements.
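
    A minimal Python sketch of the kind of link-budget computation the report's GUI program automates: free-space path loss plus a carrier-to-noise margin. Every parameter value below is hypothetical, not taken from the AATT study.

        from math import log10, pi

        def fspl_db(freq_hz, dist_m):
            """Free-space path loss in dB."""
            return 20 * log10(4 * pi * dist_m * freq_hz / 3e8)

        def link_margin_db(eirp_dbw, rx_gain_dbi, freq_hz, dist_m, extra_loss_db,
                           noise_dbw, required_cn_db):
            received_dbw = eirp_dbw + rx_gain_dbi - fspl_db(freq_hz, dist_m) - extra_loss_db
            return (received_dbw - noise_dbw) - required_cn_db   # margin above the required C/N

        # Hypothetical Ku-band slant path to a GEO relay, with 3 dB of rain/atmospheric loss assumed.
        print(link_margin_db(eirp_dbw=50, rx_gain_dbi=45, freq_hz=14e9, dist_m=38_000e3,
                             extra_loss_db=3, noise_dbw=-130, required_cn_db=10))   # ~ +5 dB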

  20. On implementation of DCTCP on three-tier and fat-tree data center network topologies.

    PubMed

    Zafar, Saima; Bashir, Abeer; Chaudhry, Shafique Ahmad

    2016-01-01

    A data center is a facility for housing computational and storage systems interconnected through a communication network called the data center network (DCN). Due to tremendous growth in computational power, storage capacity and the number of inter-connected servers, the DCN faces challenges concerning efficiency, reliability and scalability. Although the transmission control protocol (TCP) is a time-tested transport protocol for the Internet, DCN challenges such as inadequate buffer space in switches and bandwidth limitations have prompted researchers to propose techniques to improve TCP performance or to design new transport protocols for the DCN. Data center TCP (DCTCP) emerges as one of the most promising solutions in this domain; it employs the explicit congestion notification feature of TCP to enhance the TCP congestion control algorithm. While DCTCP has been analyzed on a two-tier tree-based DCN topology for traffic between servers in the same rack, which is common in cloud applications, it remains oblivious to the traffic patterns common in university and private enterprise networks, which traverse the complete network interconnect spanning the upper tiers. We also recognize that DCTCP performance cannot remain unaffected by the underlying DCN architecture; hence there is a need to test and compare DCTCP performance when implemented over diverse DCN architectures. Some of the most notable DCN architectures are the legacy three-tier, fat-tree, BCube, DCell, VL2, and CamCube. In this research, we simulate two switch-centric DCN architectures, the widely deployed legacy three-tier architecture and the promising fat-tree architecture, using a network simulator, and analyze the performance of DCTCP in terms of throughput and delay for realistic traffic patterns. We also examine how DCTCP prevents incast and outcast congestion when realistic DCN traffic patterns are employed in the above-mentioned topologies. Our results show that the underlying DCN architecture significantly impacts DCTCP performance. We find that DCTCP gives optimal performance in the fat-tree topology and is most suitable for large networks.
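
    A compact Python sketch of the DCTCP sender rule the study exercises: an exponentially weighted estimate of the ECN-marked fraction drives a proportional, rather than halving, window reduction. The per-window mark counts below are made up for illustration.

        def dctcp_update(cwnd, alpha, marked, acked, g=1.0 / 16):
            """One round of the DCTCP sender rule: track the marked fraction, cut cwnd proportionally."""
            frac = marked / max(acked, 1)                 # fraction of ECN-marked packets this window
            alpha = (1 - g) * alpha + g * frac            # running estimate of congestion extent
            if marked:
                cwnd = max(1.0, cwnd * (1 - alpha / 2))   # gentle reduction instead of TCP's halving
            return cwnd, alpha

        cwnd, alpha = 100.0, 0.0
        for marked in (0, 10, 40, 0):                     # hypothetical marks per 100-packet window
            cwnd, alpha = dctcp_update(cwnd, alpha, marked, acked=100)
            print(round(cwnd, 1), round(alpha, 3))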

  1. A new software-based architecture for quantum computer

    NASA Astrophysics Data System (ADS)

    Wu, Nan; Song, FangMin; Li, Xiangdong

    2010-04-01

    In this paper, we study a reliable architecture for a quantum computer along with a new instruction set and machine language for that architecture, which can improve the performance and reduce the cost of quantum computing. We also address some key issues of software-driven universal quantum computers in detail.

  2. Development of iterative techniques for the solution of unsteady compressible viscous flows

    NASA Technical Reports Server (NTRS)

    Sankar, Lakshmi; Hixon, Duane

    1993-01-01

    The work done under this project was documented in detail as the Ph. D. dissertation of Dr. Duane Hixon. The objectives of the research project were evaluation of the generalized minimum residual method (GMRES) as a tool for accelerating 2-D and 3-D unsteady flows and evaluation of the suitability of the GMRES algorithm for unsteady flows, computed on parallel computer architectures.

  3. Architectures for single-chip image computing

    NASA Astrophysics Data System (ADS)

    Gove, Robert J.

    1992-04-01

    This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  4. Transitioning ISR architecture into the cloud

    NASA Astrophysics Data System (ADS)

    Lash, Thomas D.

    2012-06-01

    Emerging cloud computing platforms offer an ideal opportunity for Intelligence, Surveillance, and Reconnaissance (ISR) intelligence analysis. Cloud computing platforms help overcome challenges and limitations of traditional ISR architectures. Modern ISR architectures can benefit from examining commercial cloud applications, especially as they relate to user experience, usage profiling, and transformational business models. This paper outlines legacy ISR architectures and their limitations, presents an overview of cloud technologies and their applications to the ISR intelligence mission, and presents an idealized ISR architecture implemented with cloud computing.

  5. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  6. Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.

    PubMed

    Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein

    2015-12-01

    Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies include smartphones and wearable devices, which have attracted the attention of medical applications. These products are used less in critical medical applications because of their resource constraints and failure sensitivity, since without safety considerations small integrated hardware can endanger patients' lives. Therefore, some principles are required for constructing wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to the cloud. The tiers of this architecture include wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is the high fault tolerance made possible by the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.

  7. Analyzing Resiliency of the Smart Grid Communication Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anas AlMajali, Anas; Viswanathan, Arun; Neuman, Clifford

    Smart grids are susceptible to cyber-attack as a result of new communication, control and computation techniques employed in the grid. In this paper, we characterize and analyze the resiliency of smart grid communication architecture, specifically an RF mesh based architecture, under cyber attacks. We analyze the resiliency of the communication architecture by studying the performance of high-level smart grid functions such as metering, and demand response which depend on communication. Disrupting the operation of these functions impacts the operational resiliency of the smart grid. Our analysis shows that it takes an attacker only a small fraction of meters to compromise the communication resiliency of the smart grid. We discuss the implications of our result to critical smart grid functions and to the overall security of the smart grid.

  8. An event-based architecture for solving constraint satisfaction problems

    PubMed Central

    Mostafa, Hesham; Müller, Lorenz K.; Indiveri, Giacomo

    2015-01-01

    Constraint satisfaction problems are ubiquitous in many domains. They are typically solved using conventional digital computing architectures that do not reflect the distributed nature of many of these problems, and are thus ill-suited for solving them. Here we present a parallel analogue/digital hardware architecture specifically designed to solve such problems. We cast constraint satisfaction problems as networks of stereotyped nodes that communicate using digital pulses, or events. Each node contains an oscillator implemented using analogue circuits. The non-repeating phase relations among the oscillators drive the exploration of the solution space. We show that this hardware architecture can yield state-of-the-art performance on random SAT problems under reasonable assumptions on the implementation. We present measurements from a prototype electronic chip to demonstrate that a physical implementation of the proposed architecture is robust to practical non-idealities and to validate the theory proposed. PMID:26642827
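
    As a purely software stand-in for the stochastic exploration that the oscillator phases perform in the hardware, the following WalkSAT-style random search on a tiny SAT instance illustrates the same idea; the clauses are arbitrary and the method is a named substitute, not the paper's analogue dynamics.

        import random

        # Literals: a positive integer means the variable is true, a negative one means it is negated.
        clauses = [(1, -2), (-1, 2, 3), (-3, -2), (1, 3)]        # arbitrary small satisfiable instance

        def satisfied(clause, assign):
            return any((lit > 0) == assign[abs(lit)] for lit in clause)

        random.seed(0)
        assign = {v: random.random() < 0.5 for v in (1, 2, 3)}
        for _ in range(1000):
            unsat = [c for c in clauses if not satisfied(c, assign)]
            if not unsat:
                break
            flip = abs(random.choice(random.choice(unsat)))      # pick a variable from a violated clause
            assign[flip] = not assign[flip]

        print(assign, "unsatisfied:", [c for c in clauses if not satisfied(c, assign)])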

  9. Advanced computer architecture specification for automated weld systems

    NASA Technical Reports Server (NTRS)

    Katsinis, Constantine

    1994-01-01

    This report describes the requirements for an advanced automated weld system and the associated computer architecture, and defines the overall system specification from a broad perspective. According to the requirements of welding procedures as they relate to an integrated multiaxis motion control and sensor architecture, the computer system requirements are developed based on a proven multiple-processor architecture with an expandable, distributed-memory, single global bus architecture, containing individual processors which are assigned to specific tasks that support sensor or control processes. The specified architecture is sufficiently flexible to integrate previously developed equipment, be upgradable and allow on-site modifications.

  10. Quantum Computing Architectural Design

    NASA Astrophysics Data System (ADS)

    West, Jacob; Simms, Geoffrey; Gyure, Mark

    2006-03-01

    Large scale quantum computers will invariably require scalable architectures in addition to high fidelity gate operations. Quantum computing architectural design (QCAD) addresses the problems of actually implementing fault-tolerant algorithms given physical and architectural constraints beyond those of basic gate-level fidelity. Here we introduce a unified framework for QCAD that enables the scientist to study the impact of varying error correction schemes, architectural parameters including layout and scheduling, and physical operations native to a given architecture. Our software package, aptly named QCAD, provides compilation, manipulation/transformation, multi-paradigm simulation, and visualization tools. We demonstrate various features of the QCAD software package through several examples.

  11. Recursive computer architecture for VLSI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Treleaven, P.C.; Hopkins, R.P.

    1982-01-01

    A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers, each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems is termed fifth generation computers by the Japanese. 30 references.

  12. Hypercluster Parallel Processor

    NASA Technical Reports Server (NTRS)

    Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

    1992-01-01

    Hypercluster computer system includes multiple digital processors, operation of which is coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.

  13. Implementing Scientific Simulation Codes Highly Tailored for Vector Architectures Using Custom Configurable Computing Machines

    NASA Technical Reports Server (NTRS)

    Rutishauser, David

    2006-01-01

    The motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing. This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.

  14. High-performance computing with quantum processing units

    DOE PAGES

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.; ...

    2017-03-01

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.

  15. Convolutional networks for fast, energy-efficient neuromorphic computing

    PubMed Central

    Esser, Steven K.; Merolla, Paul A.; Arthur, John V.; Cassidy, Andrew S.; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J.; McKinstry, Jeffrey L.; Melano, Timothy; Barch, Davis R.; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D.; Modha, Dharmendra S.

    2016-01-01

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware’s underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer. PMID:27651489

  16. Convolutional networks for fast, energy-efficient neuromorphic computing.

    PubMed

    Esser, Steven K; Merolla, Paul A; Arthur, John V; Cassidy, Andrew S; Appuswamy, Rathinakumar; Andreopoulos, Alexander; Berg, David J; McKinstry, Jeffrey L; Melano, Timothy; Barch, Davis R; di Nolfo, Carmelo; Datta, Pallab; Amir, Arnon; Taba, Brian; Flickner, Myron D; Modha, Dharmendra S

    2016-10-11

    Deep networks are now able to achieve human-level performance on a broad spectrum of recognition tasks. Independently, neuromorphic computing has now demonstrated unprecedented energy-efficiency through a new chip architecture based on spiking neurons, low precision synapses, and a scalable communication network. Here, we demonstrate that neuromorphic computing, despite its novel architectural primitives, can implement deep convolution networks that (i) approach state-of-the-art classification accuracy across eight standard datasets encompassing vision and speech, (ii) perform inference while preserving the hardware's underlying energy-efficiency and high throughput, running on the aforementioned datasets at between 1,200 and 2,600 frames/s and using between 25 and 275 mW (effectively >6,000 frames/s per Watt), and (iii) can be specified and trained using backpropagation with the same ease-of-use as contemporary deep learning. This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.
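
    A small NumPy sketch of the kind of constraint that low-precision synapses impose on network weights, using ternary quantization as a hypothetical stand-in; it is not the authors' backpropagation-based training flow, and the threshold and weight values are illustrative.

        import numpy as np

        def ternarize(w, threshold=0.05):
            """Constrain weights to {-s, 0, +s} with a per-layer scale s, as low-precision synapses require."""
            significant = np.abs(w) > threshold
            scale = np.mean(np.abs(w[significant])) if significant.any() else 1.0
            q = np.zeros_like(w)
            q[w > threshold] = scale
            q[w < -threshold] = -scale
            return q

        w = np.random.default_rng(0).normal(0.0, 0.1, size=(4, 4))   # toy trained weights
        wq = ternarize(w)
        print(np.mean(np.abs(w - wq)))    # average error introduced by the precision constraint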

  17. High-performance computing with quantum processing units

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Britt, Keith A.; Oak Ridge National Lab.; Humble, Travis S.

    The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors into modern high-performance computing (HPC) systems. We examine how QPUs can be integrated into current and future HPC system architectures by accounting for functional and physical design requirements. We identify two integration pathways that are differentiated by infrastructure constraints on the QPU and the use cases expected for the HPC system. This includes a tight integration that assumes infrastructure bottlenecks can be overcome as well as a loose integration that assumes they cannot. We find that the performance of both approaches is likely to depend on the quantum interconnect that serves to entangle multiple QPUs. As a result, we also identify several challenges in assessing QPU performance for HPC, and we consider new metrics that capture the interplay between system architecture and the quantum parallelism underlying computational performance.

  18. Approximation algorithms for planning and control

    NASA Technical Reports Server (NTRS)

    Boddy, Mark; Dean, Thomas

    1989-01-01

    A control system operating in a complex environment will encounter a variety of different situations, with varying amounts of time available to respond to critical events. Ideally, such a control system will do the best possible with the time available. In other words, its responses should approximate those that would result from having unlimited time for computation, where the degree of the approximation depends on the amount of time it actually has. There exist approximation algorithms for a wide variety of problems. Unfortunately, the solution to any reasonably complex control problem will require solving several computationally intensive problems. Algorithms for successive approximation are a subclass of the class of anytime algorithms, algorithms that return answers for any amount of computation time, where the answers improve as more time is allotted. An architecture is described for allocating computation time to a set of anytime algorithms, based on expectations regarding the value of the answers they return. The architecture described is quite general, producing optimal schedules for a set of algorithms under widely varying conditions.
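
    As a concrete illustration of the scheduling idea above, the sketch below allocates time slices greedily to whichever anytime algorithm currently promises the largest expected improvement in answer quality. The performance profiles and the greedy policy are illustrative assumptions, not the paper's actual deliberation scheduler.

        # Hedged sketch: greedy deliberation scheduling over anytime algorithms.
        # Each profile maps allotted time slices to expected answer quality.
        profiles = {
            "path_planner":  lambda t: 1.0 - 0.9 * (0.5 ** t),   # diminishing returns
            "sensor_fusion": lambda t: 1.0 - 1.0 * (0.7 ** t),
        }

        def schedule(total_slices):
            allocated = {name: 0 for name in profiles}
            for _ in range(total_slices):
                gains = {n: profiles[n](allocated[n] + 1) - profiles[n](allocated[n])
                         for n in profiles}
                best = max(gains, key=gains.get)      # largest expected marginal gain
                allocated[best] += 1
            return allocated

        print(schedule(total_slices=10))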

  19. Modeling aspects of human memory for scientific study.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Caudell, Thomas P.; Watson, Patrick; McDaniel, Mark A.

    Working with leading experts in the field of cognitive neuroscience and computational intelligence, SNL has developed a computational architecture that represents neurocognitive mechanisms associated with how humans remember experiences in their past. The architecture represents how knowledge is organized and updated through information from individual experiences (episodes) via the cortical-hippocampal declarative memory system. We compared the simulated behavioral characteristics with those of humans measured under well established experimental standards, controlling for unmodeled aspects of human processing, such as perception. We used this knowledge to create robust simulations of human memory behaviors that should help move the scientific community closer to understanding how humans remember information. These behaviors were experimentally validated against actual human subjects, and the results were published. An important outcome of the validation process will be the joining of specific experimental testing procedures from the field of neuroscience with computational representations from the field of cognitive modeling and simulation.

  20. VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal

    NASA Astrophysics Data System (ADS)

    Satheeskumaran, S.; Sabrigiriraj, M.

    2016-06-01

    Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) signals because they require relatively few computations. However, they possess a high mean square error (MSE) under noisy conditions. The transform-domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove artefacts from the ECG signal for improved feature extraction. Dedicated digital signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, pipelined architectures can be used to enhance system performance. The pipelined architecture can improve the operating efficiency of the adaptive filter and save power. This technique provides a high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problems.
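
    For reference, the sketch below shows a generic variable step-size LMS noise canceller of the kind described above: the reference noise input is filtered and subtracted from the noisy ECG, with the step size adapted from the error power. The signal shapes, tap count, and step-size rule are illustrative assumptions, not the paper's exact delayed-LMS design.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 2000
        ecg = np.sin(2 * np.pi * 1.2 * np.arange(n) / 360.0)        # toy "ECG"
        noise_ref = rng.normal(size=n)                               # reference noise
        primary = ecg + 0.5 * np.convolve(noise_ref, [0.6, 0.3], mode="same")

        taps, mu = 8, 0.01
        w = np.zeros(taps)
        out = np.zeros(n)
        for k in range(taps, n):
            x = noise_ref[k - taps + 1:k + 1][::-1]     # current and past reference samples
            e = primary[k] - w @ x                       # error = cleaned ECG estimate
            mu = np.clip(0.97 * mu + 0.01 * e * e, 1e-4, 0.05)   # variable step size
            w += 2 * mu * e * x                          # LMS weight update
            out[k] = e

        print("residual MSE vs clean ECG:", np.mean((out[taps:] - ecg[taps:]) ** 2))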

  1. VLSI processors for signal detection in SETI

    NASA Technical Reports Server (NTRS)

    Duluk, J. F.; Linscott, I. R.; Peterson, A. M.; Burr, J.; Ekroot, B.; Twicken, J.

    1989-01-01

    The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.
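
    The two processing steps described above can be illustrated in software: a DFT-based power spectrum of the sampled band followed by a threshold test for narrow-band power. The signal parameters and threshold rule below are illustrative assumptions, not the actual SETI processing chain.

        import numpy as np

        rng = np.random.default_rng(2)
        fs, n = 1.0e4, 4096                             # sample rate (Hz), block length
        t = np.arange(n) / fs
        band = rng.normal(size=n) + 0.2 * np.sin(2 * np.pi * 1234.0 * t)  # noise + tone

        spectrum = np.abs(np.fft.rfft(band)) ** 2       # step 1: power spectrum via DFT
        freqs = np.fft.rfftfreq(n, d=1 / fs)
        threshold = spectrum.mean() + 6 * spectrum.std()
        print("candidate narrow-band frequencies (Hz):", freqs[spectrum > threshold])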

  2. VLSI processors for signal detection in SETI.

    PubMed

    Duluk, J F; Linscott, I R; Peterson, A M; Burr, J; Ekroot, B; Twicken, J

    1989-01-01

    The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.

  3. Distributed Computing Architecture for Image-Based Wavefront Sensing and 2 D FFTs

    NASA Technical Reports Server (NTRS)

    Smith, Jeffrey S.; Dean, Bruce H.; Haghani, Shadan

    2006-01-01

    Image-based wavefront sensing (WFS) provides significant advantages over interferometric wavefront sensors, such as optical design simplicity and stability. However, the image-based approach is computationally intensive, and therefore specialized high-performance computing architectures are required in applications utilizing it. The development and testing of these high-performance computing architectures are essential to such missions as the James Webb Space Telescope (JWST), Terrestrial Planet Finder-Coronagraph (TPF-C and CorSpec), and Spherical Primary Optical Telescope (SPOT). These specialized computing architectures require numerous two-dimensional Fourier transforms, which necessitate an all-to-all communication when applied on a distributed computational architecture. Several solutions for distributed computing are presented, with an emphasis on a 64-node cluster of DSPs, multiple DSP FPGAs, and an application of low-diameter graph theory. Timing results and performance analysis are presented. The solutions offered could be applied to other all-to-all communication and scientifically computationally complex problems.
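
    The all-to-all pattern mentioned above follows from how a 2-D FFT factors into 1-D transforms: row FFTs, a global transpose, then FFTs over the former columns. When each node owns a block of rows, the transpose is the all-to-all exchange. The NumPy sketch below shows the decomposition on a single machine; the distributed data layout is an assumption for illustration.

        import numpy as np

        rng = np.random.default_rng(3)
        image = rng.normal(size=(256, 256))

        step1 = np.fft.fft(image, axis=1)     # 1-D FFTs over rows (node-local work)
        step2 = step1.T                       # transpose = all-to-all exchange
        step3 = np.fft.fft(step2, axis=1)     # 1-D FFTs over the former columns
        two_d = step3.T                       # undo the transpose

        assert np.allclose(two_d, np.fft.fft2(image))   # matches the direct 2-D FFT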

  4. SpaceWire- Based Control System Architecture for the Lightweight Advanced Robotic Arm Demonstrator [LARAD

    NASA Astrophysics Data System (ADS)

    Rucinski, Marek; Coates, Adam; Montano, Giuseppe; Allouis, Elie; Jameux, David

    2015-09-01

    The Lightweight Advanced Robotic Arm Demonstrator (LARAD) is a state-of-the-art, two-meter long robotic arm for planetary surface exploration currently being developed by a UK consortium led by Airbus Defence and Space Ltd under contract to the UK Space Agency (CREST-2 programme). LARAD has a modular design, which allows for experimentation with different electronics and control software. The control system architecture includes the on-board computer, control software and firmware, and the communication infrastructure (e.g. data links, switches) connecting on-board computer(s), sensors, actuators and the end-effector. The purpose of the control system is to operate the arm according to pre-defined performance requirements, monitoring its behaviour in real-time and performing safing/recovery actions in case of faults. This paper reports on the results of a recent study about the feasibility of the development and integration of a novel control system architecture for LARAD fully based on the SpaceWire protocol. The current control system architecture is based on the combination of two communication protocols, Ethernet and CAN. The new SpaceWire-based control system will allow for improved monitoring and telecommanding performance thanks to higher communication data rate, allowing for the adoption of advanced control schemes, potentially based on multiple vision sensors, and for the handling of sophisticated end-effectors that require fine control, such as science payloads or robotic hands.

  5. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    NASA Astrophysics Data System (ADS)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Here, a comparison of two computer architectures (AMD and Intel) is performed. The Intel architecture gives better CPU time than the AMD architecture; in efficiency, however, the AMD architecture is slightly higher than the Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30 seconds, respectively. Moreover, for the same number of particles, the efficiency of AMD reaches 50.09% and that of Intel 49.42%.
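
    The quoted figures combine in the usual way: speedup is serial time over parallel time, and efficiency is speedup over core count. The sketch below applies those formulas; the serial times and the four-core count are illustrative assumptions, since the abstract reports only the parallel CPU times and the efficiency percentages.

        # Hedged worked example of speedup and parallel efficiency.
        def metrics(t_serial, t_parallel, cores):
            speedup = t_serial / t_parallel
            return speedup, 100.0 * speedup / cores

        for arch, t_parallel in [("Intel", 12662.47), ("AMD", 28282.30)]:
            t_serial = 2.0 * t_parallel          # "almost half of the serial time" (assumed)
            s, e = metrics(t_serial, t_parallel, cores=4)   # core count assumed
            print(f"{arch}: speedup {s:.2f}x, efficiency {e:.1f}%")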

  6. Stochastic Spiking Neural Networks Enabled by Magnetic Tunnel Junctions: From Nontelegraphic to Telegraphic Switching Regimes

    NASA Astrophysics Data System (ADS)

    Liyanagedera, Chamika M.; Sengupta, Abhronil; Jaiswal, Akhilesh; Roy, Kaushik

    2017-12-01

    Stochastic spiking neural networks based on nanoelectronic spin devices can be a possible pathway to achieving “brainlike” compact and energy-efficient cognitive intelligence. Such computational models attempt to exploit the intrinsic device stochasticity of nanoelectronic synaptic or neural components to perform learning or inference. However, there has been limited analysis of the scaling effect of stochastic spin devices and its impact on the operation of such stochastic networks at the system level. This work attempts to explore the design space and analyze the performance of nanomagnet-based stochastic neuromorphic computing architectures for magnets with different barrier heights. We illustrate how the underlying network architecture must be modified to account for the random telegraphic switching behavior displayed by magnets with low barrier heights as they are scaled into the superparamagnetic regime. We perform a device-to-system-level analysis on a deep neural-network architecture for a digit-recognition problem on the MNIST data set.
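
    The stochastic-neuron abstraction such devices provide can be sketched simply: the probability that the nanomagnet switches (the neuron "spikes") in a given time step is a monotone function of the input current. The sigmoid and its parameters below are generic placeholders, not a calibrated device model.

        import numpy as np

        rng = np.random.default_rng(4)

        def stochastic_spikes(current, steps=1000, beta=4.0):
            p = 1.0 / (1.0 + np.exp(-beta * current))   # assumed switching probability
            return rng.random(steps) < p                # Bernoulli spike train

        for i in (-0.5, 0.0, 0.5):
            print(f"input {i:+.1f}: firing rate ~ {stochastic_spikes(i).mean():.2f}")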

  7. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We define this class of algorithms as "GPU-Complete" and postulate the necessary properties of algorithms for admission into this class. We also formally relate this algorithmic space to the space of imaging algorithms. This concept is based upon our experience in the print production area, where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPUs, language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPUs, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. As GPUs establish themselves as a second computing architecture distinct from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms, and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.

  8. A synchronized computational architecture for generalized bilateral control of robot arms

    NASA Technical Reports Server (NTRS)

    Bejczy, Antal K.; Szakaly, Zoltan

    1987-01-01

    This paper describes a computational architecture for an interconnected high-speed distributed computing system for generalized bilateral control of robot arms. The key method of the architecture is the use of fully synchronized, interrupt-driven software. Since an objective of the development is to utilize processing resources efficiently, the synchronization is done at the hardware level to reduce system software overhead. The architecture also achieves a balanced load on the communication channel. The paper also describes some architectural relations to the trading or sharing of manual and automatic control.

  9. Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

    NASA Technical Reports Server (NTRS)

    Stocker, John C.; Golomb, Andrew M.

    2011-01-01

    Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
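
    A minimal discrete-event loop conveys the modeling approach described above: service requests drawn from a demand distribution queue against a fixed pool of compute resources. The arrival rate, service times, and server count below are illustrative assumptions, not the authors' calibrated model.

        import heapq, random

        random.seed(0)
        SERVERS, events, busy, waiting, finished = 4, [], 0, [], []

        t = 0.0
        for _ in range(200):                                  # generate arrival events
            t += random.expovariate(1.0)                      # inter-arrival times
            heapq.heappush(events, (t, "arrive", random.expovariate(0.5)))

        while events:
            now, kind, svc = heapq.heappop(events)
            if kind == "arrive":
                if busy < SERVERS:
                    busy += 1
                    heapq.heappush(events, (now + svc, "finish", 0.0))
                else:
                    waiting.append(svc)                       # request waits in queue
            else:                                             # a resource frees up
                busy -= 1
                finished.append(now)
                if waiting:
                    busy += 1
                    heapq.heappush(events, (now + waiting.pop(0), "finish", 0.0))

        print(f"completed {len(finished)} requests by t = {max(finished):.1f}")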

  10. Architectural Specialization for Inter-Iteration Loop Dependence Patterns

    DTIC Science & Technology

    2015-10-01

    Architectural Specialization for Inter-Iteration Loop Dependence Patterns. Christopher Batten, Computer Systems Laboratory, School of Electrical and... [The remainder of this record is presentation-slide residue: charts of trends in computer architecture (transistors in thousands, frequency in MHz, typical power in W, for the MIPS R2K, DEC Alpha 21264, and Intel P4; data collected by M...) and an energy-efficiency comparison (tasks per joule) of simple, high-performance, and embedded architectures under a design power constraint.]

  11. Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

    NASA Astrophysics Data System (ADS)

    Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

    2014-09-01

    Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.

  12. HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

    NASA Technical Reports Server (NTRS)

    Sterling, Thomas; Bergman, Larry

    2000-01-01

    Computational Aero Sciences and other numerically intensive computation disciplines demand computing throughputs substantially greater than the Teraflops-scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that, in combination with sufficient resolution and advanced adaptive techniques, may force performance requirements towards Petaflops. This will be especially true for compute-intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithm techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops-scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) at one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per-fiber bandwidth of 1 Tbps, and the new Data Vortex network topology employing this technology can connect tens of thousands of ports, providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip, exposing the internal bandwidth of the memory row buffers at low latency. Holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared-memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops-scale systems. To achieve high sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper presents the basic system architecture characteristics made possible through this series of advanced technologies and then gives a detailed description of the new percolation approach to runtime latency management.

  13. RRAM-based parallel computing architecture using k-nearest neighbor classification for pattern recognition

    NASA Astrophysics Data System (ADS)

    Jiang, Yuning; Kang, Jinfeng; Wang, Xinan

    2017-03-01

    Resistive switching memory (RRAM) is considered one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behavior is chosen as both the storage and computing component. The proposed architecture is tested on the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
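
    In software terms, the classification the crossbar accelerates is plain k-nearest neighbors: distances from a query to all stored patterns are evaluated (in the array, in parallel), and the label follows from the k closest. The MNIST-like shapes and random stand-in data below are assumptions for illustration.

        import numpy as np

        rng = np.random.default_rng(5)
        stored = rng.random((1000, 784))          # stored patterns (28x28 flattened)
        labels = rng.integers(0, 10, size=1000)   # their digit labels

        def knn_predict(query, k=5):
            d = np.linalg.norm(stored - query, axis=1)   # all distances "in parallel"
            nearest = np.argsort(d)[:k]                  # k smallest distances
            return np.bincount(labels[nearest], minlength=10).argmax()

        print("predicted digit:", knn_predict(rng.random(784)))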

  14. Cognitive Architectures and Human-Computer Interaction. Introduction to Special Issue.

    ERIC Educational Resources Information Center

    Gray, Wayne D.; Young, Richard M.; Kirschenbaum, Susan S.

    1997-01-01

    In this introduction to a special issue on cognitive architectures and human-computer interaction (HCI), editors and contributors provide a brief overview of cognitive architectures. The following four architectures represented by articles in this issue are: Soar; LICAI (linked model of comprehension-based action planning and instruction taking);…

  15. Biomimetic design processes in architecture: morphogenetic and evolutionary computational design.

    PubMed

    Menges, Achim

    2012-03-01

    Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies.

  16. HpQTL: a geometric morphometric platform to compute the genetic architecture of heterophylly.

    PubMed

    Sun, Lidan; Wang, Jing; Zhu, Xuli; Jiang, Libo; Gosik, Kirk; Sang, Mengmeng; Sun, Fengsuo; Cheng, Tangren; Zhang, Qixiang; Wu, Rongling

    2017-02-15

    Heterophylly, i.e. morphological changes in leaves along the axis of an individual plant, is regarded as a strategy used by plants to cope with environmental change. However, little is known of the extent to which heterophylly is controlled by genes and how each underlying gene exerts its effect on heterophyllous variation. We describe a geometric morphometric model that can quantify heterophylly in plants and further construct an R-based computing platform by integrating this model into a genetic mapping and association setting. The platform, named HpQTL, allows specific quantitative trait loci mediating heterophyllous variation to be mapped throughout the genome. The statistical properties of HpQTL were examined and validated via computer simulation. Its biological relevance was demonstrated by results from a real data analysis of heterophylly in a woody plant, mei (Prunus mume). HpQTL provides a powerful tool to analyze heterophylly and its underlying genetic architecture in a quantitative manner. It also contributes a new approach for genome-wide association studies aimed at dissecting the programmed regulation of plant development and evolution.

  17. The flight telerobotic servicer: From functional architecture to computer architecture

    NASA Technical Reports Server (NTRS)

    Lumia, Ronald; Fiala, John

    1989-01-01

    After a brief tutorial on the NASA/National Bureau of Standards Standard Reference Model for Telerobot Control System Architecture (NASREM) functional architecture, the approach to its implementation is shown. First, interfaces must be defined which are capable of supporting the known algorithms. This is illustrated by considering the interfaces required for the SERVO level of the NASREM functional architecture. After interface definition, the specific computer architecture for the implementation must be determined. This choice is obviously technology dependent. An example illustrating one possible mapping of the NASREM functional architecture to a particular set of computers which implements it is shown. The result of choosing the NASREM functional architecture is that it provides a technology independent paradigm which can be mapped into a technology dependent implementation capable of evolving with technology in the laboratory and in space.

  18. Innovative architectures for dense multi-microprocessor computers

    NASA Technical Reports Server (NTRS)

    Donaldson, Thomas; Doty, Karl; Engle, Steven W.; Larson, Robert E.; O'Reilly, John G.

    1988-01-01

    The results of a Phase I Small Business Innovative Research (SBIR) project performed for the NASA Langley Computational Structural Mechanics Group are described. The project resulted in the identification of a family of chordal-ring interconnection architectures with excellent potential to serve as the basis for new multimicroprocessor (MMP) computers. The paper presents examples of how computational algorithms from structural mechanics can be efficiently implemented on the chordal-ring architecture.

  19. Fault-tolerant linear optical quantum computing with small-amplitude coherent States.

    PubMed

    Lund, A P; Ralph, T C; Haselgrove, H L

    2008-01-25

    Quantum computing using two coherent states as a qubit basis is a proposed alternative architecture with lower overheads but has been questioned as a practical way of performing quantum computing due to the fragility of diagonal states with large coherent amplitudes. We show that using error correction only small amplitudes (alpha>1.2) are required for fault-tolerant quantum computing. We study fault tolerance under the effects of small amplitudes and loss using a Monte Carlo simulation. The first encoding level resources are orders of magnitude lower than the best single photon scheme.

  20. PISCES 2 users manual

    NASA Technical Reports Server (NTRS)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment and set of extensions to Fortran 77 for parallel programming. It is intended to provide a basis for writing programs for scientific and engineering applications on parallel computers in a way that is relatively independent of the particular details of the underlying computer architecture. This user's manual provides a complete description of the PISCES 2 system as it is currently implemented on the 20 processor Flexible FLEX/32 at NASA Langley Research Center.

  1. Distributed computing environments for future space control systems

    NASA Technical Reports Server (NTRS)

    Viallefont, Pierre

    1993-01-01

    The aim of this paper is to present the results of a CNES research project on distributed computing systems. The purpose of this research was to study the impact of the use of new computer technologies in the design and development of future space applications. The first part of this study was a state-of-the-art review of distributed computing systems. One of the interesting ideas arising from this review is the concept of a 'virtual computer' allowing the distributed hardware architecture to be hidden from a software application. The 'virtual computer' can improve system performance by adapting the best architecture (addition of computers) to the software application without having to modify its source code. This concept can also decrease the cost and obsolescence of the hardware architecture. In order to verify the feasibility of the 'virtual computer' concept, a prototype representative of a distributed space application is being developed independently of the hardware architecture.

  2. Electro-Optic Computing Architectures. Volume I

    DTIC Science & Technology

    1998-02-01

    The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit (OW

  3. Universal computer control system (UCCS) for space telerobots

    NASA Technical Reports Server (NTRS)

    Bejczy, Antal K.; Szakaly, Zoltan

    1987-01-01

    A universal computer control system (UCCS) is under development for all motor elements of a space telerobot. The basic hardware architecture and software design of UCCS are described, together with the rich motor sensing, control, and self-test capabilities of this all-computerized motor control system. UCCS is integrated into a multibus computer environment with a direct interface to higher level control processors, uses pulsewidth multiplier power amplifiers, and one unit can control up to sixteen different motors simultaneously at a high I/O rate. UCCS performance capabilities are illustrated with sample data.

  4. Combining Topological Hardware and Topological Software: Color-Code Quantum Computing with Topological Superconductor Networks

    NASA Astrophysics Data System (ADS)

    Litinski, Daniel; Kesselring, Markus S.; Eisert, Jens; von Oppen, Felix

    2017-07-01

    We present a scalable architecture for fault-tolerant topological quantum computation using networks of voltage-controlled Majorana Cooper pair boxes and topological color codes for error correction. Color codes have a set of transversal gates which coincides with the set of topologically protected gates in Majorana-based systems, namely, the Clifford gates. In this way, we establish color codes as providing a natural setting in which advantages offered by topological hardware can be combined with those arising from topological error-correcting software for full-fledged fault-tolerant quantum computing. We provide a complete description of our architecture, including the underlying physical ingredients. We start by showing that in topological superconductor networks, hexagonal cells can be employed to serve as physical qubits for universal quantum computation, and we present protocols for realizing topologically protected Clifford gates. These hexagonal-cell qubits allow for a direct implementation of open-boundary color codes with ancilla-free syndrome read-out and logical T gates via magic-state distillation. For concreteness, we describe how the necessary operations can be implemented using networks of Majorana Cooper pair boxes, and we give a feasibility estimate for error correction in this architecture. Our approach is motivated by nanowire-based networks of topological superconductors, but it could also be realized in alternative settings such as quantum-Hall-superconductor hybrids.

  5. Image-Processing Software For A Hypercube Computer

    NASA Technical Reports Server (NTRS)

    Lee, Meemong; Mazer, Alan S.; Groom, Steven L.; Williams, Winifred I.

    1992-01-01

    Concurrent Image Processing Executive (CIPE) is a software system intended for developing and using image-processing application programs in a concurrent computing environment. Designed to shield the programmer from the complexities of concurrent-system architecture, it provides an interactive image-processing environment for the end user. CIPE utilizes the architectural characteristics of a particular concurrent system to maximize efficiency while preserving architectural independence from the user and programmer. CIPE runs on a Mark-IIIfp 8-node hypercube computer and an associated SUN-4 host computer.

  6. Experimental Comparison of Two Quantum Computing Architectures

    DTIC Science & Technology

    2017-03-28

    INAUGURAL ARTICLE - COMPUTER SCIENCES: Experimental comparison of two quantum computing architectures. Norbert M. Linke, Dmitri... the vast computing power a universal quantum computer could offer, several candidate systems are being explored. They have allowed experimental ...existing systems and the role of architecture in quantum computer design. These will be crucial for the realization of more advanced future incarna...

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    None, None

    Smart grids are susceptible to cyber-attack as a result of new communication, control and computation techniques employed in the grid. In this paper, we characterize and analyze the resiliency of smart grid communication architecture, specifically an RF mesh based architecture, under cyber attacks. We analyze the resiliency of the communication architecture by studying the performance of high-level smart grid functions, such as metering and demand response, which depend on communication. Disrupting the operation of these functions impacts the operational resiliency of the smart grid. Our analysis shows that it takes an attacker only a small fraction of meters to compromise the communication resiliency of the smart grid. We discuss the implications of our result for critical smart grid functions and for the overall security of the smart grid.

  8. Requirements for plug and play information infrastructure frameworks and architectures to enable virtual enterprises

    NASA Astrophysics Data System (ADS)

    Bolton, Richard W.; Dewey, Allen; Horstmann, Paul W.; Laurentiev, John

    1997-01-01

    This paper examines the role virtual enterprises will have in supporting future business engagements and the resulting technology requirements. Two representative end-user scenarios are proposed that define the requirements for 'plug-and-play' information infrastructure frameworks and architectures necessary to enable 'virtual enterprises' in US manufacturing industries. The scenarios provide a high-level 'needs analysis' for identifying key technologies, defining a reference architecture, and developing compliant reference implementations. Virtual enterprises are short-term consortia or alliances of companies formed to address fast-changing opportunities. Members of a virtual enterprise carry out their tasks as if they all worked for a single organization under 'one roof', using 'plug-and-play' information infrastructure frameworks and architectures to access and manage all information needed to support the product cycle. 'Plug-and-play' information infrastructure frameworks and architectures are required to enhance collaboration between companies working together on different aspects of a manufacturing process. This new form of collaborative computing will decrease cycle-time and increase responsiveness to change.

  9. A learnable parallel processing architecture towards unity of memory and computing

    NASA Astrophysics Data System (ADS)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and reduce the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  10. A learnable parallel processing architecture towards unity of memory and computing.

    PubMed

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and reduce the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  11. Improving Conceptual Design for Launch Vehicles

    NASA Technical Reports Server (NTRS)

    Olds, John R.

    1998-01-01

    This report summarizes activities performed during the second year of a three-year cooperative agreement between NASA - Langley Research Center and Georgia Tech. Year 1 of the project resulted in the creation of a new Cost and Business Assessment Model (CABAM) for estimating the economic performance of advanced reusable launch vehicles, including non-recurring costs, recurring costs, and revenue. The current year (second year) activities were focused on the evaluation of automated, collaborative design frameworks (computation architectures or computational frameworks) for automating the design process in advanced space vehicle design. Consistent with NASA's new thrust area in developing and understanding Intelligent Synthesis Environments (ISE), the goals of this year's research efforts were to develop and apply computer integration techniques and near-term computational frameworks for conducting advanced space vehicle design. NASA - Langley (VAB) has taken a lead role in developing a web-based computing architecture within which the designer can interact with disciplinary analysis tools through a flexible web interface. The advantages of this approach are: 1) flexible access to the designer interface through a simple web browser (e.g. Netscape Navigator), 2) ability to include existing 'legacy' codes, and 3) ability to include distributed analysis tools running on remote computers. To date, VAB's internal emphasis has been on developing this test system for the planetary entry mission under the joint Integrated Design System (IDS) program with NASA - Ames and JPL. Georgia Tech's complementary goals this year were to: 1) examine an alternate 'custom' computational architecture for the three-discipline IDS planetary entry problem to assess the advantages and disadvantages relative to the web-based approach, and 2) develop and examine a web-based interface and framework for a typical launch vehicle design problem.

  12. Digital optical computers at the optoelectronic computing systems center

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.

    1991-01-01

    The Digital Optical Computing Program within the National Science Foundation Engineering Research Center for Opto-electronic Computing Systems has as its specific goal research on optical computing architectures suitable for use at the highest possible speeds. The program can be targeted toward exploiting the time domain because other programs in the Center are pursuing research on parallel optical systems, exploiting optical interconnection and optical devices and materials. Using a general purpose computing architecture as the focus, we are developing design techniques, tools and architecture for operation at the speed of light limit. Experimental work is being done with the somewhat low speed components currently available but with architectures which will scale up in speed as faster devices are developed. The design algorithms and tools developed for a general purpose, stored program computer are being applied to other systems such as optimally controlled optical communication networks.

  13. Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

    NASA Astrophysics Data System (ADS)

    Menzel, R.; Paynter, D.; Jones, A. L.

    2017-12-01

    Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach those of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model currently under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.

  14. Real-Time Model and Simulation Architecture for Half- and Full-Bridge Modular Multilevel Converters

    NASA Astrophysics Data System (ADS)

    Ashourloo, Mojtaba

    This work presents an equivalent model and simulation architecture for real-time electromagnetic transient analysis of either half-bridge or full-bridge modular multilevel converters (MMC) with 400 sub-modules (SMs) per arm. The proposed CPU/FPGA-based architecture is optimized for parallel implementation of the presented MMC model on the FPGA and benefits from a high-throughput floating-point computational engine. The developed real-time simulation architecture is capable of simulating MMCs with 400 SMs per arm at 825 nanoseconds. To address the difficulties of implementing the sorting process, a modified Odd-Even Bubble sort is presented in this work. The comparison of the results under various test scenarios reveals that the proposed real-time simulator reproduces the system responses in the same way as its corresponding off-line counterpart obtained from the PSCAD/EMTDC program.
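
    The sorting step mentioned above builds on odd-even (brick) transposition sorting, in which alternating passes compare disjoint neighbour pairs so that every pass can run fully in parallel in hardware. The sketch below is the textbook algorithm, not the authors' FPGA-specific modification.

        def odd_even_sort(values):
            a = list(values)
            n = len(a)
            for phase in range(n):                       # n phases guarantee a sorted list
                for i in range(phase % 2, n - 1, 2):     # disjoint neighbour pairs
                    if a[i] > a[i + 1]:
                        a[i], a[i + 1] = a[i + 1], a[i]  # compare-exchange
            return a

        print(odd_even_sort([7, 3, 9, 1, 4, 8, 2]))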

  15. Experimental Adiabatic Quantum Factorization under Ambient Conditions Based on a Solid-State Single Spin System.

    PubMed

    Xu, Kebiao; Xie, Tianyu; Li, Zhaokai; Xu, Xiangkun; Wang, Mengqi; Ye, Xiangyu; Kong, Fei; Geng, Jianpei; Duan, Changkui; Shi, Fazhan; Du, Jiangfeng

    2017-03-31

    Adiabatic quantum computation is a universal and robust method of quantum computing. In this architecture, the problem can be solved by adiabatically evolving the quantum processor from the ground state of a simple initial Hamiltonian to that of a final one, which encodes the solution of the problem. Adiabatic quantum computation has been proved to be a compatible candidate for scalable quantum computation. In this Letter, we report on the experimental realization of an adiabatic quantum algorithm on a single solid spin system under ambient conditions. All elements of adiabatic quantum computation, including initial state preparation, adiabatic evolution (simulated by optimal control), and final state read-out, are realized experimentally. As an example, we found the ground state of the problem Hamiltonian S_{z}I_{z} on our adiabatic quantum processor, which can be mapped to the factorization of 35 into its prime factors 5 and 7.

  16. Experimental Adiabatic Quantum Factorization under Ambient Conditions Based on a Solid-State Single Spin System

    NASA Astrophysics Data System (ADS)

    Xu, Kebiao; Xie, Tianyu; Li, Zhaokai; Xu, Xiangkun; Wang, Mengqi; Ye, Xiangyu; Kong, Fei; Geng, Jianpei; Duan, Changkui; Shi, Fazhan; Du, Jiangfeng

    2017-03-01

    Adiabatic quantum computation is a universal and robust method of quantum computing. In this architecture, the problem can be solved by adiabatically evolving the quantum processor from the ground state of a simple initial Hamiltonian to that of a final one, which encodes the solution of the problem. Adiabatic quantum computation has been proved to be a compatible candidate for scalable quantum computation. In this Letter, we report on the experimental realization of an adiabatic quantum algorithm on a single solid spin system under ambient conditions. All elements of adiabatic quantum computation, including initial state preparation, adiabatic evolution (simulated by optimal control), and final state read-out, are realized experimentally. As an example, we found the ground state of the problem Hamiltonian S_zI_z on our adiabatic quantum processor, which can be mapped to the factorization of 35 into its prime factors 5 and 7.

  17. Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters

    PubMed Central

    Torres-Huitzil, Cesar

    2013-01-01

    Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires k² − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computation can be achieved by kernel decomposition and by using constant-time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses fewer computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size, with a good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
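
    The van Herk/Gil-Werman idea behind the architecture can be sketched in one dimension: forward and backward prefix maxima over blocks of the kernel size give each window maximum from roughly three comparisons per sample, independent of k. The padding and edge handling below are simplified for illustration.

        import numpy as np

        def hgw_running_max(x, k):
            n = len(x)
            xp = np.concatenate([x, np.full((-n) % k, -np.inf)])   # pad to multiple of k
            blocks = xp.reshape(-1, k)
            fwd = np.maximum.accumulate(blocks, axis=1).ravel()              # prefix maxima
            bwd = np.maximum.accumulate(blocks[:, ::-1], axis=1)[:, ::-1].ravel()  # suffix maxima
            out = np.empty(n)
            for i in range(n):                                     # window [i, i+k-1]
                j = i + k - 1
                out[i] = max(bwd[i], fwd[j]) if j < len(fwd) else bwd[i]
            return out

        x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
        assert np.allclose(hgw_running_max(x, 3)[:8],
                           [x[i:i + 3].max() for i in range(8)])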

  18. Evaluating a digital ship design tool prototype: Designers' perceptions of novel ergonomics software.

    PubMed

    Mallam, Steven C; Lundh, Monica; MacKinnon, Scott N

    2017-03-01

    Computer-aided solutions are essential for naval architects to manage and optimize technical complexities when developing a ship's design. Although there is an array of software solutions aimed at optimizing the human element in design, practical ergonomics methodologies and technological solutions have struggled to gain widespread application in ship design processes. This paper explores how a new ergonomics technology is perceived by naval architecture students using a mixed-methods framework. Thirteen Naval Architecture and Ocean Engineering Masters students participated in the study. Overall, results found participants perceived the software and its embedded ergonomics tools to benefit their design work, increasing their empathy and ability to understand the work environment and work demands end-users face. However, participants questioned whether ergonomics could be practically and efficiently implemented under real-world project constraints. This revealed underlying social biases and a fundamental lack of understanding in engineering postgraduate students regarding applied ergonomics in naval architecture.

  19. Clinical implementation of a GPU-based simplified Monte Carlo method for a treatment planning system of proton beam therapy.

    PubMed

    Kohno, R; Hotta, K; Nishioka, S; Matsubara, K; Tansho, R; Suzuki, T

    2011-11-21

    We implemented the simplified Monte Carlo (SMC) method on graphics processing unit (GPU) architecture under the compute unified device architecture (CUDA) platform developed by NVIDIA. The GPU-based SMC was clinically applied for four patients with head and neck, lung, or prostate cancer. The results were compared to those obtained by a traditional CPU-based SMC with respect to computation time and discrepancy. In both the CPU- and GPU-based SMC calculations, the estimated mean statistical errors of the calculated doses in the planning target volume region were within 0.5% rms. The dose distributions calculated by the GPU- and CPU-based SMCs were similar, within statistical errors. The GPU-based SMC showed 12.30-16.00 times faster performance than the CPU-based SMC. The computation time per beam arrangement using the GPU-based SMC for the clinical cases ranged from 9 to 67 s. The results demonstrate the successful application of the GPU-based SMC to clinical proton treatment planning.

  20. Electro-Optic Computing Architectures: Volume II. Components and System Design and Analysis

    DTIC Science & Technology

    1998-02-01

    The objective of the Electro-Optic Computing Architecture (EOCA) program was to develop multi-function electro-optic interfaces and optical...interconnect units to enhance the performance of parallel processor systems and form the building blocks for future electro-optic computing architectures...Specifically, three multi-function interface modules were targeted for development - an Electro-Optic Interface (EOI), an Optical Interconnection Unit

  1. Compact VLSI neural computer integrated with active pixel sensor for real-time ATR applications

    NASA Astrophysics Data System (ADS)

    Fang, Wai-Chi; Udomkesmalee, Gabriel; Alkalai, Leon

    1997-04-01

    A compact VLSI neural computer integrated with an active pixel sensor has been under development to mimic what is inherent in biological vision systems. This electronic eye-brain computer is targeted at real-time machine vision applications which require both high-bandwidth communication and high-performance computing for data sensing, synergy of multiple types of sensory information, feature extraction, target detection, target recognition, and control functions. The neural computer is based on a composite structure which combines the Annealing Cellular Neural Network (ACNN) and the Hierarchical Self-Organization Neural Network (HSONN). The ACNN architecture is a programmable and scalable multi-dimensional array of annealing neurons which are locally connected with their neighboring neurons. Meanwhile, the HSONN adopts a hierarchical structure with nonlinear basis functions. The ACNN+HSONN neural computer is effectively designed to perform programmable functions for machine vision processing at all levels with its embedded host processor. It provides a two order-of-magnitude increase in computation power over state-of-the-art microcomputer and DSP microelectronics. A compact current-mode VLSI design feasibility of the ACNN+HSONN neural computer is demonstrated by a 3D 16X8X9-cube neural processor chip design in a 2-micrometer CMOS technology. Integration of this neural computer as one slice of a 4'X4' multichip module into the 3D MCM-based avionics architecture for NASA's New Millennium Program is also described.

  2. An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

    NASA Astrophysics Data System (ADS)

    Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng

    2018-04-01

    In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process image data stored in RRAM arrays. The proposed image storage architecture achieves better speed and device-consumption efficiency than the previous kernel storage architecture. We further improve the architecture for high-accuracy, low-power computing by utilizing binary storage and a series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performance, including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) a more than 67 times speed boost; 3) 71.4% energy saving.
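    A minimal software sketch of the image-storage idea (pixels held as RRAM conductances, convolution evaluated from read currents); the read voltage, LRS/HRS conductances, image size, and kernel values are illustrative assumptions, not device parameters from the paper.

```python
import numpy as np

def rram_convolution(image, kernel, g_lrs=1e-4, g_hrs=1e-7):
    """Sketch: binary image bits stored as RRAM conductances (LRS/HRS);
    each output pixel is obtained from kernel-weighted sums of read currents."""
    h, w = image.shape
    k = kernel.shape[0]
    # Binary storage: pixel value 1 -> low-resistance state, 0 -> high-resistance state
    conductance = np.where(image > 0, g_lrs, g_hrs)
    v_read = 0.1  # read voltage (illustrative)
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch_current = v_read * conductance[i:i + k, j:j + k]  # Ohm's law per cell
            out[i, j] = np.sum(kernel * patch_current)              # weighted current summation
    return out

image = (np.random.rand(28, 28) > 0.5).astype(float)  # 28x28 binary image
kernel = np.random.choice([-1.0, 1.0], size=(3, 3))   # 3x3 kernel
result = rram_convolution(image, kernel)
```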

  3. Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

    NASA Technical Reports Server (NTRS)

    Hsia, T. C.; Lu, G. Z.; Han, W. H.

    1987-01-01

    In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
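    The underlying computation is the inverse differential kinematic relation dq = J(q)^-1 dx; a brief, purely illustrative (non-systolic) software sketch using a generic 6x6 Jacobian, with the damping term being an assumption for numerical robustness rather than part of the cited method.

```python
import numpy as np

def joint_rates(jacobian, cartesian_velocity, damping=1e-6):
    """Inverse differential kinematics: solve J(q) dq = dx for the joint rates dq.
    A small damping term keeps the sketch usable near singular configurations."""
    J = np.asarray(jacobian)
    JtJ = J.T @ J + damping * np.eye(J.shape[1])
    return np.linalg.solve(JtJ, J.T @ np.asarray(cartesian_velocity))

# Illustrative 6x6 Jacobian for a 6-DOF arm (a real PUMA Jacobian depends on the joint angles q)
J = np.random.rand(6, 6)
dx = np.array([0.01, 0.0, -0.02, 0.0, 0.0, 0.005])  # desired end-effector twist
dq = joint_rates(J, dx)
```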

  4. Hypercube matrix computation task

    NASA Technical Reports Server (NTRS)

    Calalo, R.; Imbriale, W.; Liewer, P.; Lyons, J.; Manshadi, F.; Patterson, J.

    1987-01-01

    The Hypercube Matrix Computation (Year 1986-1987) task investigated the applicability of a parallel computing architecture to the solution of large scale electromagnetic scattering problems. Two existing electromagnetic scattering codes were selected for conversion to the Mark III Hypercube concurrent computing environment. They were selected so that the underlying numerical algorithms utilized would be different, thereby providing a more thorough evaluation of the appropriateness of the parallel environment for these types of problems. The first code was a frequency domain method of moments solution, NEC-2, developed at Lawrence Livermore National Laboratory. The second code was a time domain finite difference solution of Maxwell's equations to solve for the scattered fields. Once the codes were implemented on the hypercube and verified to obtain correct solutions by comparing the results with those from sequential runs, several measures were used to evaluate the performance of the two codes. First, a comparison was provided of the problem size possible on the hypercube with 128 megabytes of memory for a 32-node configuration with that available in a typical sequential user environment of 4 to 8 megabytes. Then, the performance of the codes was analyzed for the computational speedup attained by the parallel architecture.
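    The speedup evaluation mentioned at the end reduces to simple ratios; a minimal sketch with made-up timings (not measurements from the study):

```python
def speedup_and_efficiency(t_serial, t_parallel, n_nodes):
    """Classic parallel performance measures: speedup S = T_serial / T_parallel
    and efficiency E = S / number_of_nodes."""
    s = t_serial / t_parallel
    return s, s / n_nodes

# Illustrative timings only, for a 32-node hypercube configuration
s, e = speedup_and_efficiency(t_serial=3600.0, t_parallel=140.0, n_nodes=32)
print(f"speedup = {s:.1f}x, efficiency = {e:.2f}")
```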

  5. Architectures for Quantum Simulation Showing a Quantum Speedup

    NASA Astrophysics Data System (ADS)

    Bermejo-Vega, Juan; Hangleiter, Dominik; Schwarz, Martin; Raussendorf, Robert; Eisert, Jens

    2018-04-01

    One of the main aims in the field of quantum simulation is to achieve a quantum speedup, often referred to as "quantum computational supremacy": the experimental realization of a quantum device that computationally outperforms classical computers. In this work, we show that one can devise versatile and feasible schemes of two-dimensional, dynamical, quantum simulators showing such a quantum speedup, building on intermediate problems involving nonadaptive, measurement-based, quantum computation. In each of the schemes, an initial product state is prepared, potentially involving an element of randomness as in disordered models, followed by a short-time evolution under a basic translationally invariant Hamiltonian with simple nearest-neighbor interactions and a mere sampling measurement in a fixed basis. The correctness of the final-state preparation in each scheme is fully efficiently certifiable. We discuss experimental necessities and possible physical architectures, inspired by platforms of cold atoms in optical lattices and a number of others, as well as specific assumptions that enter the complexity-theoretic arguments. This work shows that benchmark settings exhibiting a quantum speedup may require little control, in contrast to universal quantum computing. Thus, our proposal puts a convincing experimental demonstration of a quantum speedup within reach in the near term.

  6. A scalable architecture for online anomaly detection of WLCG batch jobs

    NASA Astrophysics Data System (ADS)

    Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.

    2016-10-01

    For data centres it is increasingly important to monitor network usage and learn from network usage patterns. In particular, configuration issues or misbehaving batch jobs that prevent smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information by itself is not sufficient to detect anomalies, for several reasons: e.g. the underlying job distribution on a single worker node might change, or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale in terms of network communication or computational cost. We therefore propose a scalable architecture based on concepts of a super-peer network.
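    A minimal sketch of the super-peer idea, where worker nodes forward compact local summaries and a super-peer flags outliers across them; the summary statistics, node names, and z-score rule are illustrative assumptions, not the BPNetMon design.

```python
import statistics

def local_summary(job_traffic_bytes):
    """Worker-node side: reduce raw per-job traffic samples to a compact summary."""
    return {"mean": statistics.mean(job_traffic_bytes),
            "n": len(job_traffic_bytes)}

def flag_anomalies(summaries, z_threshold=3.0):
    """Super-peer side: flag worker nodes whose mean traffic deviates strongly
    from the population of summaries (simple z-score rule)."""
    means = [s["mean"] for s in summaries.values()]
    mu = statistics.mean(means)
    sigma = statistics.pstdev(means) or 1.0   # avoid division by zero
    return [node for node, s in summaries.items()
            if abs(s["mean"] - mu) / sigma > z_threshold]

summaries = {f"wn{i:02d}": local_summary([100 + i] * 10) for i in range(20)}
summaries["wn99"] = local_summary([10_000] * 10)   # a misbehaving job pattern
print(flag_anomalies(summaries))
```

    The point of the design is that only the small summary dictionaries cross the network, so the detection cost grows with the number of nodes per super-peer rather than with the raw traffic volume.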

  7. Localization Framework for Real-Time UAV Autonomous Landing: An On-Ground Deployed Visual Approach

    PubMed Central

    Kong, Weiwei; Hu, Tianjiang; Zhang, Daibing; Shen, Lincheng; Zhang, Jianwei

    2017-01-01

    One of the greatest challenges for fixed-wing unmanned aerial vehicles (UAVs) is safe landing. To address this, an on-ground deployed visual approach is developed in this paper. This approach is particularly suitable for landing in global navigation satellite system (GNSS)-denied environments. In application, the deployed guidance system makes full use of ground computing resources and feeds back the aircraft's real-time localization to its on-board autopilot. A separate long-baseline stereo architecture is proposed, providing an extendable baseline and a wide-angle field of view (FOV) in contrast to traditional fixed-baseline schemes. Furthermore, an accuracy evaluation of the new architecture is conducted through theoretical modeling and computational analysis. Dataset-driven experimental results demonstrate the feasibility and effectiveness of the developed approach. PMID:28629189
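    The benefit of an extendable baseline can be seen directly from the standard triangulation relations for a rectified stereo pair; a minimal sketch with illustrative numbers (not parameters from the paper):

```python
def stereo_depth(disparity_px, baseline_m, focal_px):
    """Standard triangulation for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

def depth_error(depth_m, baseline_m, focal_px, disparity_error_px=0.5):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * delta_d.
    A longer (extendable) baseline B directly reduces the error at a given range."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px

# Illustrative numbers only
z = stereo_depth(disparity_px=12.0, baseline_m=5.0, focal_px=1500.0)
print(z, depth_error(z, baseline_m=5.0, focal_px=1500.0))
```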

  8. Localization Framework for Real-Time UAV Autonomous Landing: An On-Ground Deployed Visual Approach.

    PubMed

    Kong, Weiwei; Hu, Tianjiang; Zhang, Daibing; Shen, Lincheng; Zhang, Jianwei

    2017-06-19

    One of the greatest challenges for fixed-wing unmanned aerial vehicles (UAVs) is safe landing. To address this, an on-ground deployed visual approach is developed in this paper. This approach is particularly suitable for landing in global navigation satellite system (GNSS)-denied environments. In application, the deployed guidance system makes full use of ground computing resources and feeds back the aircraft's real-time localization to its on-board autopilot. A separate long-baseline stereo architecture is proposed, providing an extendable baseline and a wide-angle field of view (FOV) in contrast to traditional fixed-baseline schemes. Furthermore, an accuracy evaluation of the new architecture is conducted through theoretical modeling and computational analysis. Dataset-driven experimental results demonstrate the feasibility and effectiveness of the developed approach.

  9. High performance network and channel-based storage

    NASA Technical Reports Server (NTRS)

    Katz, Randy H.

    1991-01-01

    In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called input/output (I/O) channels. With the dramatic shift towards workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. We discuss the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high performance computing based on network-attached storage.

  10. Automatic control of a negative ion source

    NASA Astrophysics Data System (ADS)

    Saadatmand, K.; Sredniawski, J.; Solensten, L.

    1989-04-01

    A CAMAC-based control architecture is devised for a Berkeley-type H⁻ volume ion source [1]. The architecture employs three 80386-based PCs. One PC is dedicated to control and monitoring of source operation. A second PC works with digitizers to provide data acquisition of waveforms. The third PC is used for off-line analysis. Initially, operation of the source was put under remote (supervisory) computer control. This was followed by development of an automated startup procedure. Finally, a study of the physics of operation is now underway to establish a database from which automatic beam optimization can be derived.

  11. Framework for a clinical information system.

    PubMed

    Van De Velde, R; Lansiers, R; Antonissen, G

    2002-01-01

    The design and implementation of a Clinical Information System architecture are presented. This architecture has been developed and implemented using components that follow a strong underlying conceptual and technological model. Common Object Request Broker and n-tier technology are used, featuring centralised and departmental clinical information systems as the back-end store for all clinical data. Servers located in the "middle" tier apply the clinical (business) model and application rules. The main characteristics are the focus on modelling and the reuse of both data and business logic. Scalability, as well as adaptability to constantly changing requirements via component-driven computing, are the main reasons for this approach.

  12. Collaborative Design Practices in Technology Mediated Learning

    ERIC Educational Resources Information Center

    Seitamaa-Hakkarainen, Pirita; Kangas, Kaiju; Raunio, Anna-Mari; Hakkarainen, Kai

    2012-01-01

    The present article examines how practices of computer-supported collaborative designing may be implemented in an elementary classroom. We present a case study in which 12-year-old students engaged in architectural design under the guidance of their teacher and a professional designer. The students were engaged in all aspects of design processes,…

  13. A support architecture for reliable distributed computing systems

    NASA Technical Reports Server (NTRS)

    Dasgupta, Partha; Leblanc, Richard J., Jr.

    1988-01-01

    The Clouds project is well under way toward its goal of building a unified distributed operating system supporting the object model. The operating system design uses the object concept for structuring software at all levels of the system. The basic operating system has been developed, and work is in progress to build a usable system.

  14. Pyramidal neurovision architecture for vision machines

    NASA Astrophysics Data System (ADS)

    Gupta, Madan M.; Knopf, George K.

    1993-08-01

    The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.
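    A minimal sketch of the converging multilayered (pyramidal) idea: each level is a smaller, coarser version of the one below it, so higher levels operate on progressively condensed information. The 2x2 averaging is an illustrative stand-in for the ACNN/HSONN processing at each level, not the paper's method.

```python
import numpy as np

def build_pyramid(image, levels=4):
    """Build a simple image pyramid by repeated 2x2 averaging; each level halves
    the resolution, mimicking the converging multilayered structure."""
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2  # crop to even size
        reduced = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        pyramid.append(reduced)
    return pyramid

frame = np.random.rand(128, 128)          # stand-in for a sensed image
for level, img in enumerate(build_pyramid(frame)):
    print(level, img.shape)               # 128x128, 64x64, 32x32, 16x16
```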

  15. Collaborative Working Architecture for IoT-Based Applications.

    PubMed

    Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus

    2018-05-23

    New sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructure for performing the computations and outsourcing work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications; however, much research remains to be done to properly gear all the systems to work together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.

  16. Optimizing Engineering Tools Using Modern Ground Architectures

    DTIC Science & Technology

    2017-12-01

    …scientific community. Traditional computing architectures were not capable of processing the data efficiently, or in some cases, could not process the… This thesis investigates how these modern computing architectures could be leveraged by industry and academia to improve the performance and capabilities of…

  17. Code Modernization of VPIC

    NASA Astrophysics Data System (ADS)

    Bird, Robert; Nystrom, David; Albright, Brian

    2017-10-01

    The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable performance, and cache-oblivious algorithms the problem still remains largely unsolved. In this work we demonstrate how a focus on platform-agnostic modern code development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsics-based vectorisation with compiler-generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU-capable OpenMP variant of VPIC. Finally, we include lessons learned. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
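    A generic illustration of the portability idea, replacing hand-written intrinsics with code that a compiler or array library can vectorize; this is a made-up particle-push fragment written with NumPy, not VPIC code.

```python
import numpy as np

def push_particles(x, v, E, q_over_m, dt):
    """Whole-array (vectorizable) update: the loop over particles is expressed as
    array operations instead of per-particle intrinsics, so the underlying
    library/compiler can choose the SIMD width for the target hardware."""
    v_new = v + q_over_m * E * dt   # accelerate
    x_new = x + v_new * dt          # drift
    return x_new, v_new

n = 1_000_000
x = np.random.rand(n)
v = np.zeros(n)
E = np.full(n, 0.1)
x, v = push_particles(x, v, E, q_over_m=-1.0, dt=1e-3)
```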

  18. Architecture independent environment for developing engineering software on MIMD computers

    NASA Technical Reports Server (NTRS)

    Valimohamed, Karim A.; Lopez, L. A.

    1990-01-01

    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.

  19. Stability and performance tradeoffs in bi-lateral telemanipulation

    NASA Technical Reports Server (NTRS)

    Hannaford, Blake

    1989-01-01

    Kinesthetic force feedback provides measurable increase in remote manipulation system performance. Intensive computation time requirements or operation under conditions of time delay can cause serious stability problems in control-system design. Here, a simplified linear analysis of this stability problem is presented for the forward-flow generalized architecture, applying the hybrid two-port representation to express the loop gain of the traditional master-slave architecture, which can be subjected to similar analysis. The hybrid two-port representation is also used to express the effects on the fidelity of manipulation or feel of one design approach used to stabilize the forward-flow architecture. The results suggest that, when local force feedback at the slave side is used to reduce manipulator stability problems, a price is paid in terms of telemanipulation fidelity.
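    For reference, the hybrid two-port description invoked in this analysis is conventionally written as follows (a standard textbook form with one common sign convention, not reproduced from the record):

```latex
% Hybrid two-port model of a bilateral master-slave telemanipulator:
% master-side force F_m and (negated) slave-side velocity v_s expressed in
% terms of master velocity v_m and slave/environment force F_s.
\begin{equation}
  \begin{pmatrix} F_m \\ -v_s \end{pmatrix}
  =
  \begin{pmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{pmatrix}
  \begin{pmatrix} v_m \\ F_s \end{pmatrix}
\end{equation}
% Ideal transparency ("perfect feel") corresponds to h_{11} = h_{22} = 0 and
% h_{12} = -h_{21} = 1; departures of the h-parameters from these values are
% one way to quantify the loss of telemanipulation fidelity discussed above.
```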

  20. From black box to toolbox: Outlining device functionality, engagement activities, and the pervasive information architecture of mHealth interventions.

    PubMed

    Danaher, Brian G; Brendryen, Håvar; Seeley, John R; Tyler, Milagra S; Woolley, Tim

    2015-03-01

    mHealth interventions that deliver content via mobile phones represent a burgeoning area of health behavior change. The current paper examines two themes that can inform the underlying design of mHealth interventions: (1) mobile device functionality, which represents the technological toolbox available to intervention developers; and (2) the pervasive information architecture of mHealth interventions, which determines how intervention content can be delivered concurrently using mobile phones, personal computers, and other devices. We posit that developers of mHealth interventions will be better able to achieve the promise of this burgeoning arena by leveraging the toolbox and functionality of mobile devices in order to engage participants and encourage meaningful behavior change within the context of a carefully designed pervasive information architecture.

  1. Computer vision camera with embedded FPGA processing

    NASA Astrophysics Data System (ADS)

    Lecerf, Antoine; Ouellet, Denis; Arias-Estrada, Miguel

    2000-03-01

    Traditional computer vision is based on a camera-computer system in which the image understanding algorithms are embedded in the computer. To circumvent the computational load of vision algorithms, low-level processing and imaging hardware can be integrated in a single compact module where a dedicated architecture is implemented. This paper presents a Computer Vision Camera based on an open architecture implemented in an FPGA. The system is targeted to real-time computer vision tasks where low level processing and feature extraction tasks can be implemented in the FPGA device. The camera integrates a CMOS image sensor, an FPGA device, two memory banks, and an embedded PC for communication and control tasks. The FPGA is a medium-sized device equivalent to 25,000 logic gates. The device is connected to two high speed memory banks, an IS interface, and an imager interface. The camera can be accessed for architecture programming, data transfer, and control through an Ethernet link from a remote computer. A hardware architecture can be defined in a Hardware Description Language (like VHDL), simulated and synthesized into digital structures that can be programmed into the FPGA and tested on the camera. The architecture of a classical multi-scale edge detection algorithm based on a Laplacian of Gaussian convolution has been developed to show the capabilities of the system.
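    A brief software analogue (scales and threshold are illustrative assumptions) of the multi-scale Laplacian-of-Gaussian edge detection that the FPGA architecture implements:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def multiscale_log_edges(image, sigmas=(1.0, 2.0, 4.0), threshold=0.01):
    """Multi-scale edge detection: apply a Laplacian-of-Gaussian filter at several
    scales and mark pixels whose response magnitude exceeds a threshold."""
    edges = np.zeros_like(image, dtype=bool)
    for sigma in sigmas:
        response = gaussian_laplace(image, sigma=sigma)
        edges |= np.abs(response) > threshold
    return edges

image = np.random.rand(240, 320)   # stand-in for a CMOS sensor frame
edge_map = multiscale_log_edges(image)
```

    The classical criterion marks zero-crossings of the LoG response rather than thresholding its magnitude; the sketch uses the simpler threshold purely for brevity.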

  2. State-of-the-art in Heterogeneous Computing

    DOE PAGES

    Brodtkorb, Andre R.; Dyken, Christopher; Hagen, Trond R.; ...

    2010-01-01

    Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.

  3. Advanced computer architecture for large-scale real-time applications.

    DOT National Transportation Integrated Search

    1973-04-01

    Air traffic control automation is identified as a crucial problem which provides a complex, real-time computer application environment. A novel computer architecture in the form of a pipeline associative processor is conceived to achieve greater perf...

  4. Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.

    ERIC Educational Resources Information Center

    Beltrametti, Monica; English, Will

    1994-01-01

    Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…

  5. ''Beauty of Wholeness and Beauty of Partiality.'' New Terms Defining the Concept of Beauty in Architecture in Terms of Sustainability and Computer Aided Design

    ERIC Educational Resources Information Center

    Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.

    2014-01-01

    The great shift in sustainability and computer-aided design in the field of architecture caused a remarkable change in architectural philosophy; new aspects of beauty and aesthetic values are being introduced, and traditional definitions of beauty cannot fully cover these aspects, which causes a gap between new architecture works criticism and…

  6. Programmable hardware for reconfigurable computing systems

    NASA Astrophysics Data System (ADS)

    Smith, Stephen

    1996-10-01

    In 1945 the work of J. von Neumann and H. Goldstine created the principal architecture for electronic computation that has now lasted fifty years. Nevertheless, alternative architectures have been created that have computational capability, for special tasks, far beyond that feasible with von Neumann machines. The emergence of high capacity programmable logic devices has made the realization of these architectures practical. The original ENIAC and EDVAC machines were conceived to solve special mathematical problems that were far from today's concept of 'killer applications.' In a similar vein, programmable hardware computation is being used today to solve unique mathematical problems. Our programmable hardware activity is focused on the research and development of novel computational systems based upon the reconfigurability of our programmable logic devices. We explore our programmable logic architectures and their implications for programmable hardware. One programmable hardware board implementation is detailed.

  7. Execution environment for intelligent real-time control systems

    NASA Technical Reports Server (NTRS)

    Sztipanovits, Janos

    1987-01-01

    Modern telerobot control technology requires the integration of symbolic and non-symbolic programming techniques, different models of parallel computations, and various programming paradigms. The Multigraph Architecture, which has been developed for the implementation of intelligent real-time control systems is described. The layered architecture includes specific computational models, integrated execution environment and various high-level tools. A special feature of the architecture is the tight coupling between the symbolic and non-symbolic computations. It supports not only a data interface, but also the integration of the control structures in a parallel computing environment.

  8. Efficient Phase Unwrapping Architecture for Digital Holographic Microscopy

    PubMed Central

    Hwang, Wen-Jyi; Cheng, Shih-Chang; Cheng, Chau-Jern

    2011-01-01

    This paper presents a novel phase unwrapping architecture for accelerating the computational speed of digital holographic microscopy (DHM). A fast Fourier transform (FFT) based phase unwrapping algorithm providing a minimum squared error solution is adopted for hardware implementation because of its simplicity and robustness to noise. The proposed architecture is realized in a pipeline fashion to maximize throughput of the computation. Moreover, the number of hardware multipliers and dividers is minimized to reduce the hardware costs. The proposed architecture is used as a custom user logic in a system on programmable chip (SOPC) for physical performance measurement. Experimental results reveal that the proposed architecture is effective for expediting the computational speed while consuming low hardware resources for designing an embedded DHM system. PMID:22163688
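    As a point of reference, a software sketch of transform-based minimum-squared-error phase unwrapping of the kind such hardware accelerates is shown below; this variant uses a DCT-based Poisson solve, which is an assumption about the algorithmic family, not a description of the paper's pipeline.

```python
import numpy as np
from scipy.fft import dctn, idctn

def wrap(a):
    """Wrap angles to (-pi, pi]."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def unwrap_least_squares(psi):
    """Unweighted least-squares phase unwrapping via a discrete Poisson solve.
    psi: 2-D wrapped phase in radians. Returns the unwrapped estimate."""
    M, N = psi.shape
    # Wrapped phase differences (zero beyond the last row/column)
    dx = np.zeros((M, N)); dy = np.zeros((M, N))
    dx[:, :-1] = wrap(np.diff(psi, axis=1))
    dy[:-1, :] = wrap(np.diff(psi, axis=0))
    # Divergence of the wrapped gradient field (driving function rho)
    rho = (dx - np.concatenate([np.zeros((M, 1)), dx[:, :-1]], axis=1)
           + dy - np.concatenate([np.zeros((1, N)), dy[:-1, :]], axis=0))
    # Solve the Poisson equation with a cosine transform (Neumann boundaries)
    rho_hat = dctn(rho, norm="ortho")
    m = np.arange(M)[:, None]; n = np.arange(N)[None, :]
    denom = 2 * np.cos(np.pi * m / M) + 2 * np.cos(np.pi * n / N) - 4
    denom[0, 0] = 1.0                      # avoid division by zero at the DC term
    phi_hat = rho_hat / denom
    phi_hat[0, 0] = 0.0                    # unwrapped phase is defined up to a constant
    return idctn(phi_hat, norm="ortho")
```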

  9. Development of a Subcell Based Modeling Approach for Modeling the Architecturally Dependent Impact Response of Triaxially Braided Polymer Matrix Composites

    NASA Technical Reports Server (NTRS)

    Sorini, Chris; Chattopadhyay, Aditi; Goldberg, Robert K.; Kohlman, Lee W.

    2016-01-01

    Understanding the high velocity impact response of polymer matrix composites with complex architectures is critical to many aerospace applications, including engine fan blade containment systems where the structure must be able to completely contain fan blades in the event of a blade-out. Despite the benefits offered by these materials, the complex nature of textile composites presents a significant challenge for the prediction of deformation and damage under both quasi-static and impact loading conditions. The relatively large mesoscale repeating unit cell (in comparison to the size of structural components) causes the material to behave like a structure rather than a homogeneous material. Impact experiments conducted at NASA Glenn Research Center have shown the damage patterns to be a function of the underlying material architecture. Traditional computational techniques that involve modeling these materials using smeared homogeneous, orthotropic material properties at the macroscale result in simulated damage patterns that are a function of the structural geometry, but not the material architecture. In order to preserve heterogeneity at the highest length scale in a robust yet computationally efficient manner, and capture the architecturally dependent damage patterns, a previously-developed subcell modeling approach where the braided composite unit cell is approximated as a series of four adjacent laminated composites is utilized. This work discusses the implementation of the subcell methodology into the commercial transient dynamic finite element code LS-DYNA (Livermore Software Technology Corp.). Verification and validation studies are also presented, including simulation of the tensile response of straight-sided and notched quasi-static coupons composed of a T700/PR520 triaxially braided [0deg/60deg/-60deg] composite. Based on the results of the verification and validation studies, advantages and limitations of the methodology as well as plans for future work are discussed.

  10. Managing Power Heterogeneity

    NASA Astrophysics Data System (ADS)

    Pruhs, Kirk

    A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.

  11. Computer Architects.

    ERIC Educational Resources Information Center

    Betts, Janelle Lyon

    2001-01-01

    Describes a high school art assignment in which students utilize Appleworks or Claris Works to design their own house, after learning about architectural styles and how to use the computer program. States that the project develops student computer skills and increases student knowledge about architecture. (CMK)

  12. The graphical brain: Belief propagation and active inference

    PubMed Central

    Friston, Karl J.; Parr, Thomas; de Vries, Bert

    2018-01-01

    This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models. Crucially, these models can entertain both discrete and continuous states, leading to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to elucidate the requisite message passing in terms of its form and scheduling. To accommodate mixed generative models (of discrete and continuous states), one also has to consider link nodes or factors that enable discrete and continuous representations to talk to each other. When mapping the implicit computational architecture onto neuronal connectivity, several interesting features emerge. For example, Bayesian model averaging and comparison, which link discrete and continuous states, may be implemented in thalamocortical loops. These and other considerations speak to a computational connectome that is inherently state dependent and self-organizing in ways that yield to a principled (variational) account. We conclude with simulations of reading that illustrate the implicit neuronal message passing, with a special focus on how discrete (semantic) representations inform, and are informed by, continuous (visual) sampling of the sensorium. Author Summary This paper considers functional integration in the brain from a computational perspective. We ask what sort of neuronal message passing is mandated by active inference—and what implications this has for context-sensitive connectivity at microscopic and macroscopic levels. In particular, we formulate neuronal processing as belief propagation under deep generative models that can entertain both discrete and continuous states. This leads to distinct schemes for belief updating that play out on the same (neuronal) architecture. Technically, we use Forney (normal) factor graphs to characterize the requisite message passing, and link this formal characterization to canonical microcircuits and extrinsic connectivity in the brain. PMID:29417960

  13. Evaluation of Visual Computer Simulator for Computer Architecture Education

    ERIC Educational Resources Information Center

    Imai, Yoshiro; Imai, Masatoshi; Moritoh, Yoshio

    2013-01-01

    This paper presents trial evaluation of a visual computer simulator in 2009-2011, which has been developed to play some roles of both instruction facility and learning tool simultaneously. And it illustrates an example of Computer Architecture education for University students and usage of e-Learning tool for Assembly Programming in order to…

  14. Design of a massively parallel computer using bit serial processing elements

    NASA Technical Reports Server (NTRS)

    Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

    1995-01-01

    A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.

  15. A heterogeneous hierarchical architecture for real-time computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Skroch, D.A.; Fornaro, R.J.

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H²ART) and system software for program loading and interprocessor communication.

  16. Three Program Architecture for Design Optimization

    NASA Technical Reports Server (NTRS)

    Miura, Hirokazu; Olson, Lawrence E. (Technical Monitor)

    1998-01-01

    In this presentation, I would like to review a historical perspective on the program architecture used to build design optimization capabilities based on mathematical programming and other numerical search techniques. It is rather straightforward to classify the program architecture into three categories as shown above. However, the relative importance of each of the three approaches has not been static; it has changed dynamically as the capabilities of available computational resources have increased. For example, we once considered that the direct coupling architecture would never be used for practical problems, but the availability of computer systems such as multi-processors has changed this assessment. In this presentation, I would like to review the roles of the three architectures from a historical as well as a current and future perspective. There may also be some possibility for the emergence of hybrid architectures. I hope to provide some seeds for active discussion of where we are heading in the very dynamic environment for high-speed computing and communication.

  17. An Experiment in the Use of Computer-Based Education to Teach Energy Considerations in Architectural Design.

    ERIC Educational Resources Information Center

    Arumi, Francisco N.

    Computer programs capable of describing the thermal behavior of buildings are used to help architectural students understand environmental systems. The Numerical Simulation Laboratory at the Architectural School of the University of Texas at Austin was developed to provide the necessary software capable of simulating the energy transactions…

  18. Multiplexing electro-optic architectures for advanced aircraft integrated flight control systems

    NASA Technical Reports Server (NTRS)

    Seal, D. W.

    1989-01-01

    This report describes the results of a 10 month program sponsored by NASA. The objective of this program was to evaluate various optical sensor modulation technologies and to design an optimal Electro-Optic Architecture (EOA) for servicing remote clusters of sensors and actuators in advanced aircraft flight control systems. The EOAs supply optical power to remote sensors and actuators, process the modulated optical signals returned from the sensors, and produce conditioned electrical signals acceptable for use by a digital flight control computer or Vehicle Management System (VMS) computer. This study was part of a multi-year initiative under the Fiber Optic Control System Integration (FOCSI) program to design, develop, and test a totally integrated fiber optic flight/propulsion control system for application to advanced aircraft. Unlike earlier FOCSI studies, this program concentrated on the design of the EOA interface rather than the optical transducer technology itself.

  19. Toward a Theory of Variation in the Organization of the Word Reading System

    ERIC Educational Resources Information Center

    Rueckl, Jay G.

    2016-01-01

    The strategy underlying most computational models of word reading is to specify the organization of the reading system--its architecture and the processes and representations it employs--and to demonstrate that this organization would give rise to the behavior observed in word reading tasks. This approach fails to adequately address the variation…

  20. Modeling the impact of scaffold architecture and mechanical loading on collagen turnover in engineered cardiovascular tissues.

    PubMed

    Argento, G; de Jonge, N; Söntjens, S H M; Oomens, C W J; Bouten, C V C; Baaijens, F P T

    2015-06-01

    The anisotropic collagen architecture of an engineered cardiovascular tissue has a major impact on its in vivo mechanical performance. This evolving collagen architecture is determined by initial scaffold microstructure and mechanical loading. Here, we developed and validated a theoretical and computational microscale model to quantitatively understand the interplay between scaffold architecture and mechanical loading on collagen synthesis and degradation. Using input from experimental studies, we hypothesize that both the microstructure of the scaffold and the loading conditions influence collagen turnover. The evaluation of the mechanical and topological properties of in vitro engineered constructs reveals that the formation of extracellular matrix layers on top of the scaffold surface influences the mechanical anisotropy on the construct. Results show that the microscale model can successfully capture the collagen arrangement between the fibers of an electrospun scaffold under static and cyclic loading conditions. Contact guidance by the scaffold, and not applied load, dominates the collagen architecture. Therefore, when the collagen grows inside the pores of the scaffold, pronounced scaffold anisotropy guarantees the development of a construct that mimics the mechanical anisotropy of the native cardiovascular tissue.

  1. A software methodology for compiling quantum programs

    NASA Astrophysics Data System (ADS)

    Häner, Thomas; Steiger, Damian S.; Svore, Krysta; Troyer, Matthias

    2018-04-01

    Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.
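    A toy sketch (not the authors' toolchain; the gate set, pass names, and device connectivity are assumptions) of the layered idea: an abstract operation is lowered through successive compilation passes toward what a specific backend supports.

```python
# Toy pass-based lowering: each pass rewrites the circuit one step closer to hardware.

def decompose_qft(qubits):
    """High-level layer -> generic gate layer: textbook QFT decomposition into
    Hadamards and controlled-phase rotations (final qubit-order swaps omitted)."""
    ops = []
    n = len(qubits)
    for i, q in enumerate(qubits):
        ops.append(("H", (q,)))
        for j in range(i + 1, n):
            ops.append(("CPHASE", (qubits[j], q), 3.141592653589793 / 2 ** (j - i)))
    return ops

def map_to_device(ops, coupling):
    """Generic layer -> hardware layer: flag two-qubit gates between qubits that
    are not adjacent on the assumed coupling graph (a real compiler pass would
    insert SWAPs and translate to the native gate set)."""
    lowered = []
    for op in ops:
        if len(op[1]) == 2 and frozenset(op[1]) not in coupling:
            lowered.append(("NEEDS_ROUTING",) + op)
        else:
            lowered.append(op)
    return lowered

coupling = {frozenset(p) for p in [(0, 1), (1, 2)]}   # assumed linear connectivity
print(map_to_device(decompose_qft([0, 1, 2]), coupling))
```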

  2. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    NASA Technical Reports Server (NTRS)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  3. Neuromorphic Computing, Architectures, Models, and Applications. A Beyond-CMOS Approach to Future Computing, June 29-July 1, 2016, Oak Ridge, TN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potok, Thomas; Schuman, Catherine; Patton, Robert

    The White House and Department of Energy have been instrumental in driving the development of a neuromorphic computing program to help the United States continue its lead in basic research into (1) Beyond Exascale—high performance computing beyond Moore's Law and von Neumann architectures, (2) Scientific Discovery—new paradigms for understanding increasingly large and complex scientific data, and (3) Emerging Architectures—assessing the potential of neuromorphic and quantum architectures. Neuromorphic computing spans a broad range of scientific disciplines from materials science to devices, to computer science, to neuroscience, all of which are required to solve the neuromorphic computing grand challenge. In our workshop we focus on the computer science aspects, specifically from a neuromorphic device through an application. Neuromorphic devices present a very different paradigm to the computer science community from traditional von Neumann architectures, which raises six major questions about building a neuromorphic application from the device level. We used these fundamental questions to organize the workshop program and to direct the workshop panels and discussions. From the white papers, presentations, panels, and discussions, there emerged several recommendations on how to proceed.

  4. Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schuller, Ivan K.; Stevens, Rick; Pino, Robinson

    2015-10-29

    Computation in its many forms is the engine that fuels our modern civilization. Modern computation—based on the von Neumann architecture—has allowed, until now, the development of continuous improvements, as predicted by Moore's law. However, computation using current architectures and materials will inevitably—within the next 10 years—reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like ("neuromorphic") computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS-based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully "neuromorphic" computer. To address this challenge, the following issues were considered: (1) the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; (2) new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; (3) device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; (4) comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.

  5. A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potok, Thomas E; Schuman, Catherine D; Young, Steven R

    Current Deep Learning models use highly optimized convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers with a fairly simple layered network topology, i.e., highly connected layers, without intra-layer connections. Complex topologies have been proposed, but are intractable to train on current systems. Building the topologies of the deep learning network requires hand tuning, and implementing the network in hardware is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determine network topology, and neuromorphic computing for a low-power hardware implementation. Due to input size limitations of current quantum computers we use the MNIST dataset for our evaluation. The results show the possibility of using the three architectures in tandem to explore complex deep learning networks that are untrainable using a von Neumann architecture. We show that a quantum computer can find high quality values of intra-layer connections and weights, while yielding a tractable time result as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware. This represents a new capability that is not feasible with current von Neumann architecture. It potentially enables the ability to solve very complicated problems unsolvable with current computing technologies.

  6. Playable Serious Games for Studying and Programming Computational STEM and Informatics Applications of Distributed and Parallel Computer Architectures

    ERIC Educational Resources Information Center

    Amenyo, John-Thones

    2012-01-01

    Carefully engineered playable games can serve as vehicles for students and practitioners to learn and explore the programming of advanced computer architectures to execute applications, such as high performance computing (HPC) and complex, inter-networked, distributed systems. The article presents families of playable games that are grounded in…

  7. Generating and executing programs for a floating point single instruction multiple data instruction set architecture

    DOEpatents

    Gschwind, Michael K

    2013-04-16

    Mechanisms for generating and executing programs for a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA) are provided. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon is provided. The computer readable program, when executed on a computing device, causes the computing device to receive one or more instructions and execute the one or more instructions using logic in an execution unit of the computing device. The logic implements a floating point (FP) only single instruction multiple data (SIMD) instruction set architecture (ISA), based on data stored in a vector register file of the computing device. The vector register file is configured to store both scalar and floating point values as vectors having a plurality of vector elements.

  8. Highly parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.; Tichy, Walter F.

    1990-01-01

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to particular classes of problems. The architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.

  9. Switching from computer to microcomputer architecture education

    NASA Astrophysics Data System (ADS)

    Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore

    2010-03-01

    In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to microcomputer architecture. The authors present their strategies towards a successful crossing of boundaries between engineering disciplines. This communication aims at providing a different aspect on professional courses that are, nowadays, addressed at the expense of traditional courses.

  10. Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells

    DTIC Science & Technology

    2007-06-01

    …von Neumann architectures, and CMOS fabrication. Novel solutions of massively parallel distributed computing and processing (pipelined due to systolic…) and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a… Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high-performance computing…

  11. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  12. Switching from Computer to Microcomputer Architecture Education

    ERIC Educational Resources Information Center

    Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore

    2010-01-01

    In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to…

  13. An Architecture for Cross-Cloud System Management

    NASA Astrophysics Data System (ADS)

    Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad

    The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in a homogeneous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.

  14. Information processing architecture of functionally defined clusters in the macaque cortex.

    PubMed

    Shen, Kelly; Bezgin, Gleb; Hutchison, R Matthew; Gati, Joseph S; Menon, Ravi S; Everling, Stefan; McIntosh, Anthony R

    2012-11-28

    Computational and empirical neuroimaging studies have suggested that the anatomical connections between brain regions primarily constrain their functional interactions. Given that the large-scale organization of functional networks is determined by the temporal relationships between brain regions, the structural limitations may extend to the global characteristics of functional networks. Here, we explored the extent to which the functional network community structure is determined by the underlying anatomical architecture. We directly compared macaque (Macaca fascicularis) functional connectivity (FC) assessed using spontaneous blood oxygen level-dependent functional magnetic resonance imaging (BOLD-fMRI) to directed anatomical connectivity derived from macaque axonal tract tracing studies. Consistent with previous reports, FC increased with increasing strength of anatomical connection, and FC was also present between regions that had no direct anatomical connection. We observed moderate similarity between the FC of each region and its anatomical connectivity. Notably, anatomical connectivity patterns, as described by structural motifs, were different within and across functional modules: partitioning of the functional network was supported by dense bidirectional anatomical connections within clusters and unidirectional connections between clusters. Together, our data directly demonstrate that the FC patterns observed in resting-state BOLD-fMRI are dictated by the underlying neuroanatomical architecture. Importantly, we show how this architecture contributes to the global organizational principles of both functional specialization and integration.

  15. A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.

    PubMed

    Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang

    2016-12-07

    The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.
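
    The splitting rule is simple enough to sketch in a few lines of Python; the following is an illustrative floating-point version of the idea (split the waveform at its peak and use the area of each portion as a feature), not the paper's fixed-point hardware formulation.

    import numpy as np

    def peak_split_features(spike):
        """Split a spike waveform at its peak and return the area of each
        portion as a two-dimensional feature vector (illustrative sketch)."""
        spike = np.asarray(spike, dtype=float)
        peak = int(np.argmax(np.abs(spike)))          # locate the peak sample
        left_area = np.sum(np.abs(spike[:peak + 1]))  # area up to and including the peak
        right_area = np.sum(np.abs(spike[peak:]))     # area from the peak onward
        return np.array([left_area, right_area])

    # Example: features for two noisy spikes of different shapes
    rng = np.random.default_rng(0)
    t = np.arange(64)
    spike_a = np.exp(-(t - 20) ** 2 / 30.0) + 0.02 * rng.standard_normal(64)
    spike_b = np.exp(-(t - 40) ** 2 / 80.0) + 0.02 * rng.standard_normal(64)
    print(peak_split_features(spike_a), peak_split_features(spike_b))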

  16. A Self-Synthesis Approach to Perceptual Learning for Multisensory Fusion in Robotics

    PubMed Central

    Axenie, Cristian; Richter, Christoph; Conradt, Jörg

    2016-01-01

    Biological and technical systems operate in a rich multimodal environment. Due to the diversity of incoming sensory streams a system perceives and the variety of motor capabilities a system exhibits there is no single representation and no singular unambiguous interpretation of such a complex scene. In this work we propose a novel sensory processing architecture, inspired by the distributed macro-architecture of the mammalian cortex. The underlying computation is performed by a network of computational maps, each representing a different sensory quantity. All the different sensory streams enter the system through multiple parallel channels. The system autonomously associates and combines them into a coherent representation, given incoming observations. These processes are adaptive and involve learning. The proposed framework introduces mechanisms for self-creation and learning of the functional relations between the computational maps, encoding sensorimotor streams, directly from the data. Its intrinsic scalability, parallelisation, and automatic adaptation to unforeseen sensory perturbations make our approach a promising candidate for robust multisensory fusion in robotic systems. We demonstrate this by applying our model to a 3D motion estimation on a quadrotor. PMID:27775621

  17. Generic Divide and Conquer Internet-Based Computing

    NASA Technical Reports Server (NTRS)

    Radenski, Atanas; Follen, Gregory J. (Technical Monitor)

    2001-01-01

    The rapid growth of internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of new, internet-oriented software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this research project is to contribute to better understanding of the transition to internet-based high-performance computing and to develop solutions for some of the difficulties of this transition. More specifically, our goal is to design an architecture for generic divide and conquer internet-based computing, to develop a portable implementation of this architecture, to create an example library of high-performance divide-and-conquer computing agents that run on top of this architecture, and to evaluate the performance of these agents. We have been designing an architecture that incorporates a master task-pool server and utilizes satellite computational servers that operate on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. Our designed architecture is intended to be complementary to and accessible from computational grids such as Globus, Legion, and Condor. Grids provide remote access to existing high-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end internet nodes. Our project is focused on a generic divide-and-conquer paradigm and its applications that operate on a loose and ever-changing pool of lower-end internet nodes.

  18. THE COMPUTER AND THE ARCHITECTURAL PROFESSION.

    ERIC Educational Resources Information Center

    HAVILAND, DAVID S.

    THE ROLE OF ADVANCING TECHNOLOGY IN THE FIELD OF ARCHITECTURE IS DISCUSSED IN THIS REPORT. PROBLEMS IN COMMUNICATION AND THE DESIGN PROCESS ARE IDENTIFIED. ADVANTAGES AND DISADVANTAGES OF COMPUTERS ARE MENTIONED IN RELATION TO MAN AND MACHINE INTERACTION. PRESENT AND FUTURE IMPLICATIONS OF COMPUTER USAGE ARE IDENTIFIED AND DISCUSSED WITH RESPECT…

  19. The Contribution of Visualization to Learning Computer Architecture

    ERIC Educational Resources Information Center

    Yehezkel, Cecile; Ben-Ari, Mordechai; Dreyfus, Tommy

    2007-01-01

    This paper describes a visualization environment and associated learning activities designed to improve learning of computer architecture. The environment, EasyCPU, displays a model of the components of a computer and the dynamic processes involved in program execution. We present the results of a research program that analysed the contribution of…

  20. Technology advances and market forces: Their impact on high performance architectures

    NASA Technical Reports Server (NTRS)

    Best, D. R.

    1978-01-01

    Reasonable projections into future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture and application with greater attention to testability, maintainability, reliability, and usability than supercomputer development programs of the past.

  1. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vineyard, Craig Michael; Verzi, Stephen Joseph

    As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis-inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  2. Computing architecture for autonomous microgrids

    DOEpatents

    Goldsmith, Steven Y.

    2015-09-29

    A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the microgrid.

  3. Blueprint for a microwave trapped ion quantum computer.

    PubMed

    Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K

    2017-02-01

    The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.

  4. JETSPIN: A specific-purpose open-source software for simulations of nanofiber electrospinning

    NASA Astrophysics Data System (ADS)

    Lauricella, Marco; Pontrelli, Giuseppe; Coluzza, Ivan; Pisignano, Dario; Succi, Sauro

    2015-12-01

    We present the open-source computer program JETSPIN, specifically designed to simulate the electrospinning process of nanofibers. Its capabilities are shown with proper reference to the underlying model, as well as a description of the relevant input variables and associated test-case simulations. The various interactions included in the electrospinning model implemented in JETSPIN are discussed in detail. The code is designed to exploit different computational architectures, from single to parallel processor workstations. This paper provides an overview of JETSPIN, focusing primarily on its structure, parallel implementations, functionality, performance, and availability.

  5. Examining ion channel properties using free-energy methods.

    PubMed

    Domene, Carmen; Furini, Simone

    2009-01-01

    Recent advances in structural biology have revealed the architecture of a number of transmembrane channels, allowing for these complex biological systems to be understood in atomistic detail. Computational simulations are a powerful tool by which the dynamic and energetic properties, and thereby the function of these protein architectures, can be investigated. The experimentally observable properties of a system are often determined more by energetics than dynamics, and therefore understanding the underlying free energy (FE) of biophysical processes is of crucial importance. Critical to the accurate evaluation of FE values are the problems of obtaining accurate sampling of complex biological energy landscapes, and of obtaining accurate representations of the potential energy of a system, this latter problem having been addressed through the development of molecular force fields. While these challenges are common to all FE methods, depending on the system under study, and the questions being asked of it, one technique for FE calculation may be preferable to another, the choice of method and simulation protocol being crucial to achieve efficiency. Applied in a correct manner, FE calculations represent a predictive and affordable computational tool with which to make relevant contact with experiments. This chapter, therefore, aims to give an overview of the most widely implemented computational methods used to calculate the FE associated with particular biochemical or biophysical events, and to highlight their recent applications to ion channels. Copyright © 2009 Elsevier Inc. All rights reserved.

  6. Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

    1987-01-01

    Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.

  7. Computer architecture for efficient algorithmic executions in real-time systems: new technology for avionics systems and advanced space vehicles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carroll, C.C.; Youngblood, J.N.; Saha, A.

    1987-12-01

    Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.

  8. By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Bailey, David (Technical Monitor)

    1997-01-01

    While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gordon Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries; parallel languages and compilers; and interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, the transition of underlying hardware and system software has forced the scientists to spend a large effort to migrate and recode their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: from MPI/PVM on networks of workstations to the IBM/SP2 and CRAY/T3D.

  9. SU (2) lattice gauge theory simulations on Fermi GPUs

    NASA Astrophysics Data System (ADS)

    Cardoso, Nuno; Bicudo, Pedro

    2011-05-01

    In this work we explore the performance of CUDA in quenched lattice SU (2) simulations. CUDA, NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi architectures) are also presented. In order to obtain a high performance, the code must be optimized for the GPU architecture, i.e., an implementation that exploits the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU (2) lattice gauge configurations, for the mean plaquette, for the Polyakov Loop at finite T and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of 200× the speed over one CPU, in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2× slower) than single precision computations.
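
    For readers unfamiliar with the observable, the following single-CPU numpy toy computes the mean plaquette on random (hot-start) SU (2) links for a small two-dimensional lattice; it only illustrates the quantity being measured and is unrelated to the optimized CUDA implementation described above.

    import numpy as np

    rng = np.random.default_rng(1)

    def random_su2():
        """Uniform random SU(2) matrix built from a unit 4-vector (a0, a1, a2, a3)."""
        a = rng.standard_normal(4)
        a /= np.linalg.norm(a)
        return np.array([[a[0] + 1j * a[3],  a[2] + 1j * a[1]],
                         [-a[2] + 1j * a[1], a[0] - 1j * a[3]]])

    L = 8                                       # toy 2-D lattice, links U[x, y, mu]
    U = np.empty((L, L, 2, 2, 2), dtype=complex)
    for x in range(L):
        for y in range(L):
            for mu in range(2):
                U[x, y, mu] = random_su2()

    def plaquette(x, y):
        """0.5 * Re Tr of the elementary plaquette at site (x, y)."""
        xp, yp = (x + 1) % L, (y + 1) % L
        loop = (U[x, y, 0] @ U[xp, y, 1]
                @ U[x, yp, 0].conj().T @ U[x, y, 1].conj().T)
        return 0.5 * np.trace(loop).real

    mean_plaq = np.mean([plaquette(x, y) for x in range(L) for y in range(L)])
    print("mean plaquette on random (hot) links:", mean_plaq)   # close to 0 for random links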

  10. Dynamic effects of root system architecture improve root water uptake in 1-D process-based soil-root hydrodynamics

    NASA Astrophysics Data System (ADS)

    Bouda, Martin; Saiers, James E.

    2017-12-01

    Root system architecture (RSA) can significantly affect plant access to water, total transpiration, as well as its partitioning by soil depth, with implications for surface heat, water, and carbon budgets. Despite recent advances in land surface model (LSM) descriptions of plant hydraulics, descriptions of RSA have not been included because of their three-dimensional complexity, which makes them generally too computationally costly. Here we demonstrate a new, process-based 1D layered model that captures the dynamic shifts in water potential gradients of 3D RSA under different soil moisture conditions: the RSA stencil. Using root systems calibrated to the rooting profiles of four plant functional types (PFT) of the Community Land Model, we show that the RSA stencil predicts plant water potentials within 2% of the outputs of a full 3D model, under the same assumptions on soil moisture heterogeneity, despite its trivial computational cost, resulting in improved predictions of water uptake and soil moisture compared to a model without RSA in a transient simulation. Our results suggest that LSM predictions of soil moisture dynamics and dependent variables can be improved by the implementation of this model, calibrated for individual PFTs using field observations.

  11. A distributed agent architecture for real-time knowledge-based systems: Real-time expert systems project, phase 1

    NASA Technical Reports Server (NTRS)

    Lee, S. Daniel

    1990-01-01

    We propose a distributed agent architecture (DAA) that can support a variety of paradigms based on both traditional real-time computing and artificial intelligence. DAA consists of distributed agents that are classified into two categories: reactive and cognitive. Reactive agents can be implemented directly in Ada to meet hard real-time requirements and be deployed on on-board embedded processors. A traditional real-time computing methodology under consideration is the rate monotonic theory that can guarantee schedulability based on analytical methods. AI techniques under consideration for reactive agents are approximate or anytime reasoning that can be implemented using Bayesian belief networks as in Guardian. Cognitive agents are traditional expert systems that can be implemented in ART-Ada to meet soft real-time requirements. During the initial design of cognitive agents, it is critical to consider the migration path that would allow initial deployment on ground-based workstations with eventual deployment on on-board processors. ART-Ada technology enables this migration while Lisp-based technologies make it difficult if not impossible. In addition to reactive and cognitive agents, a meta-level agent would be needed to coordinate multiple agents and to provide meta-level control.

  12. Unleashing the Power of Distributed CPU/GPU Architectures: Massive Astronomical Data Analysis and Visualization Case Study

    NASA Astrophysics Data System (ADS)

    Hassan, A. H.; Fluke, C. J.; Barnes, D. G.

    2012-09-01

    Upcoming and future astronomy research facilities will systematically generate terabyte-sized data sets, moving astronomy into the Petascale data era. While such facilities will provide astronomers with unprecedented levels of accuracy and coverage, the increases in dataset size and dimensionality will pose serious computational challenges for many current astronomy data analysis and visualization tools. With such data sizes, even simple data analysis tasks (e.g. calculating a histogram or computing data minimum/maximum) may not be achievable without access to a supercomputing facility. To effectively handle such dataset sizes, which exceed today's single machine memory and processing limits, we present a framework that exploits the distributed power of GPUs and many-core CPUs, with a goal of providing data analysis and visualization tasks as a service for astronomers. By mixing shared and distributed memory architectures, our framework effectively utilizes the underlying hardware infrastructure, handling both batched and real-time data analysis and visualization tasks. Offering such functionality as a service in a “software as a service” manner will reduce the total cost of ownership, provide an easy-to-use tool to the wider astronomical community, and enable a more optimized utilization of the underlying hardware infrastructure.

  13. A real-time architecture for time-aware agents.

    PubMed

    Prouskas, Konstantinos-Vassileios; Pitt, Jeremy V

    2004-06-01

    This paper describes the specification and implementation of a new three-layer time-aware agent architecture. This architecture is designed for applications and environments where societies of humans and agents play equally active roles, but interact and operate in completely different time frames. The architecture consists of three layers: the April real-time run-time (ART) layer, the time aware layer (TAL), and the application agents layer (AAL). The ART layer forms the underlying real-time agent platform. An original online, real-time, dynamic priority-based scheduling algorithm is described for scheduling the computation time of agent processes, and it is shown that the algorithm's O(n) complexity and scalable performance are sufficient for application in real-time domains. The TAL layer forms an abstraction layer through which human and agent interactions are temporally unified, that is, handled in a common way irrespective of their temporal representation and scale. A novel O(n2) interaction scheduling algorithm is described for predicting and guaranteeing interactions' initiation and completion times. The time-aware predicting component of a workflow management system is also presented as an instance of the AAL layer. The described time-aware architecture addresses two key challenges in enabling agents to be effectively configured and applied in environments where humans and agents play equally active roles. It provides flexibility and adaptability in its real-time mechanisms while placing them under direct agent control, and it temporally unifies human and agent interactions.
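
    As a rough illustration of deadline-driven scheduling of agent processes, the sketch below implements a generic earliest-deadline-first queue in Python; it is not the ART layer's actual O(n) dynamic-priority algorithm.

    import heapq

    class AgentScheduler:
        """Generic earliest-deadline-first scheduler for agent processes (illustrative only)."""
        def __init__(self):
            self._queue = []                     # min-heap of (deadline, name, work)

        def submit(self, name, deadline, work):
            heapq.heappush(self._queue, (deadline, name, work))

        def run(self, now=0.0):
            while self._queue:
                deadline, name, work = heapq.heappop(self._queue)
                if now > deadline:
                    print(f"{name}: deadline {deadline} missed at t={now}")
                    continue
                now += work                      # pretend the task runs for `work` time units
                print(f"{name}: finished at t={now} (deadline {deadline})")

    sched = AgentScheduler()
    sched.submit("sensor-agent", deadline=2.0, work=0.5)
    sched.submit("ui-agent", deadline=10.0, work=1.0)
    sched.submit("planner-agent", deadline=3.0, work=1.0)
    sched.run()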

  14. Multiprocessor architecture: Synthesis and evaluation

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  15. Feasibility study, software design, layout and simulation of a two-dimensional Fast Fourier Transform machine for use in optical array interferometry

    NASA Technical Reports Server (NTRS)

    Boriakoff, Valentin

    1994-01-01

    The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
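
    A toy software simulation of a linear array of identical inner-product processors (each performing one multiply-accumulate per step as samples stream past) conveys the systolic idea; it is a sketch only, not the report's pipelined 2D FFT design.

    import numpy as np

    # Processor k accumulates x[n] * exp(-2*pi*i*k*n/N) as the samples stream
    # through the array, so after N steps the array holds the DFT of the block.
    N = 8
    rng = np.random.default_rng(0)
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

    acc = np.zeros(N, dtype=complex)            # one accumulator per processor
    for n, sample in enumerate(x):              # samples stream through the array
        for k in range(N):                      # each processor does one MAC per step
            acc[k] += sample * np.exp(-2j * np.pi * k * n / N)

    print(np.allclose(acc, np.fft.fft(x)))      # True: matches the reference DFT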

  16. An S_N Algorithm for Modern Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Randal Scott

    2016-08-29

    LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
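
    The dependency structure that KBA-style sweeps exploit can be shown with a small serial sketch: cells on the same diagonal of the spatial grid are mutually independent and could be assigned to different compute nodes. The cell update formula below is a toy stand-in, not the modified algorithm described in the report.

    import numpy as np

    # Cell (i, j) needs the upstream fluxes from (i-1, j) and (i, j-1), so the
    # sweep proceeds over diagonals i + j in wavefront order.
    nx, ny = 6, 4
    sigma = np.full((nx, ny), 0.5)              # toy removal term per cell
    flux = np.zeros((nx, ny))
    inflow_x, inflow_y = 1.0, 1.0               # boundary inflow on the x=0 and y=0 faces

    for diag in range(nx + ny - 1):             # sweep diagonals in wavefront order
        for i in range(max(0, diag - ny + 1), min(nx, diag + 1)):
            j = diag - i
            fx = flux[i - 1, j] if i > 0 else inflow_x
            fy = flux[i, j - 1] if j > 0 else inflow_y
            flux[i, j] = (fx + fy) / (2.0 + sigma[i, j])   # toy cell update

    print(flux)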

  17. Hierarchical parallel computer architecture defined by computational multidisciplinary mechanics

    NASA Technical Reports Server (NTRS)

    Padovan, Joe; Gute, Doug; Johnson, Keith

    1989-01-01

    The goal is to develop an architecture for parallel processors enabling optimal handling of multi-disciplinary computation of fluid-solid simulations employing finite element and difference schemes. The goals, philosophical and modeling directions, static and dynamic poly trees, example problems, interpolative reduction, and the impact on solvers are shown in viewgraph form.

  18. Parallel computer vision

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uhr, L.

    1987-01-01

    This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.

  19. Computer Architecture's Changing Role in Rebooting Computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeBenedictis, Erik P.

    Windows 95 started the Wintel era, in which Microsoft Windows running on Intel x86 microprocessors dominated the computer industry and changed the world. Retaining the x86 instruction set across many generations let users buy new and more capable microprocessors without having to buy software to work with new architectures.

  20. Computer Architecture's Changing Role in Rebooting Computing

    DOE PAGES

    DeBenedictis, Erik P.

    2017-04-26

    Windows 95 started the Wintel era, in which Microsoft Windows running on Intel x86 microprocessors dominated the computer industry and changed the world. Retaining the x86 instruction set across many generations let users buy new and more capable microprocessors without having to buy software to work with new architectures.

  1. Design of a fault tolerant airborne digital computer. Volume 1: Architecture

    NASA Technical Reports Server (NTRS)

    Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; Neumann, P. G.

    1973-01-01

    This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) The capacity is to be matched to the aircraft environment. (2) The reliability is to be selectively matched to the criticality and deadline requirements of each of the computations. (3) The system is to be readily expandable and contractible. (4) The design is to be appropriate to post-1975 technology. Three candidate architectures are discussed and assessed in terms of the above qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to the above qualities. In addition, SIFT is particularly simple and believable. The other candidates, Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor, are potentially more efficient than SIFT in the use of redundancy, but otherwise are not as attractive.

  2. Formal design and verification of a reliable computing platform for real-time control. Phase 1: Results

    NASA Technical Reports Server (NTRS)

    Divito, Ben L.; Butler, Ricky W.; Caldwell, James L.

    1990-01-01

    A high-level design is presented for a reliable computing platform for real-time control applications. Design tradeoffs and analyses related to the development of the fault-tolerant computing platform are discussed. The architecture is formalized and shown to satisfy a key correctness property. The reliable computing platform uses replicated processors and majority voting to achieve fault tolerance. Under the assumption of a majority of processors working in each frame, it is shown that the replicated system computes the same results as a single processor system not subject to failures. Sufficient conditions are obtained to establish that the replicated system recovers from transient faults within a bounded amount of time. Three different voting schemes are examined and proved to satisfy the bounded recovery time conditions.
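
    The voting step at the heart of such a replicated design can be illustrated in a few lines of Python; this is a generic majority-vote sketch, not the formally verified platform itself.

    from collections import Counter

    def majority_vote(outputs):
        """Return the value produced by a majority of replicated processors,
        or None if no majority exists (illustrative sketch of the voting step)."""
        value, count = Counter(outputs).most_common(1)[0]
        return value if count > len(outputs) // 2 else None

    # Three replicas, one suffering a transient fault:
    print(majority_vote([42, 42, 41]))   # -> 42, the faulty value is masked
    print(majority_vote([42, 41, 40]))   # -> None, no majority exists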

  3. Using a software-defined computer in teaching the basics of computer architecture and operation

    NASA Astrophysics Data System (ADS)

    Kosowska, Julia; Mazur, Grzegorz

    2017-08-01

    The paper describes the concept and implementation of SDC_One software-defined computer designed for experimental and didactic purposes. Equipped with extensive hardware monitoring mechanisms, the device enables the students to monitor the computer's operation on bus transfer cycle or instruction cycle basis, providing the practical illustration of basic aspects of computer's operation. In the paper, we describe the hardware monitoring capabilities of SDC_One and some scenarios of using it in teaching the basics of computer architecture and microprocessor operation.

  4. A Serial Bus Architecture for Parallel Processing Systems

    DTIC Science & Technology

    1986-09-01

    The wider the communication path, the more pins are needed to effect the data transfer. As Integrated Circuits grow in computational power, more communication capacity is needed, pushing...chip.

  5. Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance

    NASA Technical Reports Server (NTRS)

    Rushby, John

    1999-01-01

    Automated aircraft control has traditionally been divided into distinct "functions" that are implemented separately (e.g., autopilot, autothrottle, flight management); each function has its own fault-tolerant computer system, and dependencies among different functions are generally limited to the exchange of sensor and control data. A by-product of this "federated" architecture is that faults are strongly contained within the computer system of the function where they occur and cannot readily propagate to affect the operation of other functions. More modern avionics architectures contemplate supporting multiple functions on a single, shared, fault-tolerant computer system where natural fault containment boundaries are less sharply defined. Partitioning uses appropriate hardware and software mechanisms to restore strong fault containment to such integrated architectures. This report examines the requirements for partitioning, mechanisms for their realization, and issues in providing assurance for partitioning. Because partitioning shares some concerns with computer security, security models are reviewed and compared with the concerns of partitioning.

  6. Exascale computing and what it means for shock physics

    NASA Astrophysics Data System (ADS)

    Germann, Timothy

    2015-06-01

    The U.S. Department of Energy is preparing to launch an Exascale Computing Initiative, to address the myriad challenges required to deploy and effectively utilize an exascale-class supercomputer (i.e., one capable of performing 10^18 operations per second) in the 2023 timeframe. Since physical (power dissipation) requirements limit clock rates to at most a few GHz, this will necessitate the coordination of on the order of a billion concurrent operations, requiring sophisticated system and application software, and underlying mathematical algorithms, that may differ radically from traditional approaches. Even at the smaller workstation or cluster level of computation, the massive concurrency and heterogeneity within each processor will impact computational scientists. Through the multi-institutional, multi-disciplinary Exascale Co-design Center for Materials in Extreme Environments (ExMatEx), we have initiated an early and deep collaboration between domain (computational materials) scientists, applied mathematicians, computer scientists, and hardware architects, in order to establish the relationships between algorithms, software stacks, and architectures needed to enable exascale-ready materials science application codes within the next decade. In my talk, I will discuss these challenges, and what it will mean for exascale-era electronic structure, molecular dynamics, and engineering-scale simulations of shock-compressed condensed matter. In particular, we anticipate that the emerging hierarchical, heterogeneous architectures can be exploited to achieve higher physical fidelity simulations using adaptive physics refinement. This work is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research.
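
    As a quick sanity check of the concurrency estimate, assuming a clock rate of 2 GHz and one operation per cycle per execution unit (both figures are illustrative assumptions, not taken from the abstract):

    # Back-of-the-envelope check of the concurrency estimate quoted above.
    exa_ops_per_second = 1e18          # exascale target
    clock_hz = 2e9                     # assumed ~2 GHz clock
    concurrent_ops = exa_ops_per_second / clock_hz
    print(f"{concurrent_ops:.1e} concurrent operations")   # ~5e8, i.e. order of a billion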

  7. A Fog Computing and Cloudlet Based Augmented Reality System for the Industry 4.0 Shipyard.

    PubMed

    Fernández-Caramés, Tiago M; Fraga-Lamas, Paula; Suárez-Albela, Manuel; Vilar-Montesinos, Miguel

    2018-06-02

    Augmented Reality (AR) is one of the key technologies pointed out by Industry 4.0 as a tool for enhancing the next generation of automated and computerized factories. AR can also help shipbuilding operators, since they usually need to interact with information (e.g., product datasheets, instructions, maintenance procedures, quality control forms) that could be handled easily and more efficiently through AR devices. This is the reason why Navantia, one of the 10 largest shipbuilders in the world, is studying the application of AR (among other technologies) in different shipyard environments in a project called "Shipyard 4.0". This article presents Navantia's industrial AR (IAR) architecture, which is based on cloudlets and on the fog computing paradigm. Both technologies are ideal for supporting physically-distributed, low-latency and QoS-aware applications that decrease the network traffic and the computational load of traditional cloud computing systems. The proposed IAR communications architecture is evaluated in real-world scenarios with payload sizes according to demanding Microsoft HoloLens applications and when using a cloud, a cloudlet and a fog computing system. The results show that, in terms of response delay, the fog computing system is the fastest when transferring small payloads (less than 128 KB), while for larger file sizes, the cloudlet solution is faster than the others. Moreover, under high loads (with many concurrent IAR clients), the cloudlet in some cases is more than four times faster than the fog computing system in terms of response delay.
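
    A client-side offload policy reflecting the measured trend might look like the hypothetical sketch below; the 128 KB threshold is the figure quoted above, and all names are illustrative rather than part of the deployed system.

    # Hypothetical policy: small payloads go to the fog node, larger ones to the cloudlet.
    FOG_THRESHOLD_BYTES = 128 * 1024

    def pick_offload_target(payload_size_bytes: int) -> str:
        if payload_size_bytes < FOG_THRESHOLD_BYTES:
            return "fog"
        return "cloudlet"

    for size in (16 * 1024, 512 * 1024):
        print(size, "->", pick_offload_target(size))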

  8. System architecture of a gallium arsenide one-gigahertz digital IC tester

    NASA Technical Reports Server (NTRS)

    Fouts, Douglas J.; Johnson, John M.; Butner, Steven E.; Long, Stephen I.

    1987-01-01

    The design for a 1-GHz digital integrated circuit tester for the evaluation of custom GaAs chips and subsystems is discussed. Technology-related problems affecting the design of a GaAs computer are discussed, with emphasis on the problems introduced by long printed-circuit-board interconnect. High-speed interface modules provide a link between the low-speed microprocessor and the chip under test. Memory-multiplexer and memory-shift register architectures for the storage of test vectors are described in addition to an architecture for local data storage consisting of a long chain of GaAs shift registers. The tester is constructed around a VME system card cage and backplane, and very little high-speed interconnect exists between boards. The tester has a three part self-test consisting of a CPU board confidence test, a main memory confidence test, and a high-speed interface module functional test.

  9. The NASA/OAST telerobot testbed architecture

    NASA Technical Reports Server (NTRS)

    Matijevic, J. R.; Zimmerman, W. F.; Dolinsky, S.

    1989-01-01

    Through a phased development such as a laboratory-based research testbed, the NASA/OAST Telerobot Testbed provides an environment for system test and demonstration of the technology which will usefully complement, significantly enhance, or even replace manned space activities. By integrating advanced sensing, robotic manipulation and intelligent control under human-interactive supervision, the Testbed will ultimately demonstrate execution of a variety of generic tasks suggestive of space assembly, maintenance, repair, and telescience. The Testbed system features a hierarchical layered control structure compatible with the incorporation of evolving technologies as they become available. The Testbed system is physically implemented in a computing architecture which allows for ease of integration of these technologies while preserving the flexibility for test of a variety of man-machine modes. The development currently in progress on the functional and implementation architectures of the NASA/OAST Testbed and capabilities planned for the coming years are presented.

  10. Advanced cloud fault tolerance system

    NASA Astrophysics Data System (ADS)

    Sumangali, K.; Benny, Niketa

    2017-11-01

    Cloud computing has become a prevalent on-demand service on the internet to store, manage and process data. A pitfall that accompanies cloud computing is the set of failures that can be encountered in the cloud. To overcome these failures, we require a fault tolerance mechanism to abstract faults from users. We have proposed a fault tolerant architecture, which is a combination of proactive and reactive fault tolerance. This architecture essentially increases the reliability and the availability of the cloud. In the future, we would like to compare evaluations of our proposed architecture with existing architectures and further improve it.

  11. Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study

    NASA Technical Reports Server (NTRS)

    Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.

    2011-01-01

    A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicle's avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.

  12. Decision-theoretic saliency: computational principles, biological plausibility, and implications for neurophysiology and psychophysics.

    PubMed

    Gao, Dashan; Vasconcelos, Nuno

    2009-01-01

    A decision-theoretic formulation of visual saliency, first proposed for top-down processing (object recognition) (Gao & Vasconcelos, 2005a), is extended to the problem of bottom-up saliency. Under this formulation, optimality is defined in the minimum probability of error sense, under a constraint of computational parsimony. The saliency of the visual features at a given location of the visual field is defined as the power of those features to discriminate between the stimulus at the location and a null hypothesis. For bottom-up saliency, this is the set of visual features that surround the location under consideration. Discrimination is defined in an information-theoretic sense and the optimal saliency detector derived for a class of stimuli that complies with known statistical properties of natural images. It is shown that under the assumption that saliency is driven by linear filtering, the optimal detector consists of what is usually referred to as the standard architecture of V1: a cascade of linear filtering, divisive normalization, rectification, and spatial pooling. The optimal detector is also shown to replicate the fundamental properties of the psychophysics of saliency: stimulus pop-out, saliency asymmetries for stimulus presence versus absence, disregard of feature conjunctions, and Weber's law. Finally, it is shown that the optimal saliency architecture can be applied to the solution of generic inference problems. In particular, for the class of stimuli studied, it performs the three fundamental operations of statistical inference: assessment of probabilities, implementation of Bayes decision rule, and feature selection.
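
    The four-stage cascade named above (linear filtering, divisive normalization, rectification, spatial pooling) is easy to prototype; the numpy/scipy sketch below uses an arbitrary centre-surround filter and illustrative constants, and is not the authors' exact detector.

    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    def saliency_cascade(image, eps=1e-3):
        # 1. linear filtering (a simple centre-surround kernel stands in for an oriented filter bank)
        kernel = np.array([[-1, -1, -1],
                           [-1,  8, -1],
                           [-1, -1, -1]], dtype=float)
        response = convolve(image, kernel, mode="reflect")
        # 2. divisive normalization by local contrast energy
        local_energy = uniform_filter(response ** 2, size=7)
        normalized = response / np.sqrt(local_energy + eps)
        # 3. half-wave rectification
        rectified = np.maximum(normalized, 0.0)
        # 4. spatial pooling
        return uniform_filter(rectified, size=5)

    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    img[28:36, 28:36] += 2.0                    # a "pop-out" patch on a noisy background
    sal = saliency_cascade(img)
    print("most salient location:", np.unravel_index(np.argmax(sal), sal.shape))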

  13. The new landscape of parallel computer architecture

    NASA Astrophysics Data System (ADS)

    Shalf, John

    2007-07-01

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  14. Characterizing attention with predictive network models

    PubMed Central

    Rosenberg, M. D.; Finn, E. S.; Scheinost, D.; Constable, R. T.; Chun, M. M.

    2017-01-01

    Recent work shows that models based on functional connectivity in large-scale brain networks can predict individuals’ attentional abilities. Some of the first generalizable neuromarkers of cognitive function, these models also inform our basic understanding of attention, providing empirical evidence that (1) attention is a network property of brain computation, (2) the functional architecture that underlies attention can be measured while people are not engaged in any explicit task, and (3) this architecture supports a general attentional ability common to several lab-based tasks and impaired in attention deficit hyperactivity disorder. Looking ahead, connectivity-based predictive models of attention and other cognitive abilities and behaviors may potentially improve the assessment, diagnosis, and treatment of clinical dysfunction. PMID:28238605

  15. Architectures, Models, Algorithms, and Software Tools for Configurable Computing

    DTIC Science & Technology

    2000-03-06

    The Models, Algorithms, and Architectures for Reconfigurable Computing (MAARC) project developed a sound framework for…

  16. Architectures of Kepler Planet Systems with Approximate Bayesian Computation

    NASA Astrophysics Data System (ADS)

    Morehead, Robert C.; Ford, Eric B.

    2015-12-01

    The distribution of period normalized transit duration ratios among Kepler’s multiple transiting planet systems constrains the distributions of mutual orbital inclinations and orbital eccentricities. However, degeneracies in these parameters tied to the underlying number of planets in these systems complicate their interpretation. To untangle the true architecture of planet systems, the mutual inclination, eccentricity, and underlying planet number distributions must be considered simultaneously. The complexities of target selection, transit probability, detection biases, vetting, and follow-up observations make it impractical to write an explicit likelihood function. Approximate Bayesian computation (ABC) offers an intriguing path forward. In its simplest form, ABC generates a sample of trial population parameters from a prior distribution to produce synthetic datasets via a physically-motivated forward model. Samples are then accepted or rejected based on how close they come to reproducing the actual observed dataset to some tolerance. The accepted samples form a robust and useful approximation of the true posterior distribution of the underlying population parameters. We build on the considerable progress from the field of statistics to develop sequential algorithms for performing ABC in an efficient and flexible manner. We demonstrate the utility of ABC in exoplanet populations and present new constraints on the distributions of mutual orbital inclinations, eccentricities, and the relative number of short-period planets per star. We conclude with a discussion of the implications for other planet occurrence rate calculations, such as eta-Earth.
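
    In its simplest rejection form, ABC is only a few lines of code. The sketch below infers a single population parameter from a toy forward model and summary statistic; it stands in for, but does not reproduce, the paper's full Kepler detection pipeline.

    import numpy as np

    rng = np.random.default_rng(42)

    def forward_model(inclination_scale, n_systems=200):
        """Simulate a toy catalogue and return its summary statistic."""
        inclinations = rng.rayleigh(inclination_scale, size=n_systems)
        return np.median(inclinations)

    observed_summary = 1.8                 # pretend summary of the real catalogue
    tolerance = 0.05

    accepted = []
    for _ in range(20000):
        theta = rng.uniform(0.1, 5.0)      # draw a trial parameter from the prior
        if abs(forward_model(theta) - observed_summary) < tolerance:
            accepted.append(theta)         # keep parameters that reproduce the data

    posterior = np.array(accepted)
    print(f"accepted {posterior.size} draws, posterior mean ~ {posterior.mean():.2f}")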

  17. Blueprint for a microwave trapped ion quantum computer

    PubMed Central

    Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G.; Mølmer, Klaus; Devitt, Simon J.; Wunderlich, Christof; Hensinger, Winfried K.

    2017-01-01

    The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion–based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation–based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error–threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects. PMID:28164154

  18. Analysis of Introducing Active Learning Methodologies in a Basic Computer Architecture Course

    ERIC Educational Resources Information Center

    Arbelaitz, Olatz; Martín, José I.; Muguerza, Javier

    2015-01-01

    This paper presents an analysis of introducing active methodologies in the Computer Architecture course taught in the second year of the Computer Engineering Bachelor's degree program at the University of the Basque Country (UPV/EHU), Spain. The paper reports the experience from three academic years, 2011-2012, 2012-2013, and 2013-2014, in which…

  19. A Survey and Evaluation of Simulators Suitable for Teaching Courses in Computer Architecture and Organization

    ERIC Educational Resources Information Center

    Nikolic, B.; Radivojevic, Z.; Djordjevic, J.; Milutinovic, V.

    2009-01-01

    Courses in Computer Architecture and Organization are regularly included in Computer Engineering curricula. These courses are usually organized in such a way that students obtain not only a purely theoretical experience, but also a practical understanding of the topics lectured. This practical work is usually done in a laboratory using simulators…

  20. A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture

    ERIC Educational Resources Information Center

    Kellett, C. M.

    2012-01-01

    This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…

  1. From Archi Torture to Architecture: Undergraduate Students Design and Implement Computers Using the Multimedia Logic Emulator

    ERIC Educational Resources Information Center

    Stanley, Timothy D.; Wong, Lap Kei; Prigmore, Daniel; Benson, Justin; Fishler, Nathan; Fife, Leslie; Colton, Don

    2007-01-01

    Students learn better when they both hear and do. In computer architecture courses "doing" can be difficult in small schools without hardware laboratories hosted by computer engineering, electrical engineering, or similar departments. Software solutions exist. Our success with George Mills' Multimedia Logic (MML) is the focus of this paper. MML…

  2. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.
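
    As a much-simplified illustration of the kind of assignment problem addressed here, the following Python sketch splits a chain of module weights into contiguous blocks, one per processor, minimizing the bottleneck (maximum per-processor load); communication costs and the host/satellite structure are ignored, so this is not the Sum-Bottleneck path algorithm itself:

        def partition_chain(weights, n_procs):
            # Binary search on the bottleneck value, with a greedy feasibility check.
            def feasible(limit):
                blocks, load = 1, 0
                for w in weights:
                    if w > limit:
                        return False
                    if load + w > limit:
                        blocks, load = blocks + 1, w
                    else:
                        load += w
                return blocks <= n_procs

            lo, hi = max(weights), sum(weights)
            while lo < hi:
                mid = (lo + hi) // 2
                if feasible(mid):
                    hi = mid
                else:
                    lo = mid + 1
            return lo

        print(partition_chain([4, 2, 7, 1, 5, 3], 3))   # -> 8, e.g. blocks [4, 2], [7, 1], [5, 3]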

  3. The Role of Sketch in Architecture Design

    NASA Astrophysics Data System (ADS)

    Li, Yanjin; Ning, Wen

    2017-06-01

    With the continuous development of computer technology, we rely more and more on the computer and pay more and more attention to the final design results, to the point that we ignore the importance of the sketch. However, the sketch is the most basic and effective way of architecture design. Based on a study of the sketches for the Tjibaou Cultural Center, the paper explores the role of the sketch in architecture design.

  4. SU (2) lattice gauge theory simulations on Fermi GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cardoso, Nuno, E-mail: nunocardoso@cftp.ist.utl.p; Bicudo, Pedro, E-mail: bicudo@ist.utl.p

    2011-05-10

    In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA, the NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. In order to obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model. We produce codes for the Monte Carlo generation of SU(2) lattice gauge configurations, for the mean plaquette, for the Polyakov loop at finite T, and for the Wilson loop. We also present results for the potential using many configurations (50,000) without smearing and almost 2000 configurations with APE smearing. With two Fermi GPUs we have achieved an excellent performance of about 200x the speed of one CPU in single precision, around 110 Gflops/s. We also find that, using the Fermi architecture, double precision computations for the static quark-antiquark potential are not much slower (less than 2x) than single precision computations.
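
    As a toy serial illustration of the mean-plaquette observable mentioned above (a Python/NumPy sketch on a small 2D periodic lattice, not the 4D CUDA implementation; helper names are illustrative):

        import numpy as np

        def random_su2(eps=0.2, rng=np.random.default_rng(1)):
            # Random SU(2) matrix near the identity, parameterized by a unit quaternion.
            r = rng.normal(size=3)
            a = eps * r / np.linalg.norm(r)
            a0 = np.sqrt(max(0.0, 1.0 - a @ a))
            sx = np.array([[0, 1], [1, 0]], dtype=complex)
            sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
            sz = np.array([[1, 0], [0, -1]], dtype=complex)
            return a0 * np.eye(2) + 1j * (a[0] * sx + a[1] * sy + a[2] * sz)

        def mean_plaquette(U, L):
            # U[x, y, mu] is the SU(2) link leaving site (x, y) in direction mu on an L x L lattice.
            total = 0.0
            for x in range(L):
                for y in range(L):
                    xp, yp = (x + 1) % L, (y + 1) % L
                    P = U[x, y, 0] @ U[xp, y, 1] @ U[x, yp, 0].conj().T @ U[x, y, 1].conj().T
                    total += 0.5 * np.trace(P).real
            return total / (L * L)

        L = 8
        U = np.empty((L, L, 2), dtype=object)
        for x in range(L):
            for y in range(L):
                for mu in range(2):
                    U[x, y, mu] = random_su2()
        print(mean_plaquette(U, L))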

  5. Exploration of operator method digital optical computers for application to NASA

    NASA Technical Reports Server (NTRS)

    1990-01-01

    Digital optical computer design has been focused primarily towards parallel (single point-to-point interconnection) implementation. This architecture is compared to currently developing VHSIC systems. Using demonstrated multichannel acousto-optic devices, a figure of merit can be formulated. The focus is on a figure of merit termed the Gate Interconnect Bandwidth Product (GIBP). Conventional parallel optical digital computer architecture demonstrates only marginal competitiveness at best when compared to projected semiconductor implementations. Global, analog global, quasi-digital, and full digital interconnects are briefly examined as alternatives to the parallel digital computer architecture. Digital optical computing is becoming a very tough competitor to semiconductor technology since it can support a very high degree of three-dimensional interconnect density and high degrees of fan-in without capacitive loading effects at very low power consumption levels.

  6. Mirror representations innate versus determined by experience: a viewpoint from learning theory.

    PubMed

    Giese, Martin A

    2014-04-01

    From the viewpoint of pattern recognition and computational learning, mirror neurons form an interesting multimodal representation that links action perception and planning. While it seems unlikely that all details of such representations are specified by the genetic code, robust learning of such complex representations likely requires an appropriate interplay between plasticity, generalization, and anatomical constraints of the underlying neural architecture.

  7. Neural-Network Object-Recognition Program

    NASA Technical Reports Server (NTRS)

    Spirkovska, L.; Reid, M. B.

    1993-01-01

    HONTIOR computer program implements third-order neural network exhibiting invariance under translation, change of scale, and in-plane rotation. Invariance incorporated directly into architecture of network. Only one view of each object needed to train network for two-dimensional-translation-invariant recognition of object. Also used for three-dimensional-transformation-invariant recognition by training network on only set of out-of-plane rotated views. Written in C language.

  8. Layered Architectures for Quantum Computers and Quantum Repeaters

    NASA Astrophysics Data System (ADS)

    Jones, Nathan C.

    This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.

  9. Neural simulations on multi-core architectures.

    PubMed

    Eichner, Hubert; Klug, Tobias; Borst, Alexander

    2009-01-01

    Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.

  10. Neural Simulations on Multi-Core Architectures

    PubMed Central

    Eichner, Hubert; Klug, Tobias; Borst, Alexander

    2009-01-01

    Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high performance as well as standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393

  11. Advanced flight computer. Special study

    NASA Technical Reports Server (NTRS)

    Coo, Dennis

    1995-01-01

    This report documents a special study to define a 32-bit radiation-hardened, SEU-tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building-block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets the AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.

  12. Advanced information processing system for advanced launch system: Avionics architecture synthesis

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.; Harper, Richard E.; Jaskowiak, Kenneth R.; Rosch, Gene; Alger, Linda S.; Schor, Andrei L.

    1991-01-01

    The Advanced Information Processing System (AIPS) is a fault-tolerant distributed computer system architecture that was developed to meet the real-time computational needs of advanced aerospace vehicles. One such vehicle is the Advanced Launch System (ALS) being developed jointly by NASA and the Department of Defense to launch heavy payloads into low earth orbit at one tenth the cost (per pound of payload) of the current launch vehicles. An avionics architecture that utilizes the AIPS hardware and software building blocks was synthesized for ALS. The AIPS for ALS architecture synthesis process, starting with the ALS mission requirements and ending with an analysis of the candidate ALS avionics architecture, is described.

  13. GASP-PL/I Simulation of Integrated Avionic System Processor Architectures. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Brent, G. A.

    1978-01-01

    A development study sponsored by NASA was completed in July 1977 which proposed a complete integration of all aircraft instrumentation into a single modular system. Instead of using the current single-function aircraft instruments, computers compiled and displayed in-flight information for the pilot. A processor architecture called the Team Architecture was proposed. This is a hardware/software approach to high-reliability computer systems. A follow-up study of the proposed Team Architecture is reported. GASP-PL/I simulation models are used to evaluate the operating characteristics of the Team Architecture. The problem, model development, simulation programs, and results are presented at length. Also included are program input formats, outputs, and listings.

  14. Real-Time Cognitive Computing Architecture for Data Fusion in a Dynamic Environment

    NASA Technical Reports Server (NTRS)

    Duong, Tuan A.; Duong, Vu A.

    2012-01-01

    A novel cognitive computing architecture is conceptualized for processing multiple channels of multi-modal sensory data streams simultaneously, and fusing the information in real time to generate intelligent reaction sequences. This unique architecture is capable of assimilating parallel data streams that could be analog, digital, synchronous/asynchronous, and could be programmed to act as a knowledge synthesizer and/or an "intelligent perception" processor. In this architecture, the bio-inspired models of visual pathway and olfactory receptor processing are combined as processing components, to achieve the composite function of "searching for a source of food while avoiding the predator." The architecture is particularly suited for scene analysis from visual and odorant data.

  15. Electromagnetic Physics Models for Parallel Computing Architectures

    NASA Astrophysics Data System (ADS)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  16. Optimal cube-connected cube multiprocessors

    NASA Technical Reports Server (NTRS)

    Sun, Xian-He; Wu, Jie

    1993-01-01

    Many CFD (computational fluid dynamics) and other scientific applications can be partitioned into subproblems. However, in general the partitioned subproblems are very large. They demand high performance computing power themselves, and the solutions of the subproblems have to be combined at each time step. The cube-connected cube (CCCube) architecture is studied. The CCCube architecture is an extended hypercube structure with each node represented as a cube. It requires fewer physical links between nodes than the hypercube, and provides the same communication support as the hypercube does on many applications. The reduced physical links can be used to enhance the bandwidth of the remaining links and, therefore, enhance the overall performance. The concept and the method to obtain optimal CCCubes, which are the CCCubes with a minimum number of links under a given total number of nodes, are proposed. The superiority of optimal CCCubes over standard hypercubes was also shown in terms of the link usage in the embedding of a binomial tree. A useful computation structure based on a semi-binomial tree for divide-and-conquer type parallel algorithms was identified. It was shown that this structure can be implemented in optimal CCCubes without performance degradation compared with regular hypercubes. The results presented should provide a useful approach to the design of scientific parallel computers.

  17. Computational structures for robotic computations

    NASA Technical Reports Server (NTRS)

    Lee, C. S. G.; Chang, P. R.

    1987-01-01

    The computational problem of inverse kinematics and inverse dynamics of robot manipulators by taking advantage of parallelism and pipelining architectures is discussed. For the computation of the inverse kinematic position solution, a maximum pipelined CORDIC architecture has been designed based on a functional decomposition of the closed-form joint equations. For the inverse dynamics computation, an efficient p-fold parallel algorithm to overcome the recurrence problem of the Newton-Euler equations of motion to achieve the time lower bound of O(log₂ n) has also been developed.
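
    For orientation, the CORDIC iteration underlying such an architecture reduces each rotation to shifts and adds; a floating-point Python sketch of the rotation mode (the hardware version would use fixed-point arithmetic and one pipeline stage per iteration):

        import math

        def cordic_sin_cos(theta, n_iter=32):
            # Rotate (1, 0) towards angle theta using shift-add micro-rotations.
            # Valid for |theta| up to roughly 1.74 rad (the sum of the elementary angles).
            angles = [math.atan(2.0 ** -i) for i in range(n_iter)]
            gain = 1.0
            for i in range(n_iter):
                gain *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))    # accumulated scale correction
            x, y, z = 1.0, 0.0, theta
            for i in range(n_iter):
                d = 1.0 if z >= 0.0 else -1.0
                x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
                z -= d * angles[i]
            return y * gain, x * gain                             # (sin(theta), cos(theta))

        print(cordic_sin_cos(0.5), math.sin(0.5), math.cos(0.5))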

  18. [Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure].

    PubMed

    Yokohama, Noriya

    2013-07-01

    This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost.
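
    The parallel structure exploited here is the usual embarrassingly parallel Monte Carlo decomposition; a generic Python sketch that spreads independent samples over worker processes (a toy estimator, not the particle-therapy dose engine):

        import multiprocessing as mp
        import random

        def count_hits(args):
            n, seed = args
            rng = random.Random(seed)
            # Toy Monte Carlo kernel: count samples falling inside the unit quarter circle.
            return sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

        if __name__ == "__main__":
            n_total, n_workers = 4_000_000, 8
            chunks = [(n_total // n_workers, seed) for seed in range(n_workers)]
            with mp.Pool(n_workers) as pool:
                hits = sum(pool.map(count_hits, chunks))          # each chunk runs in its own process
            print("pi estimate:", 4.0 * hits / n_total)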

  19. Next Generation Image-Based Phenotyping of Root System Architecture

    NASA Astrophysics Data System (ADS)

    Davis, T. W.; Shaw, N. M.; Cheng, H.; Larson, B. G.; Craft, E. J.; Shaff, J. E.; Schneider, D. J.; Piñeros, M. A.; Kochian, L. V.

    2016-12-01

    The development of the Plant Root Imaging and Data Acquisition (PRIDA) hardware/software system enables researchers to collect digital images, along with all the relevant experimental details, of a range of hydroponically grown agricultural crop roots for 2D and 3D trait analysis. Previous efforts of image-based root phenotyping focused on young cereals, such as rice; however, there is a growing need to measure both older and larger root systems, such as those of maize and sorghum, to improve our understanding of the underlying genetics that control favorable rooting traits for plant breeding programs to combat the agricultural risks presented by climate change. Therefore, a larger imaging apparatus has been prototyped for capturing 3D root architecture with an adaptive control system and innovative plant root growth media that retains three-dimensional root architectural features. New publicly available multi-platform software has been released with considerations for both high throughput (e.g., 3D imaging of a single root system in under ten minutes) and high portability (e.g., support for the Raspberry Pi computer). The software features unified data collection, management, exploration and preservation for continued trait and genetics analysis of root system architecture. The new system makes data acquisition efficient and includes features that address the needs of researchers and technicians, such as reduced imaging time, semi-automated camera calibration with uncertainty characterization, and safe storage of the critical experimental data.

  20. Solving the Cauchy-Riemann equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
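
    The coupled first-order relaxation scheme itself is specific to the paper, but since the components of a Cauchy-Riemann solution are harmonic, the basic grid-relaxation idea can be illustrated with a plain Jacobi sweep for Laplace's equation (a stand-in sketch in Python/NumPy, not the authors' scheme):

        import numpy as np

        def jacobi_laplace(u, n_sweeps=500):
            # Jacobi relaxation of Laplace's equation on the interior of a 2D grid;
            # the boundary values of u are held fixed.
            for _ in range(n_sweeps):
                u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
            return u

        u = np.zeros((64, 64))
        u[0, :] = 1.0                 # simple Dirichlet boundary condition on one edge
        print(jacobi_laplace(u)[32, 32])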

  1. A Modular Approach to Arithmetic and Logic Unit Design on a Reconfigurable Hardware Platform for Educational Purpose

    NASA Astrophysics Data System (ADS)

    Oztekin, Halit; Temurtas, Feyzullah; Gulbag, Ali

    The Arithmetic and Logic Unit (ALU) design is one of the important topics in the Computer Architecture and Organization course in Computer and Electrical Engineering departments. Existing ALU designs used as educational tools are non-modular in nature. As programmable logic technology has developed rapidly, it is feasible to implement FPGA-based ALU design in this course. In this paper, we have adopted a modular approach to ALU design based on FPGA. All the modules in the ALU design are realized using schematic structures on Altera's Cyclone II development board. Under this model, the ALU content is divided into four distinct modules: an arithmetic unit (excluding multiplication and division), a logic unit, a multiplication unit, and a division unit. Users can easily design an ALU of any size since this approach is modular in nature. This approach was then applied to the microcomputer architecture design named BZK.SAU.FPGA10.0, replacing its current ALU unit.
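
    A behavioral sketch of the four-module decomposition, written in Python rather than as the schematic FPGA design (module boundaries and operation names are illustrative only):

        def alu(op, a, b=0, width=16):
            # Each branch stands in for a separate hardware module:
            # arithmetic (no mul/div), logic, multiplication, and division.
            mask = (1 << width) - 1
            arithmetic = {"add": lambda: a + b, "sub": lambda: a - b}
            logic = {"and": lambda: a & b, "or": lambda: a | b,
                     "xor": lambda: a ^ b, "not": lambda: ~a}
            if op in arithmetic:
                return arithmetic[op]() & mask
            if op in logic:
                return logic[op]() & mask
            if op == "mul":
                return (a * b) & mask
            if op == "div":
                return (a // b) & mask if b != 0 else 0
            raise ValueError("unknown operation: " + op)

        print(alu("add", 40, 2), alu("xor", 0b1100, 0b1010), alu("mul", 300, 300))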

  2. Compact FPGA hardware architecture for public key encryption in embedded devices

    PubMed Central

    Morales-Sandoval, Miguel; Cumplido, René; Feregrino-Uribe, Claudia; Algredo-Badillo, Ignacio

    2018-01-01

    Security is a crucial requirement in the envisioned applications of the Internet of Things (IoT), where most of the underlying computing platforms are embedded systems with reduced computing capabilities and energy constraints. In this paper we present the design and evaluation of a scalable low-area FPGA hardware architecture that serves as a building block to accelerate the costly operations of exponentiation and multiplication in GF(p), commonly required in security protocols relying on public key encryption, such as in key agreement, authentication and digital signature. The proposed design can process operands of different size using the same datapath, which exhibits a significant reduction in area without loss of efficiency if compared to representative state of the art designs. For example, our design uses 96% less standard logic than a similar design optimized for performance, and 46% less resources than other design optimized for area. Even using fewer area resources, our design still performs better than its embedded software counterparts (190x and 697x). PMID:29360824
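
    At its core, the accelerated exponentiation is binary square-and-multiply over GF(p); a plain Python reference of that loop (without the Montgomery-style optimizations a hardware datapath would typically employ):

        def modexp(base, exponent, modulus):
            # Right-to-left binary square-and-multiply: one modular squaring per exponent bit,
            # plus a modular multiplication for every bit that is set.
            result = 1
            base %= modulus
            while exponent > 0:
                if exponent & 1:
                    result = (result * base) % modulus
                base = (base * base) % modulus
                exponent >>= 1
            return result

        # Tiny example; real public-key parameters use primes of 2048 bits or more.
        print(modexp(5, 117, 19), pow(5, 117, 19))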

  3. Compact FPGA hardware architecture for public key encryption in embedded devices.

    PubMed

    Rodríguez-Flores, Luis; Morales-Sandoval, Miguel; Cumplido, René; Feregrino-Uribe, Claudia; Algredo-Badillo, Ignacio

    2018-01-01

    Security is a crucial requirement in the envisioned applications of the Internet of Things (IoT), where most of the underlying computing platforms are embedded systems with reduced computing capabilities and energy constraints. In this paper we present the design and evaluation of a scalable low-area FPGA hardware architecture that serves as a building block to accelerate the costly operations of exponentiation and multiplication in GF(p), commonly required in security protocols relying on public key encryption, such as in key agreement, authentication and digital signature. The proposed design can process operands of different size using the same datapath, which exhibits a significant reduction in area without loss of efficiency if compared to representative state of the art designs. For example, our design uses 96% less standard logic than a similar design optimized for performance, and 46% less resources than other design optimized for area. Even using fewer area resources, our design still performs better than its embedded software counterparts (190x and 697x).

  4. Application of Tessellation in Architectural Geometry Design

    NASA Astrophysics Data System (ADS)

    Chang, Wei

    2018-06-01

    Tessellation plays a significant role in architectural geometry design, which is widely used both through history of architecture and in modern architectural design with the help of computer technology. Tessellation has been found since the birth of civilization. In terms of dimensions, there are two- dimensional tessellations and three-dimensional tessellations; in terms of symmetry, there are periodic tessellations and aperiodic tessellations. Besides, some special types of tessellations such as Voronoi Tessellation and Delaunay Triangles are also included. Both Geometry and Crystallography, the latter of which is the basic theory of three-dimensional tessellations, need to be studied. In history, tessellation was applied into skins or decorations in architecture. The development of Computer technology enables tessellation to be more powerful, as seen in surface control, surface display and structure design, etc. Therefore, research on the application of tessellation in architectural geometry design is of great necessity in architecture studies.

  5. Scalable software architecture for on-line multi-camera video processing

    NASA Astrophysics Data System (ADS)

    Camplani, Massimo; Salgado, Luis

    2011-03-01

    In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. In this paper, as a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.

  6. The RTE inversion on FPGA aboard the solar orbiter PHI instrument

    NASA Astrophysics Data System (ADS)

    Cobos Carrascosa, J. P.; Aparicio del Moral, B.; Ramos Mas, J. L.; Balaguer, M.; López Jiménez, A. C.; del Toro Iniesta, J. C.

    2016-07-01

    In this work we propose a multiprocessor architecture to reach high performance in floating-point operations by using radiation-tolerant FPGA devices, under narrow time and power constraints. This architecture is used in the PHI instrument that carries out the scientific analysis aboard ESA's Solar Orbiter mission. The proposed architecture, in a SIMD flavor, is intended as an accelerator within the Data Processing Unit (composed of a main Leon processor and two FPGAs) for carrying out the RTE inversion on board the spacecraft using a relatively slow FPGA device, the Xilinx XQR4VSX55. The proposed architecture squeezes the FPGA resources to meet the computational requirements and improves on the performance of ground-based systems built on commercial CPUs in both time and power consumption. In this work we demonstrate the feasibility of embedding these FPGA devices in the SO/PHI instrument. With that goal in mind, we perform tests to evaluate the scientific results and to measure the processing time and power consumption for carrying out the RTE inversion.

  7. HYDRA: High-speed simulation architecture for precision spacecraft formation simulation

    NASA Technical Reports Server (NTRS)

    Martin, Bryan J.; Sohl, Garett.

    2003-01-01

    HYDRA, the Hierarchical Distributed Reconfigurable Architecture, is a scalable simulation architecture that provides flexibility and ease of use while taking advantage of modern computation and communication hardware. It also provides the ability to implement distributed- or workstation-based simulations and high-fidelity real-time simulation from a common core. Originally designed to serve as a research platform for examining fundamental challenges in formation flying simulation for future space missions, it is also finding use in other missions and applications, all of which can take advantage of the underlying object-oriented structure to easily produce distributed simulations. Hydra automates the process of connecting disparate simulation components (Hydra Clients) through a client-server architecture that uses high-level descriptions of the data associated with each client to find and forge desirable connections (Hydra Services) at run time. Services communicate through the use of Connectors, which abstract messaging to provide single-interface access to any desired communication protocol, ranging from shared-memory message passing to TCP/IP, ACE, and CORBA. Hydra shares many features with the HLA, while providing more flexibility in connectivity services and behavior overriding.

  8. Architecture for a PACS primary diagnosis workstation

    NASA Astrophysics Data System (ADS)

    Shastri, Kaushal; Moran, Byron

    1990-08-01

    A major factor in determining the overall utility of a medical Picture Archiving and Communications System (PACS) is the functionality of the diagnostic workstation. Meyer-Ebrecht and Wendler [1] have proposed a modular picture computer architecture with high throughput, and Perry et al. [2] have defined performance requirements for radiology workstations. In order to be clinically useful, a primary diagnosis workstation must not only provide the functions of current viewing systems (e.g. mechanical alternators [3,4]), such as acceptable image quality, simultaneous viewing of multiple images, and rapid switching of image banks, but must also provide a diagnostic advantage over the current systems. This includes window-level functions on any image, simultaneous display of multi-modality images, rapid image manipulation, image processing, dynamic image display (cine), electronic image archival, hardcopy generation, image acquisition, network support, and an easy user interface. Implementation of such a workstation requires an underlying hardware architecture which provides high-speed image transfer channels, local storage facilities, and image processing functions. This paper describes the hardware architecture of the Siemens Diagnostic Reporting Console (DRC) which meets these requirements.

  9. A single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder

    NASA Technical Reports Server (NTRS)

    Hsu, I. S.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.

    1986-01-01

    A description of a single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder is presented. The architecture that leads to this single VLSI chip design makes use of the dual basis multiplication algorithm. The same architecture can be applied to design VLSI chips to compute various kinds of number theoretic transforms.
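
    For orientation, syndrome computation for an RS(255, 223) code evaluates the received polynomial at successive powers of a primitive element of GF(2^8); a conventional polynomial-basis Python sketch (the chip described uses the dual-basis multiplication algorithm, and the field polynomial and root conventions below are assumptions):

        def gf_mul(a, b, prim=0x11d):
            # Multiplication in GF(2^8) with a reduction polynomial commonly used for RS codes.
            p = 0
            while b:
                if b & 1:
                    p ^= a
                a <<= 1
                if a & 0x100:
                    a ^= prim
                b >>= 1
            return p

        def gf_pow(a, n):
            r = 1
            for _ in range(n):
                r = gf_mul(r, a)
            return r

        def syndromes(received, n_syndromes=32, alpha=0x02):
            # S_j = r(alpha^j), j = 1..n_syndromes, with the received bytes as coefficients
            # of r(x), highest degree first; evaluation uses Horner's rule.
            out = []
            for j in range(1, n_syndromes + 1):
                aj = gf_pow(alpha, j)
                s = 0
                for byte in received:
                    s = gf_mul(s, aj) ^ byte
                out.append(s)
            return out

        print(syndromes([0x12, 0x34, 0x56, 0x78], n_syndromes=4))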

  10. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high-level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  11. The thermodynamic efficiency of computations made in cells across the range of life

    NASA Astrophysics Data System (ADS)

    Kempes, Christopher P.; Wolpert, David; Cohen, Zachary; Pérez-Mercader, Juan

    2017-11-01

    Biological organisms must perform computation as they grow, reproduce and evolve. Moreover, ever since Landauer's bound was proposed, it has been known that all computation has some thermodynamic cost, and that the same computation can be achieved with greater or smaller thermodynamic cost depending on how it is implemented. Accordingly an important issue concerning the evolution of life is assessing the thermodynamic efficiency of the computations performed by organisms. This issue is interesting both from the perspective of how close life has come to maximally efficient computation (presumably under the pressure of natural selection), and from the practical perspective of what efficiencies we might hope that engineered biological computers might achieve, especially in comparison with current computational systems. Here we show that the computational efficiency of translation, defined as free energy expended per amino acid operation, outperforms the best supercomputers by several orders of magnitude, and is only about an order of magnitude worse than the Landauer bound. However, this efficiency depends strongly on the size and architecture of the cell in question. In particular, we show that the useful efficiency of an amino acid operation, defined as the bulk energy per amino acid polymerization, decreases for increasing bacterial size and converges to the polymerization cost of the ribosome. This cost of the largest bacteria does not change in cells as we progress through the major evolutionary shifts to both single- and multicellular eukaryotes. However, the rates of total computation per unit mass are non-monotonic in bacteria with increasing cell size, and also change across different biological architectures, including the shift from unicellular to multicellular eukaryotes. This article is part of the themed issue 'Reconceptualizing the origins of life'.
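
    The Landauer bound referenced above is straightforward to evaluate; a short Python check at room temperature:

        import math

        k_B = 1.380649e-23          # Boltzmann constant, J/K
        T = 300.0                   # approximate room temperature, K

        landauer_per_bit = k_B * T * math.log(2)    # minimum free energy to erase one bit
        print(f"Landauer bound at {T} K: {landauer_per_bit:.3e} J per bit")
        # About 2.87e-21 J; the paper compares the free-energy cost per amino acid operation
        # in translation against this kind of thermodynamic floor.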

  12. Modeling a Wireless Network for International Space Station

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Yaprak, Ece; Lamouri, Saad

    2000-01-01

    This paper describes the application of wireless local area network (LAN) simulation modeling methods to the hybrid LAN architecture designed for supporting crew-computing tools aboard the International Space Station (ISS). These crew-computing tools, such as wearable computers and portable advisory systems, will provide crew members with real-time vehicle and payload status information and access to digital technical and scientific libraries, significantly enhancing human capabilities in space. A wireless network, therefore, will provide wearable computers and remote instruments with the high-performance computational power needed by next-generation 'intelligent' software applications. Wireless network performance in such simulated environments is characterized by the sustainable throughput of data under different traffic conditions. This data will be used to help plan the addition of more access points supporting new modules and more nodes for increased network capacity as the ISS grows.

  13. A System Architecture for Efficient Transmission of Massive DNA Sequencing Data.

    PubMed

    Sağıroğlu, Mahmut Şamil; Külekci, M. Oğuzhan

    2017-11-01

    The DNA sequencing data analysis pipelines require significant computational resources. In that sense, cloud computing infrastructures appear as a natural choice for this processing. However, the first practical difficulty in reaching the cloud computing services is the transmission of the massive DNA sequencing data from where they are produced to where they will be processed. The daily practice here begins with compressing the data in FASTQ file format, and then sending these data via fast data transmission protocols. In this study, we address the weaknesses in that daily practice and present a new system architecture that incorporates the computational resources available on the client side while dynamically adapting itself to the available bandwidth. Our proposal considers the real-life scenarios, where the bandwidth of the connection between the parties may fluctuate, and also the computing power on the client side may be of any size ranging from moderate personal computers to powerful workstations. The proposed architecture aims at utilizing both the communication bandwidth and the computing resources for satisfying the ultimate goal of reaching the results as early as possible. We present a prototype implementation of the proposed architecture, and analyze several real-life cases, which provide useful insights for the sequencing centers, especially on deciding when to use a cloud service and in what conditions.

  14. Computer graphics in architecture and engineering

    NASA Technical Reports Server (NTRS)

    Greenberg, D. P.

    1975-01-01

    The present status of the application of computer graphics to the building profession or architecture and its relationship to other scientific and technical areas were discussed. It was explained that, due to the fragmented nature of architecture and building activities (in contrast to the aerospace industry), a comprehensive, economic utilization of computer graphics in this area is not practical and its true potential cannot now be realized due to the present inability of architects and structural, mechanical, and site engineers to rely on a common data base. Future emphasis will therefore have to be placed on a vertical integration of the construction process and effective use of a three-dimensional data base, rather than on waiting for any technological breakthrough in interactive computing.

  15. Innovative architectures for dense multi-microprocessor computers

    NASA Technical Reports Server (NTRS)

    Larson, Robert E.

    1989-01-01

    The purpose is to summarize a Phase 1 SBIR project performed for the NASA/Langley Computational Structural Mechanics Group. The project was performed from February to August 1987. The main objectives of the project were to: (1) expand upon previous research into the application of chordal ring architectures to the general problem of designing multi-microcomputer architectures, (2) attempt to identify a family of chordal rings such that each chordal ring can be simply expanded to produce the next member of the family, (3) perform a preliminary, high-level design of an expandable multi-microprocessor computer based upon chordal rings, (4) analyze the potential use of chordal ring based multi-microprocessors for sparse matrix problems and other applications arising in computational structural mechanics.

  16. Fault tolerant architectures for integrated aircraft electronics systems

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1983-01-01

    Work into possible architectures for future flight control computer systems is described. Ada for Fault-Tolerant Systems, the NETS Network Error-Tolerant System architecture, and voting in asynchronous systems are covered.

  17. HyperForest: A high performance multi-processor architecture for real-time intelligent systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.

    1997-04-01

    Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second-generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expandability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doors for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE's own production and disassembly activities.

  18. ARACHNE: A neural-neuroglial network builder with remotely controlled parallel computing

    PubMed Central

    Rusakov, Dmitri A.; Savtchenko, Leonid P.

    2017-01-01

    Creating and running realistic models of neural networks has hitherto been a task for computing professionals rather than experimental neuroscientists. This is mainly because such networks usually engage substantial computational resources, the handling of which requires specific programing skills. Here we put forward a newly developed simulation environment ARACHNE: it enables an investigator to build and explore cellular networks of arbitrary biophysical and architectural complexity using the logic of NEURON and a simple interface on a local computer or a mobile device. The interface can control, through the internet, an optimized computational kernel installed on a remote computer cluster. ARACHNE can combine neuronal (wired) and astroglial (extracellular volume-transmission driven) network types and adopt realistic cell models from the NEURON library. The program and documentation (current version) are available at GitHub repository https://github.com/LeonidSavtchenko/Arachne under the MIT License (MIT). PMID:28362877

  19. Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

    NASA Technical Reports Server (NTRS)

    Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

    2017-01-01

    Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
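
    The bottleneck operation itself is simple; a serial Python/NumPy sketch of a forward-difference Jacobian, whose independent columns are what make this step a natural candidate for spmd-style parallelism:

        import numpy as np

        def jacobian_fd(f, x, eps=1e-6):
            # Forward-difference Jacobian of f: R^n -> R^m; one function evaluation per column,
            # and the columns are mutually independent (hence easy to distribute over workers).
            x = np.asarray(x, dtype=float)
            f0 = np.asarray(f(x), dtype=float)
            J = np.empty((f0.size, x.size))
            for j in range(x.size):
                xp = x.copy()
                xp[j] += eps
                J[:, j] = (np.asarray(f(xp), dtype=float) - f0) / eps
            return J

        # Example: residuals of a small nonlinear system.
        f = lambda x: np.array([x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 3 - 5.0])
        print(jacobian_fd(f, [1.0, 2.0]))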

  20. Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel; Lesoinne, Michel

    1993-01-01

    Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
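
    One of the simplest members of this family of partitioners is recursive coordinate bisection over the mesh vertex coordinates; a Python sketch of the idea (an illustration only, not one of the specific algorithms evaluated in the paper):

        import numpy as np

        def recursive_bisection(points, n_parts):
            # Repeatedly split the largest part along its longest coordinate axis at the median,
            # producing n_parts index sets of roughly equal size.
            points = np.asarray(points, dtype=float)
            parts = [np.arange(len(points))]
            while len(parts) < n_parts:
                parts.sort(key=len, reverse=True)
                idx = parts.pop(0)
                coords = points[idx]
                axis = int(np.argmax(coords.max(axis=0) - coords.min(axis=0)))
                order = idx[np.argsort(coords[:, axis])]
                half = len(order) // 2
                parts += [order[:half], order[half:]]
            return parts

        pts = np.random.default_rng(0).random((1000, 2))   # stand-in 2D vertex coordinates
        print([len(p) for p in recursive_bisection(pts, 8)])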

  1. An exploration of neuromorphic systems and related design issues/challenges in dark silicon era

    NASA Astrophysics Data System (ADS)

    Chandaliya, Mudit; Chaturvedi, Nitin; Gurunarayanan, S.

    2018-03-01

    Microprocessors have shown remarkable improvements in performance and memory capacity since their inception. However, due to power and thermal limitations, only a fraction of cores can operate at full frequency at any instant, regardless of the advantages of each new technology generation. This phenomenon of microprocessor under-utilization is called dark silicon, and it constrains innovative computing. To overcome the utilization wall, IBM explored and developed neurosynaptic system chips. This has opened a wide scope of research in innovative computing, technology, materials science, machine learning, etc. In this paper, we first review the diverse stages of research that have been influential in the development of neurosynaptic architectures. These architectures focus on brain-like frameworks that are efficient enough to execute a broad set of computations in real time while maintaining ultra-low power consumption and small area. We also discuss the challenges and opportunities of designing neuromorphic systems with existing technologies in the dark silicon era, which constitute a key area of future research.

  2. High performance cellular level agent-based simulation with FLAME for the GPU.

    PubMed

    Richmond, Paul; Walker, Dawn; Coakley, Simon; Romano, Daniela

    2010-05-01

    Driven by the availability of experimental data and ability to simulate a biological scale which is of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with 'bottom-up' simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has reported massive improvement in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.

  3. Motion camera based on a custom vision sensor and an FPGA architecture

    NASA Astrophysics Data System (ADS)

    Arias-Estrada, Miguel

    1998-09-01

    A digital camera for custom focal plane arrays was developed. The camera allows the test and development of analog or mixed-mode arrays for focal plane processing. The camera is used with a custom sensor for motion detection to implement a motion computation system. The custom focal plane sensor detects moving edges at the pixel level using analog VLSI techniques. The sensor communicates motion events using the event-address protocol associated to a temporal reference. In a second stage, a coprocessing architecture based on a field programmable gate array (FPGA) computes the time-of-travel between adjacent pixels. The FPGA allows rapid prototyping and flexible architecture development. Furthermore, the FPGA interfaces the sensor to a compact PC computer which is used for high level control and data communication to the local network. The camera could be used in applications such as self-guided vehicles, mobile robotics and smart surveillance systems. The programmability of the FPGA allows the exploration of further signal processing like spatial edge detection or image segmentation tasks. The article details the motion algorithm, the sensor architecture, the use of the event- address protocol for velocity vector computation and the FPGA architecture used in the motion camera system.

  4. Recent Developments in the Application of Biologically Inspired Computation to Chemical Sensing

    NASA Astrophysics Data System (ADS)

    Marco, S.; Gutierrez-Gálvez, A.

    2009-05-01

    Biological olfaction outperforms chemical instrumentation in specificity, response time, detection limit, coding capacity, time stability, robustness, size, power consumption, and portability. This biological function provides outstanding performance due, to a large extent, to the unique architecture of the olfactory pathway, which combines a high degree of redundancy, an efficient combinatorial coding along with unmatched chemical information processing mechanisms. The last decade has witnessed important advances in the understanding of the computational primitives underlying the functioning of the olfactory system. In this work, the state of the art concerning biologically inspired computation for chemical sensing will be reviewed. Instead of reviewing the whole body of computational neuroscience of olfaction, we restrict this review to the application of models to the processing of real chemical sensor data.

  5. Topical perspective on massive threading and parallelism.

    PubMed

    Farber, Robert M

    2011-09-01

    Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive threading and massive parallelism, as in the 1.3-million-thread Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts, be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world, is how to capitalize on these new massively threaded computational architectures, especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelism with some insight into the future. Published by Elsevier Inc.

  6. Scaling Watershed Models: Modern Approaches to Science Computation with MapReduce, Parallelization, and Cloud Optimization

    EPA Science Inventory

    Environmental models are products of the computer architecture and software tools available at the time of development. Scientifically sound algorithms may persist in their original state even as system architectures and software development approaches evolve and progress. Dating...

  7. Modelling parallel programs and multiprocessor architectures with AXE

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  8. A high performance parallel computing architecture for robust image features

    NASA Astrophysics Data System (ADS)

    Zhou, Renyan; Liu, Leibo; Wei, Shaojun

    2014-03-01

    A design of a parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian Blob Detector and Haar Response Calculation, which are the most computing-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network and dedicated hardware for SURF descriptors, this architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS of general pixel processing at a 100 MHz clock and up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable, so it can easily be migrated to a VLSI implementation.

  9. The RISC (Reduced Instruction Set Computer) Architecture and Computer Performance Evaluation.

    DTIC Science & Technology

    1986-03-01

    time where the main emphasis of the evaluation process is put on the software. The model is intended to provide a tool for computer architects to use... program, or 3) was to be implemented in random logic more effectively than the equivalent sequence of software instructions. Both data and address... definition is the IEEE standard 729-1983, stating Computer Architecture as: "The process of defining a collection of hardware and software components and

  10. First 3 years of operation of RIACS (Research Institute for Advanced Computer Science) (1983-1985)

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    The focus of the Research Institute for Advanced Computer Science (RIACS) is to explore matches between advanced computing architectures and the processes of scientific research. An architecture evaluation of the MIT static dataflow machine, specification of a graphical language for expressing distributed computations, and specification of an expert system for aiding in grid generation for two-dimensional flow problems were initiated. Research projects for 1984 and 1985 are summarized.

  11. Design and Analysis of Compact DNA Strand Displacement Circuits for Analog Computation Using Autocatalytic Amplifiers.

    PubMed

    Song, Tianqi; Garg, Sudhanshu; Mokhtar, Reem; Bui, Hieu; Reif, John

    2018-01-19

    A main goal in DNA computing is to build DNA circuits to compute designated functions using a minimal number of DNA strands. Here, we propose a novel architecture to build compact DNA strand displacement circuits to compute a broad scope of functions in an analog fashion. A circuit by this architecture is composed of three autocatalytic amplifiers, and the amplifiers interact to perform computation. We show DNA circuits to compute functions sqrt(x), ln(x) and exp(x) for x in tunable ranges with simulation results. A key innovation in our architecture, inspired by Napier's use of logarithm transforms to compute square roots on a slide rule, is to make use of autocatalytic amplifiers to do logarithmic and exponential transforms in concentration and time. In particular, we convert from the input that is encoded by the initial concentration of the input DNA strand, to time, and then back again to the output encoded by the concentration of the output DNA strand at equilibrium. This combined use of strand-concentration and time encoding of computational values may have impact on other forms of molecular computation.
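
    The arithmetic identity behind the concentration-to-time-and-back trick can be stated in a few lines of ordinary floating-point code; the function below is invented purely to illustrate the slide-rule principle, with no attempt to model the chemistry:

```python
import math

def slide_rule_sqrt(x, k=0.5):
    # The circuit's idea stripped of the chemistry: transform the input into
    # a "time" via a logarithm, scale it, and transform back via an
    # exponential.  k = 0.5 yields sqrt(x); other k give other powers of x.
    t = math.log(x)            # first amplifier: input concentration -> time
    t_scaled = k * t           # tuning of the middle stage
    return math.exp(t_scaled)  # final amplifier: time -> output concentration

for x in (0.25, 2.0, 9.0):
    print(x, slide_rule_sqrt(x), math.sqrt(x))
```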

  12. Performances of multiprocessor multidisk architectures for continuous media storage

    NASA Astrophysics Data System (ADS)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.

  13. Electromagnetic physics models for parallel computing architectures

    DOE PAGES

    Amadio, G.; Ananya, A.; Apostolakis, J.; ...

    2016-11-21

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.

  14. Architectural Implications of Cloud Computing

    DTIC Science & Technology

    2011-10-24

    Public Cloud, Infrastructure-as-a-Service (IaaS), Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS): cloud computing types based on type of... Software-as-a-Service (SaaS): model of software deployment in which a third-party... and System Solutions (RTSS) Program. Her current interests and projects are in service-oriented architecture (SOA), cloud computing, and context

  15. Generic Software for Emulating Multiprocessor Architectures.

    DTIC Science & Technology

    1985-05-01

    Report documentation page (OCR fragment). MIT Laboratory for Computer Science, 545 Technology Square, Cambridge, MA 02139. Keywords: computer architecture, emulation, simulation, dataflow.

  16. Exploring Gigabyte Datasets in Real Time: Architectures, Interfaces and Time-Critical Design

    NASA Technical Reports Server (NTRS)

    Bryson, Steve; Gerald-Yamasaki, Michael (Technical Monitor)

    1998-01-01

    Architectures and Interfaces: The implications of real-time interaction on software architecture design: decoupling of interaction/graphics and computation into asynchronous processes. The performance requirements of graphics and computation for interaction. Time management in such an architecture. Examples of how visualization algorithms must be modified for high performance. Brief survey of interaction techniques and design, including direct manipulation and manipulation via widgets. The talk discusses how human factors considerations drove the design and implementation of the virtual wind tunnel. Time-Critical Design: A survey of time-critical techniques for both computation and rendering. Emphasis on the assignment of a time budget to both the overall visualization environment and to each individual visualization technique in the environment. The estimation of the benefit and cost of an individual technique. Examples of the modification of visualization algorithms to allow time-critical control.
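
    A hedged sketch of the time-budget idea: each technique advertises an estimated cost and benefit per quality level, and a greedy scheduler spends the per-frame budget on the upgrades with the best marginal benefit per millisecond. All technique names and numbers below are invented for illustration:

```python
# (cumulative cost in ms, cumulative benefit) per quality level of each technique
techniques = {
    "streamlines":   [(5.0, 1.0), (12.0, 1.8), (25.0, 2.2)],
    "isosurface":    [(8.0, 1.5), (20.0, 2.5)],
    "cutting_plane": [(3.0, 0.8), (6.0, 1.1)],
}

def allocate(budget_ms):
    # Greedy time-critical allocation: repeatedly buy the affordable quality
    # upgrade with the highest marginal benefit per millisecond.
    level = {name: 0 for name in techniques}   # level 0 = technique switched off
    spent = 0.0
    while True:
        candidates = []
        for name, levels in techniques.items():
            if level[name] < len(levels):
                prev_cost = levels[level[name] - 1][0] if level[name] > 0 else 0.0
                prev_ben = levels[level[name] - 1][1] if level[name] > 0 else 0.0
                cost, ben = levels[level[name]]
                d_cost, d_ben = cost - prev_cost, ben - prev_ben
                if spent + d_cost <= budget_ms:
                    candidates.append((d_ben / d_cost, name, d_cost))
        if not candidates:
            return level, spent
        _, name, d_cost = max(candidates)
        level[name] += 1
        spent += d_cost

print(allocate(30.0))   # chosen quality levels and the time actually spent
```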

  17. Hardware architecture design of image restoration based on time-frequency domain computation

    NASA Astrophysics Data System (ADS)

    Wen, Bo; Zhang, Jing; Jiao, Zipeng

    2013-10-01

    Image restoration algorithms based on time-frequency domain computation (TFDC) are mature and widely applied in engineering. To enable high-speed implementation of these algorithms, a TFDC hardware architecture is proposed. First, the main module is designed by analyzing the processing steps and numerical calculations the algorithms have in common. Then, to improve generality, an iteration control module is planned for iterative algorithms. In addition, to reduce computational cost and memory requirements, optimizations are suggested for the time-consuming modules, namely the two-dimensional FFT/IFFT and the complex-valued arithmetic. Finally, the TFDC hardware architecture is adopted for the hardware design of a real-time image restoration system. The results show that the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm generality, hardware realizability, and high efficiency.
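
    The abstract does not name a specific restoration algorithm, so as an assumption the sketch below uses a generic Wiener-style inverse filter to show the kind of 2-D FFT/IFFT and complex-arithmetic workload such hardware targets:

```python
import numpy as np

def wiener_deconvolve(blurred, psf, k=1e-2):
    # Generic frequency-domain restoration step: the 2-D FFT/IFFT pair and the
    # element-wise complex arithmetic are the operations singled out above
    # for hardware optimization.
    H = np.fft.fft2(psf, s=blurred.shape)    # transfer function of the blur
    G = np.fft.fft2(blurred)                 # spectrum of the degraded image
    W = np.conj(H) / (np.abs(H) ** 2 + k)    # regularized inverse filter
    return np.real(np.fft.ifft2(W * G))

# toy usage: blur a random image with a 5x5 box PSF, then restore it
img = np.random.rand(256, 256)
psf = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(psf, s=img.shape) * np.fft.fft2(img)))
restored = wiener_deconvolve(blurred, psf)
print("MSE blurred :", np.mean((blurred - img) ** 2))
print("MSE restored:", np.mean((restored - img) ** 2))
```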

  18. Universal Quantum Computing with Measurement-Induced Continuous-Variable Gate Sequence in a Loop-Based Architecture.

    PubMed

    Takeda, Shuntaro; Furusawa, Akira

    2017-09-22

    We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.

  19. Universal Quantum Computing with Measurement-Induced Continuous-Variable Gate Sequence in a Loop-Based Architecture

    NASA Astrophysics Data System (ADS)

    Takeda, Shuntaro; Furusawa, Akira

    2017-09-01

    We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.

  20. Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

    PubMed

    Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

    2014-06-25

    The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General-purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
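
    The exhaustive pairwise scan that both accelerators parallelize can be written down compactly. The information-gain style score below is a simplified stand-in for the actual statistic used in SNPsyn, and all data are synthetic:

```python
import numpy as np
from itertools import combinations

def entropy(labels):
    # Shannon entropy of a discrete label vector.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_info(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y); the joint is encoded by integer pairing.
    pair = x.astype(np.int64) * (y.max() + 1) + y
    return entropy(x) + entropy(y) - entropy(pair)

def interaction_gain(snp_a, snp_b, phenotype):
    # How much the joint genotype explains the phenotype beyond each SNP alone.
    joint = snp_a * 3 + snp_b        # genotypes coded 0/1/2, so base-3 pairing
    return (mutual_info(joint, phenotype)
            - mutual_info(snp_a, phenotype)
            - mutual_info(snp_b, phenotype))

# synthetic data: 1000 samples, 50 SNPs, binary phenotype
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(50, 1000))
pheno = rng.integers(0, 2, size=1000)

# this pairwise loop is the embarrassingly parallel kernel that a GPU or MIC
# module would distribute across thousands of hardware threads
scores = {(i, j): interaction_gain(snps[i], snps[j], pheno)
          for i, j in combinations(range(len(snps)), 2)}
best = max(scores, key=scores.get)
print(best, scores[best])
```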

  1. Heterogeneous computing architecture for fast detection of SNP-SNP interactions

    PubMed Central

    2014-01-01

    Background The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphic Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. Results We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their use resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. Conclusions General-purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems. PMID:24964802

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCaskey, Alexander J.

    Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately, the software infrastructure required to enable this is lacking or unavailable. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-generation computing hardware architectures such as quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.

  3. Application of computational physics within Northrop

    NASA Technical Reports Server (NTRS)

    George, M. W.; Ling, R. T.; Mangus, J. F.; Thompkins, W. T.

    1987-01-01

    An overview of Northrop programs in computational physics is presented. These programs depend on access to today's supercomputers, such as the Numerical Aerodynamical Simulator (NAS), and future growth on the continuing evolution of computational engines. Descriptions here are concentrated on the following areas: computational fluid dynamics (CFD), computational electromagnetics (CEM), computer architectures, and expert systems. Current efforts and future directions in these areas are presented. The impact of advances in the CFD area is described, and parallels are drawn to analogous developments in CEM. The relationship between advances in these areas and the development of advanced (parallel) architectures and expert systems is also presented.

  4. All-memristive neuromorphic computing with level-tuned neurons

    NASA Astrophysics Data System (ADS)

    Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos

    2016-09-01

    In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amounts of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogeneous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represents a significant step towards the development of ultrahigh-density neuromorphic co-processors.
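
    The learning principle can be illustrated with a toy leaky integrate-and-fire neuron and a drastically simplified plasticity rule. This is pure NumPy with no memristor physics or level tuning, and all constants are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

n_inputs, n_steps = 50, 2000
# one hidden temporal correlation: inputs 0-9 tend to fire together
corr = rng.random(n_steps) < 0.1
spikes = rng.random((n_steps, n_inputs)) < 0.02
spikes[:, :10] |= corr[:, None]

w = np.full(n_inputs, 0.3)       # synaptic weights, standing in for device conductances
threshold, v = 5.0, 0.0
for t in range(n_steps):
    v += spikes[t] @ w           # integrate weighted input spikes
    if v >= threshold:           # output neuron fires
        # simplified STDP-like rule: synapses active in this step are
        # potentiated, silent ones are slightly depressed
        w += 0.02 * spikes[t] - 0.005 * (~spikes[t])
        np.clip(w, 0.0, 1.0, out=w)
        v = 0.0
    v *= 0.9                     # leak

print("mean weight of correlated inputs  :", w[:10].mean())
print("mean weight of uncorrelated inputs:", w[10:].mean())
```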

  5. All-memristive neuromorphic computing with level-tuned neurons.

    PubMed

    Pantazi, Angeliki; Woźniak, Stanisław; Tuma, Tomas; Eleftheriou, Evangelos

    2016-09-02

    In the new era of cognitive computing, systems will be able to learn and interact with the environment in ways that will drastically enhance the capabilities of current processors, especially in extracting knowledge from vast amounts of data obtained from many sources. Brain-inspired neuromorphic computing systems increasingly attract research interest as an alternative to the classical von Neumann processor architecture, mainly because of the coexistence of memory and processing units. In these systems, the basic components are neurons interconnected by synapses. The neurons, based on their nonlinear dynamics, generate spikes that provide the main communication mechanism. The computational tasks are distributed across the neural network, where synapses implement both the memory and the computational units, by means of learning mechanisms such as spike-timing-dependent plasticity. In this work, we present an all-memristive neuromorphic architecture comprising neurons and synapses realized by using the physical properties and state dynamics of phase-change memristors. The architecture employs a novel concept of interconnecting the neurons in the same layer, resulting in level-tuned neuronal characteristics that preferentially process input information. We demonstrate the proposed architecture in the tasks of unsupervised learning and detection of multiple temporal correlations in parallel input streams. The efficiency of the neuromorphic architecture along with the homogeneous neuro-synaptic dynamics implemented with nanoscale phase-change memristors represents a significant step towards the development of ultrahigh-density neuromorphic co-processors.

  6. An extensible circuit QED architecture for quantum computation

    NASA Astrophysics Data System (ADS)

    Dicarlo, Leo

    Realizing a logical qubit robust to single errors in its constituent physical elements is an immediate challenge for quantum information processing platforms. A longer-term challenge will be achieving quantum fault tolerance, i.e., improving logical qubit resilience by increasing redundancy in the underlying quantum error correction code (QEC). In QuTech, we target these challenges in collaboration with industrial and academic partners. I will present the circuit QED quantum hardware, room-temperature control electronics, and software components of the complete architecture. I will show the extensibility of each component to the Surface-17 and -49 circuits needed to reach the objectives with surface-code QEC, and provide an overview of latest developments. Research funded by IARPA and Intel Corporation.

  7. Specialized computer architectures for computational aerodynamics

    NASA Technical Reports Server (NTRS)

    Stevenson, D. K.

    1978-01-01

    In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relatively high cost of performing these computations on commercially available general-purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities dedicated to the important problems of computational aerodynamics. One of the still-unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.

  8. A Programmable Five Qubit Quantum Computer Using Trapped Atomic Ions

    NASA Astrophysics Data System (ADS)

    Debnath, Shantanu

    Quantum computers can solve certain problems more efficiently than conventional classical methods. In the endeavor to build a quantum computer, several competing platforms have emerged that can implement certain quantum algorithms using a few qubits. However, the demonstrations so far have usually been done by tailoring the hardware to meet the requirements of a particular algorithm implemented for a limited number of instances. Although such proof-of-principle implementations are important to verify the working of algorithms on a physical system, they further need to have the potential to serve as a general-purpose quantum computer, allowing the flexibility required for running multiple algorithms and being scaled up to host more qubits. Here we demonstrate a small programmable quantum computer based on five trapped atomic ions, each of which serves as a qubit. By optically resolving each ion we can individually address them in order to perform a complete set of single-qubit and fully connected two-qubit quantum gates and also perform efficient individual qubit measurements. We implement a computation architecture that accepts an algorithm from a user interface in the form of a standard logic gate sequence and decomposes it into fundamental quantum operations that are native to the hardware, using a set of compilation instructions that are defined within the software. These operations are then effected through a pattern of laser pulses that perform coherent rotations on targeted qubits in the chain. The architecture implemented in the experiment therefore gives us unprecedented flexibility in the programming of any quantum algorithm while staying blind to the underlying hardware. As a demonstration we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms on the five-qubit processor and achieve average success rates of 95 and 90 percent, respectively. We also implement a five-qubit coherent quantum Fourier transform and examine its performance in the period finding and phase estimation protocols. We find fidelities of 84 and 62 percent, respectively. While maintaining the same computation architecture, the system can be scaled to more ions using resources that scale favorably (O(N^2)) with the number of qubits N.

  9. Efficient architecture for spike sorting in reconfigurable hardware.

    PubMed

    Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying

    2013-11-01

    This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of different weight vectors share the same circuit for lowering the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented by field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining high classification correct rate and high speed computation.
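
    A software sketch of the feature-extraction half of such a pipeline, assuming Sanger's formulation of the generalized Hebbian algorithm; the spike snippets are synthetic, and the FCM clustering stage is only referenced in a comment:

```python
import numpy as np

def gha_train(X, n_components=2, lr=1e-3, epochs=20, seed=0):
    # Generalized Hebbian Algorithm (Sanger's rule): learns the leading
    # principal components sample by sample, without ever forming the
    # covariance matrix, which is what makes it attractive for a small
    # hardware datapath.
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W = rng.standard_normal((n_components, n_features)) * 0.01
    for _ in range(epochs):
        for x in X:
            y = W @ x
            # Hebbian term minus reconstruction from earlier outputs
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# toy spike-like waveforms: 64-sample snippets mixing two dominant shapes
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 64)
X = (rng.standard_normal((500, 1)) * np.sin(2 * np.pi * t)
     + rng.standard_normal((500, 1)) * np.exp(-((t - 0.3) ** 2) / 0.01)
     + 0.05 * rng.standard_normal((500, 64)))
X -= X.mean(axis=0)

W = gha_train(X)
features = X @ W.T   # per-spike feature vectors handed to the clustering stage (FCM in the paper)
print(features.shape)
```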

  10. Hybrid parallel computing architecture for multiview phase shifting

    NASA Astrophysics Data System (ADS)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows its powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit co-operates with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high-computation-cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
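
    The per-pixel phase computation that dominates this workload reduces to a vectorized arctangent over the N phase-shifted frames. A minimal NumPy sketch with synthetic four-step fringe images (the projector/camera model is an assumption for illustration):

```python
import numpy as np

def wrapped_phase(images):
    # N-step phase shifting: images[k] is the frame captured with the fringe
    # pattern shifted by 2*pi*k/N.  The per-pixel arctangent below is the
    # data-parallel kernel that maps naturally onto a GPU.
    n = len(images)
    shifts = 2 * np.pi * np.arange(n) / n
    num = sum(I * np.sin(d) for I, d in zip(images, shifts))
    den = sum(I * np.cos(d) for I, d in zip(images, shifts))
    return np.arctan2(-num, den)     # wrapped phase in (-pi, pi]

# toy check against a known phase map
true_phase = np.linspace(-3.0, 3.0, 640)[None, :] * np.ones((480, 1))
frames = [0.5 + 0.4 * np.cos(true_phase + 2 * np.pi * k / 4) for k in range(4)]
est = wrapped_phase(frames)
print(np.allclose(est, true_phase, atol=1e-6))
```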

  11. The science of computing - Parallel computation

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1985-01-01

    Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.

  12. Development of Rhizo-Columns for Nondestructive Root System Architecture Laboratory Measurements

    NASA Astrophysics Data System (ADS)

    Oostrom, M.; Johnson, T. J.; Varga, T.; Hess, N. J.; Wietsma, T. W.

    2016-12-01

    Numerical models for root water uptake in plant-soil systems have been developing rapidly, increasing the demand for laboratory experimental data to test and verify these models. Most of the increasingly detailed models are either compared to long-term field crop data or do not involve comparisons at all. Ideally, experiments would provide information on dynamic root system architecture (RSA) in combination with soil-plant hydraulics such as water pressures and volumetric water contents. Data obtained from emerging methods such as Spectral Induced Polarization (SIP) and x-ray computed tomography (x-ray CT) may be used to provide laboratory RSA data needed for model comparisons. Point measurements such as polymer tensiometers (PT) may provide soil moisture information over a large range of water pressures, from field capacity to the wilting point under drought conditions. In the presentation, we demonstrate a novel laboratory capability allowing for detailed RSA studies in large columns under controlled conditions using automated SIP, X-ray CT, and PT methods. Examples are shown for pea and corn root development under various moisture regimes.

  13. Optical imaging of architecture and function in the living brain sheds new light on cortical mechanisms underlying visual perception.

    PubMed

    Grinvald, A

    1992-01-01

    Long-standing questions related to brain mechanisms underlying perception can finally be resolved by direct visualization of the architecture and function of mammalian cortex. This advance has been accomplished with the aid of two optical imaging techniques with which one can literally see how the brain functions. The development of this technology required a multi-disciplinary approach integrating brain research with organic chemistry, spectroscopy, biophysics, computer sciences, optics and image processing. Beyond the technological ramifications, recent research has shed new light on cortical mechanisms underlying sensory perception. Clinical applications of this technology for precise mapping of the cortical surface of patients during neurosurgery have begun. Below is a brief summary of our own research and a description of the technical specifications of the two optical imaging techniques. Like every technique, optical imaging also suffers from severe limitations. Here we mostly emphasize some of its advantages relative to all alternative imaging techniques currently in use. The limitations are critically discussed in our recent reviews. For a series of other reviews, see Cohen (1989).

  14. Analog Computation by DNA Strand Displacement Circuits.

    PubMed

    Song, Tianqi; Garg, Sudhanshu; Mokhtar, Reem; Bui, Hieu; Reif, John

    2016-08-19

    DNA circuits have been widely used to develop biological computing devices because of their high programmability and versatility. Here, we propose an architecture for the systematic construction of DNA circuits for analog computation based on DNA strand displacement. The elementary gates in our architecture include addition, subtraction, and multiplication gates. The input and output of these gates are analog, which means that they are directly represented by the concentrations of the input and output DNA strands, respectively, without requiring a threshold for converting to Boolean signals. We provide detailed domain designs and kinetic simulations of the gates to demonstrate their expected performance. On the basis of these gates, we describe how DNA circuits to compute polynomial functions of inputs can be built. Using Taylor Series and Newton Iteration methods, functions beyond the scope of polynomials can also be computed by DNA circuits built upon our architecture.
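
    On top of the three elementary gates, a function such as exp(x) becomes a fixed-coefficient polynomial; the Horner evaluation below shows that gate-level structure in ordinary Python (the truncation order is an arbitrary choice for illustration):

```python
import math

def taylor_exp(x, terms=12):
    # exp(x) via its truncated Taylor polynomial, evaluated in Horner form so
    # that only addition and multiplication are needed; the 1/n! coefficients
    # are fixed constants that a circuit can bake into its gates.
    coeffs = [1.0 / math.factorial(n) for n in range(terms, -1, -1)]  # x^terms ... x^0
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

for x in (0.5, 1.0, 2.0):
    print(x, taylor_exp(x), math.exp(x))
```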

  15. Advanced Architectures for Astrophysical Supercomputing

    NASA Astrophysics Data System (ADS)

    Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.

    2010-12-01

    Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.

  16. Computer Security Primer: Systems Architecture, Special Ontology and Cloud Virtual Machines

    ERIC Educational Resources Information Center

    Waguespack, Leslie J.

    2014-01-01

    With the increasing proliferation of multitasking and Internet-connected devices, security has reemerged as a fundamental design concern in information systems. The shift of IS curricula toward a largely organizational perspective of security leaves little room for focus on its foundation in systems architecture, the computational underpinnings of…

  17. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  18. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, R.S.; /SLAC

    2008-04-22

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.

  19. Combining metric episodes with semantic event concepts within the Symbolic and Sub-Symbolic Robotics Intelligence Control System (SS-RICS)

    NASA Astrophysics Data System (ADS)

    Kelley, Troy D.; McGhee, S.

    2013-05-01

    This paper describes the ongoing development of a robotic control architecture that is inspired by computational cognitive architectures from the discipline of cognitive psychology. The Symbolic and Sub-Symbolic Robotics Intelligence Control System (SS-RICS) combines symbolic and sub-symbolic representations of knowledge into a unified control architecture. The new architecture leverages previous work in cognitive architectures, specifically the development of the Adaptive Character of Thought-Rational (ACT-R) and Soar. This paper details current work on learning from episodes or events. The use of episodic memory as a learning mechanism has, until recently, been largely ignored by computational cognitive architectures. This paper details work on metric-level episodic memory streams and methods for translating episodes into abstract schemas. The presentation will include research on learning through novelty and self-generated feedback mechanisms for autonomous systems.

  20. Computer architecture evaluation for structural dynamics computations: Project summary

    NASA Technical Reports Server (NTRS)

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  1. A Medical Image Backup Architecture Based on a NoSQL Database and Cloud Computing Services.

    PubMed

    Santos Simões de Almeida, Luan Henrique; Costa Oliveira, Marcelo

    2015-01-01

    The use of digital systems for storing medical images generates a huge volume of data. Digital images are commonly stored and managed on a Picture Archiving and Communication System (PACS), under the DICOM standard. However, PACS is limited because it is strongly dependent on the server's physical space. Alternatively, Cloud Computing arises as an extensive, low-cost, and reconfigurable resource. However, medical images contain patient information that cannot be made available in a public cloud. Therefore, a mechanism to anonymize these images is needed. This poster presents a solution for this issue by taking digital images from PACS, converting the information contained in each image file to a NoSQL database, and using cloud computing to store digital images.
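
    A loose sketch of the anonymize-and-offload step described above. The DICOM header is represented as a plain dict rather than through a real DICOM toolkit, the "NoSQL document" is stood in for by JSON, and all field names and URIs are invented for illustration:

```python
import hashlib
import json

# header fields treated as protected health information in this sketch
PROTECTED_FIELDS = {"PatientName", "PatientID", "PatientBirthDate"}

def anonymize_for_cloud(dicom_header, pixel_data_uri):
    # Build the document that would be stored in the NoSQL database, with
    # protected fields replaced by one-way pseudonyms so studies of the same
    # patient can still be linked without exposing identity in a public cloud.
    doc = {}
    for tag, value in dicom_header.items():
        if tag in PROTECTED_FIELDS:
            doc[tag] = hashlib.sha256(str(value).encode()).hexdigest()[:16]
        else:
            doc[tag] = value
    doc["PixelDataURI"] = pixel_data_uri   # image payload kept in separate cloud object storage
    return doc

header = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "PatientBirthDate": "19700101", "Modality": "CR", "StudyDate": "20150302"}
print(json.dumps(anonymize_for_cloud(header, "s3://pacs-backup/study-0001.dat"), indent=2))
```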

  2. Cross-Talk in Superconducting Transmon Quantum Computing Architecture

    NASA Astrophysics Data System (ADS)

    Abraham, David; Chow, Jerry; Corcoles, Antonio; Rothwell, Mary; Keefe, George; Gambetta, Jay; Steffen, Matthias; IBM Quantum Computing Team

    2013-03-01

    Superconducting transmon quantum computing test structures often exhibit significant undesired cross-talk. For experiments with only a handful of qubits this cross-talk can be quantified and understood, and therefore corrected. As quantum computing circuits become more complex, and thereby contain increasing numbers of qubits and resonators, it becomes more vital that the inadvertent coupling between these elements is minimized. The task of accurately controlling each single qubit to the level of precision required throughout the realization of a quantum algorithm is difficult by itself, but coupled with the need to null out leakage signals from neighboring qubits or resonators it would quickly become impossible. We discuss an approach to solve this critical problem. We acknowledge support from IARPA under contract W911NF-10-1-0324.

  3. Topology optimization aided structural design: Interpretation, computational aspects and 3D printing.

    PubMed

    Kazakis, Georgios; Kanellopoulos, Ioannis; Sotiropoulos, Stefanos; Lagaros, Nikos D

    2017-10-01

    The construction industry has a major impact on the environment in which we spend most of our lives. Therefore, it is important that the outcome of architectural intuition performs well and complies with the design requirements. Architects usually describe as "optimal design" their choice among a rather limited set of design alternatives, dictated by their experience and intuition. However, modern design of structures requires accounting for a great number of criteria derived from multiple disciplines, often of conflicting nature. Such criteria derive from structural engineering, eco-design, bioclimatic and acoustic performance. The resulting vast number of alternatives enhances the need for computer-aided architecture in order to increase the possibility of arriving at a more preferable solution. Therefore, the incorporation of smart, automatic tools in the design process, able to further guide the designer's intuition, becomes even more indispensable. The principal aim of this study is to present possibilities to integrate automatic computational techniques related to topology optimization in the phase of intuition of civil structures as part of computer-aided architectural design. In this direction, different aspects of a new computer-aided architectural era related to the interpretation of the optimized designs, the difficulties resulting from the increased computational effort, and 3D printing capabilities are covered herein.

  4. Computer sciences

    NASA Technical Reports Server (NTRS)

    Smith, Paul H.

    1988-01-01

    The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.

  5. Influence of Additive Manufactured Scaffold Architecture on the Distribution of Surface Strains and Fluid Flow Shear Stresses and Expected Osteochondral Cell Differentiation.

    PubMed

    Hendrikson, Wim J; Deegan, Anthony J; Yang, Ying; van Blitterswijk, Clemens A; Verdonschot, Nico; Moroni, Lorenzo; Rouwkema, Jeroen

    2017-01-01

    Scaffolds for regenerative medicine applications should instruct cells with the appropriate signals, including biophysical stimuli such as stress and strain, to form the desired tissue. Apart from that, scaffolds, especially for load-bearing applications, should be capable of providing mechanical stability. Since both scaffold strength and stress-strain distributions throughout the scaffold depend on the scaffold's internal architecture, it is important to understand how changes in architecture influence these parameters. In this study, four scaffold designs with different architectures were produced using additive manufacturing. The designs varied in fiber orientation, while fiber diameter, spacing, and layer height remained constant. Based on micro-CT (μCT) scans, finite element models (FEMs) were derived for finite element analysis (FEA) and computational fluid dynamics (CFD). FEA of scaffold compression was validated using μCT scan data of compressed scaffolds. Results of the FEA and CFD showed a significant impact of scaffold architecture on fluid shear stress and mechanical strain distribution. The average fluid shear stress ranged from 3.6 mPa for a 0/90 architecture to 6.8 mPa for a 0/90 offset architecture, and the surface shear strain from 0.0096 for a 0/90 offset architecture to 0.0214 for a 0/90 architecture. This subsequently resulted in variations of the predicted cell differentiation stimulus values on the scaffold surface. Fluid shear stress was mainly influenced by pore shape and size, while mechanical strain distribution depended mainly on the presence or absence of supportive columns in the scaffold architecture. Together, these results corroborate that scaffold architecture can be exploited to design scaffolds with regions that guide specific tissue development under compression and perfusion. In conjunction with optimization of stimulation regimes during bioreactor cultures, scaffold architecture optimization can be used to improve scaffold design for tissue engineering purposes.

  6. Influence of Additive Manufactured Scaffold Architecture on the Distribution of Surface Strains and Fluid Flow Shear Stresses and Expected Osteochondral Cell Differentiation

    PubMed Central

    Hendrikson, Wim J.; Deegan, Anthony J.; Yang, Ying; van Blitterswijk, Clemens A.; Verdonschot, Nico; Moroni, Lorenzo; Rouwkema, Jeroen

    2017-01-01

    Scaffolds for regenerative medicine applications should instruct cells with the appropriate signals, including biophysical stimuli such as stress and strain, to form the desired tissue. Apart from that, scaffolds, especially for load-bearing applications, should be capable of providing mechanical stability. Since both scaffold strength and stress–strain distributions throughout the scaffold depend on the scaffold’s internal architecture, it is important to understand how changes in architecture influence these parameters. In this study, four scaffold designs with different architectures were produced using additive manufacturing. The designs varied in fiber orientation, while fiber diameter, spacing, and layer height remained constant. Based on micro-CT (μCT) scans, finite element models (FEMs) were derived for finite element analysis (FEA) and computational fluid dynamics (CFD). FEA of scaffold compression was validated using μCT scan data of compressed scaffolds. Results of the FEA and CFD showed a significant impact of scaffold architecture on fluid shear stress and mechanical strain distribution. The average fluid shear stress ranged from 3.6 mPa for a 0/90 architecture to 6.8 mPa for a 0/90 offset architecture, and the surface shear strain from 0.0096 for a 0/90 offset architecture to 0.0214 for a 0/90 architecture. This subsequently resulted in variations of the predicted cell differentiation stimulus values on the scaffold surface. Fluid shear stress was mainly influenced by pore shape and size, while mechanical strain distribution depended mainly on the presence or absence of supportive columns in the scaffold architecture. Together, these results corroborate that scaffold architecture can be exploited to design scaffolds with regions that guide specific tissue development under compression and perfusion. In conjunction with optimization of stimulation regimes during bioreactor cultures, scaffold architecture optimization can be used to improve scaffold design for tissue engineering purposes. PMID:28239606

  7. Cortical Coupling Reflects Bayesian Belief Updating in the Deployment of Spatial Attention.

    PubMed

    Vossel, Simone; Mathys, Christoph; Stephan, Klaas E; Friston, Karl J

    2015-08-19

    The deployment of visuospatial attention and the programming of saccades are governed by the inferred likelihood of events. In the present study, we combined computational modeling of psychophysical data with fMRI to characterize the computational and neural mechanisms underlying this flexible attentional control. Sixteen healthy human subjects performed a modified version of Posner's location-cueing paradigm in which the percentage of cue validity varied in time and the targets required saccadic responses. Trialwise estimates of the certainty (precision) of the prediction that the target would appear at the cued location were derived from a hierarchical Bayesian model fitted to individual trialwise saccadic response speeds. Trial-specific model parameters then entered analyses of fMRI data as parametric regressors. Moreover, dynamic causal modeling (DCM) was performed to identify the most likely functional architecture of the attentional reorienting network and its modulation by (Bayes-optimal) precision-dependent attention. While the frontal eye fields (FEFs), intraparietal sulcus, and temporoparietal junction (TPJ) of both hemispheres showed higher activity on invalid relative to valid trials, reorienting responses in right FEF, TPJ, and the putamen were significantly modulated by precision-dependent attention. Our DCM results suggested that the precision of predictability underlies the attentional modulation of the coupling of TPJ with FEF and the putamen. Our results shed new light on the computational architecture and neuronal network dynamics underlying the context-sensitive deployment of visuospatial attention. Spatial attention and its neural correlates in the human brain have been studied extensively with the help of fMRI and cueing paradigms in which the location of targets is pre-cued on a trial-by-trial basis. One aspect that has so far been neglected concerns the question of how the brain forms attentional expectancies when no a priori probability information is available but needs to be inferred from observations. This study elucidates the computational and neural mechanisms under which probabilistic inference governs attentional deployment. Our results show that Bayesian belief updating explains changes in cortical connectivity; in that directional influences from the temporoparietal junction on the frontal eye fields and the putamen were modulated by (Bayes-optimal) updates. Copyright © 2015 Vossel et al.
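
    A much-reduced stand-in for the trialwise belief-updating idea: beta-Bernoulli updating of cue validity instead of the paper's hierarchical model, with the block structure and parameters invented for illustration:

```python
import numpy as np

def update_cue_validity(outcomes, a0=1.0, b0=1.0):
    # Trial-by-trial Bayesian updating of the probability that the cue is
    # valid.  The precision of that belief (inverse variance of the Beta
    # posterior) is the quantity that, in the study, modulated reorienting
    # responses and cortical coupling.
    a, b = a0, b0
    beliefs, precisions = [], []
    for valid in outcomes:              # 1 = target at cued location, 0 = invalid trial
        a += valid
        b += 1 - valid
        mean = a / (a + b)
        var = a * b / ((a + b) ** 2 * (a + b + 1))
        beliefs.append(mean)
        precisions.append(1.0 / var)
    return np.array(beliefs), np.array(precisions)

# toy block design: 90% valid cues, then a switch to 50%
rng = np.random.default_rng(7)
outcomes = np.concatenate([rng.random(60) < 0.9, rng.random(60) < 0.5]).astype(int)
belief, precision = update_cue_validity(outcomes)
print(belief[:5], belief[-5:])
```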

  8. Architecture for hospital information integration

    NASA Astrophysics Data System (ADS)

    Chimiak, William J.; Janariz, Daniel L.; Martinez, Ralph

    1999-07-01

    The ongoing integration of hospital information systems (HIS) continues. Data storage systems, data networks and computers improve, data bases grow and health-care applications increase. Some computer operating systems continue to evolve and some fade. Health care delivery now depends on this computer-assisted environment. The result is that the critical harmonization of the various hospital information systems becomes increasingly difficult. The purpose of this paper is to present an architecture for HIS integration that is computer-language-neutral and computer-hardware-neutral for the informatics applications. The proposed architecture builds upon the work done at the University of Arizona on middleware, the work of the National Electrical Manufacturers Association, and the American College of Radiology. It is a fresh approach to allowing applications engineers to access medical data easily and thus concentrate on the application techniques in which they are expert without struggling with medical information syntaxes. The HIS can be modeled using a hierarchy of information sub-systems, thus facilitating its understanding. The architecture includes the resulting information model along with a strict but intuitive application programming interface, managed by CORBA. The CORBA requirement facilitates interoperability. It should also reduce software and hardware development times.

  9. FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification

    PubMed Central

    Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao

    2012-01-01

    This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640

  10. DREAMS and IMAGE: A Model and Computer Implementation for Concurrent, Life-Cycle Design of Complex Systems

    NASA Technical Reports Server (NTRS)

    Hale, Mark A.; Craig, James I.; Mistree, Farrokh; Schrage, Daniel P.

    1995-01-01

    Computing architectures are being assembled that extend concurrent engineering practices by providing more efficient execution and collaboration on distributed, heterogeneous computing networks. Built on the successes of initial architectures, requirements for a next-generation design computing infrastructure can be developed. These requirements concentrate on those needed by a designer in decision-making processes from product conception to recycling and can be categorized in two areas: design process and design information management. A designer both designs and executes design processes throughout design time to achieve better product and process capabilities while expending fewer resources. In order to accomplish this, information, or more appropriately design knowledge, needs to be adequately managed during product and process decomposition as well as recomposition. A foundation has been laid that captures these requirements in a design architecture called DREAMS (Developing Robust Engineering Analysis Models and Specifications). In addition, a computing infrastructure, called IMAGE (Intelligent Multidisciplinary Aircraft Generation Environment), is being developed that satisfies design requirements defined in DREAMS and incorporates enabling computational technologies.

  11. Gigaflop architecture, a hardware perspective

    NASA Technical Reports Server (NTRS)

    Feierbach, G. F.

    1978-01-01

    Any super computer built in the early 1980s will use components that are available by fall 1978. The architecture of such a system cannot depart radically from current super computers if the software experience painfully acquired from these computers in the 70's is to apply. Given the above constraints, 10 billion floating point operations per second (BFLOPS) are attainable and a problem memory of 512 million (64 bit) words could be supported by the technology of the time. In contrast to this, industry is likely to respond with commercially available machines with a performance of less than 150 MFLOPS. This is due to self-imposed constraints on the manufacturers to provide upward compatible architectures (same instruction set) and systems which can be sold in significant volumes. Since this computing speed is inadequate to meet the demands of computational fluid dynamics, a special processor is required. Issues which are felt to be significant in the pursuit of maximum compute capability in this special processor are discussed.

  12. Using Multimedia for Teaching Analysis in History of Modern Architecture.

    ERIC Educational Resources Information Center

    Perryman, Garry

    This paper presents a case for the development and support of a computer-based interactive multimedia program for teaching analysis in community college architecture design programs. Analysis in architecture design is an extremely important strategy for the teaching of higher-order thinking skills, which senior schools of architecture look for in…

  13. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
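
    The comparator logic itself is simple enough to state in a few lines. The sketch below models a detect-only dual-core lockstep check and the error-correcting majority vote that a version with more redundancy would use; it is plain Python and purely illustrative:

```python
def lockstep_compare(outputs):
    # Minimal model of a two-processor comparator: if the redundant cores
    # disagree, the result is flagged so the step can be retried.
    a, b = outputs
    return (a, False) if a == b else (None, True)   # (value, error_detected)

def tmr_vote(outputs):
    # A simple majority vote over three redundant results illustrates how a
    # configuration with more processors can *correct* a single upset
    # rather than merely detect it.
    a, b, c = outputs
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: multiple upsets")

print(lockstep_compare([42, 42]))   # agreement
print(lockstep_compare([42, 41]))   # single-event upset detected
print(tmr_vote([42, 41, 42]))       # upset corrected by majority
```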

  14. A static data flow simulation study at Ames Research Center

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Howard, Lauri S.

    1987-01-01

    Demands in computational power, particularly in the area of computational fluid dynamics (CFD), led NASA Ames Research Center to study advanced computer architectures. One architecture being studied is the static data flow architecture based on research done by Jack B. Dennis at MIT. To improve understanding of this architecture, a static data flow simulator, written in Pascal, has been implemented for use on a Cray X-MP/48. A matrix multiply and a two-dimensional fast Fourier transform (FFT), two algorithms used in CFD work at Ames, have been run on the simulator. Execution times can vary by a factor of more than 2 depending on the partitioning method used to assign instructions to processing elements. Service time for matching tokens has proved to be a major bottleneck. Loop control and array address calculation overhead can double the execution time. The best sustained MFLOPS rates were less than 50% of the maximum capability of the machine.
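
    The firing rule at the heart of static dataflow can be captured in a tiny interpreter. The sketch below uses an invented toy graph and models fan-out by duplicating a value onto one arc per consumer; it omits the timing, token matching, and partitioning effects that the simulator actually measures:

```python
from collections import defaultdict

# Toy graph for y = (a + b) * (a - b); each node lists its input and output arcs.
graph = {
    "add": {"op": lambda x, y: x + y, "inputs": ["a1", "b1"], "outputs": ["s"]},
    "sub": {"op": lambda x, y: x - y, "inputs": ["a2", "b2"], "outputs": ["d"]},
    "mul": {"op": lambda x, y: x * y, "inputs": ["s", "d"], "outputs": ["y"]},
}
# fan-out of the inputs a and b is modelled by one arc per consumer
initial = {"a1": 7.0, "a2": 7.0, "b1": 3.0, "b2": 3.0}

def run(graph, initial_tokens):
    tokens = defaultdict(list)
    for arc, value in initial_tokens.items():
        tokens[arc].append(value)
    fired = True
    while fired:
        fired = False
        for node in graph.values():
            if all(tokens[a] for a in node["inputs"]):     # firing rule: all operands present
                args = [tokens[a].pop(0) for a in node["inputs"]]
                result = node["op"](*args)
                for a in node["outputs"]:
                    tokens[a].append(result)
                fired = True
    return tokens

print(run(graph, initial)["y"])   # -> [40.0]
```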

  15. Strategies for concurrent processing of complex algorithms in data driven architectures

    NASA Technical Reports Server (NTRS)

    Stoughton, John W.; Mielke, Roland R.

    1988-01-01

    The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations; such algorithms are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory and communicating with a common global data memory. A new graph theoretic model called ATAMM, which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture, is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.

  16. Computer-implemented security evaluation methods, security evaluation systems, and articles of manufacture

    DOEpatents

    Muller, George; Perkins, Casey J.; Lancaster, Mary J.; MacDonald, Douglas G.; Clements, Samuel L.; Hutton, William J.; Patrick, Scott W.; Key, Bradley Robert

    2015-07-28

    Computer-implemented security evaluation methods, security evaluation systems, and articles of manufacture are described. According to one aspect, a computer-implemented security evaluation method includes accessing information regarding a physical architecture and a cyber architecture of a facility, building a model of the facility comprising a plurality of physical areas of the physical architecture, a plurality of cyber areas of the cyber architecture, and a plurality of pathways between the physical areas and the cyber areas, identifying a target within the facility, executing the model a plurality of times to simulate a plurality of attacks against the target by an adversary traversing at least one of the areas in the physical domain and at least one of the areas in the cyber domain, and using results of the executing, providing information regarding a security risk of the facility with respect to the target.
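
    The abstract does not disclose the model internals, but the core idea of executing a facility model many times to estimate risk can be illustrated with a simple Monte Carlo sketch. The graph of areas, pathways, and traversal probabilities below is entirely made up for the example and is not the patented method.

```python
import random

# Minimal illustrative sketch: the facility is a graph whose nodes are physical or
# cyber areas, edges are pathways, and each edge carries a probability that an
# adversary traverses it undetected. Repeated simulated attacks give a crude
# estimate of the security risk with respect to one target.
PATHWAYS = {
    "lobby":        [("server_room", 0.2), ("office_lan", 0.5)],
    "office_lan":   [("scada_net", 0.3)],          # cyber pathway
    "server_room":  [("scada_net", 0.6)],          # physical access into a cyber area
    "scada_net":    [("target_controller", 0.4)],
}

def simulate_attack(start, target, rng):
    """One simulated attack: follow pathways while traversals succeed."""
    node = start
    while node != target:
        options = PATHWAYS.get(node, [])
        if not options:
            return False                 # dead end, attack fails
        nxt, p_success = rng.choice(options)
        if rng.random() > p_success:
            return False                 # traversal detected or blocked
        node = nxt
    return True

def estimate_risk(start, target, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(simulate_attack(start, target, rng) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print("estimated probability of reaching target:",
          estimate_risk("lobby", "target_controller"))
```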

  17. Creating Communications, Computing, and Networking Technology Development Road Maps for Future NASA Human and Robotic Missions

    NASA Technical Reports Server (NTRS)

    Bhasin, Kul; Hayden, Jeffrey L.

    2005-01-01

    For human and robotic exploration missions in the Vision for Exploration, roadmaps are needed for capability development and investments based on advanced technology developments. A roadmap development process was undertaken for the communications, computing, and networking capabilities and technologies needed for future human and robotic missions. The underlying processes are derived from work carried out during development of the future space communications architecture, and NASA's Space Architect Office (SAO) defined formats and structures for accumulating data. Interrelationships were established among emerging requirements, the capability analysis and technology status, and performance data. After developing an architectural communications and networking framework structured around the assumed needs for human and robotic exploration in the vicinity of Earth, the Moon, along the path to Mars, and in the vicinity of Mars, information was gathered from expert participants. This information was used to identify the capabilities expected from the new infrastructure and the technological gaps in the way of obtaining them. We define realistic, long-term space communication architectures based on emerging needs and translate the needs into interfaces, functions, and computer processing that will be required. In developing our roadmapping process, we defined requirements for achieving end-to-end activities that will be carried out by future NASA human and robotic missions. This paper describes: 1) the architectural framework developed for analysis; 2) our approach to gathering and analyzing data from NASA, industry, and academia; 3) an outline of the technology research to be done, including milestones for technology research and demonstrations with timelines; and 4) the technology roadmaps themselves.

  18. Enterprise application architecture development based on DoDAF and TOGAF

    NASA Astrophysics Data System (ADS)

    Tao, Zhi-Gang; Luo, Yun-Feng; Chen, Chang-Xin; Wang, Ming-Zhe; Ni, Feng

    2017-05-01

    For the purpose of supporting the design and analysis of enterprise application architecture, here, we report a tailored enterprise application architecture description framework and its corresponding design method. The presented framework can effectively support service-oriented architecting and cloud computing by creating the metadata model based on architecture content framework (ACF), DoDAF metamodel (DM2) and Cloud Computing Modelling Notation (CCMN). The framework also makes an effort to extend and improve the mapping between The Open Group Architecture Framework (TOGAF) application architectural inputs/outputs, deliverables and Department of Defence Architecture Framework (DoDAF)-described models. The roadmap of 52 DoDAF-described models is constructed by creating the metamodels of these described models and analysing the constraint relationship among metamodels. By combining the tailored framework and the roadmap, this article proposes a service-oriented enterprise application architecture development process. Finally, a case study is presented to illustrate the results of implementing the tailored framework in the Southern Base Management Support and Information Platform construction project using the development process proposed by the paper.

  19. Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines

    DTIC Science & Technology

    2014-11-01

    architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core (MIC) architectures all... evaluate the feasibility and applicability of a new architecture just released to the market. Researchers are often unsure how available resources will... architectures. Having a suite of programs running on different architectures, such as GPUs, MICs, and CPUs, adds complexity and technical challenges

  20. Large-scale modeling of the primary visual cortex: influence of cortical architecture upon neuronal response.

    PubMed

    McLaughlin, David; Shapley, Robert; Shelley, Michael

    2003-01-01

    A large-scale computational model of a local patch of input layer 4 [Formula: see text] of the primary visual cortex (V1) of the macaque monkey, together with a coarse-grained reduction of the model, are used to understand potential effects of cortical architecture upon neuronal performance. Both the large-scale point neuron model and its asymptotic reduction are described. The work focuses upon orientation preference and selectivity, and upon the spatial distribution of neuronal responses across the cortical layer. Emphasis is given to the role of cortical architecture (the geometry of synaptic connectivity, of the ordered and disordered structure of input feature maps, and of their interplay) as mechanisms underlying cortical responses within the model. Specifically: (i) distinct characteristics of model neuronal responses (firing rates and orientation selectivity) as they depend upon the neuron's location within the cortical layer relative to the pinwheel centers of the map of orientation preference; (ii) a time-independent (DC) elevation in cortico-cortical conductances within the model, in contrast to a "push-pull" antagonism between excitation and inhibition; (iii) the use of asymptotic analysis to unveil mechanisms which underlie these performances of the model; (iv) a discussion of emerging experimental data. The work illustrates that large-scale scientific computation, coupled with analytical reduction, mathematical analysis, and experimental data, can provide significant understanding and intuition about the possible mechanisms of cortical response. It also illustrates that the idealization which is a necessary part of theoretical modeling can outline in sharp relief the consequences of differing alternative interpretations and mechanisms, with the final arbiter being a body of experimental evidence whose measurements address the consequences of these analyses.

  1. Defining the computational structure of the motion detector in Drosophila

    PubMed Central

    Clark, Damon A.; Bursztyn, Limor; Horowitz, Mark; Schnitzer, Mark J.; Clandinin, Thomas R.

    2011-01-01

    Many animals rely on visual motion detection for survival. Motion information is extracted from spatiotemporal intensity patterns on the retina, a paradigmatic neural computation. A phenomenological model, the Hassenstein-Reichardt Correlator (HRC), relates visual inputs to neural and behavioral responses to motion, but the circuits that implement this computation remain unknown. Using cell-type specific genetic silencing, minimal motion stimuli, and in vivo calcium imaging, we examine two critical HRC inputs. These two pathways respond preferentially to light and dark moving edges. We demonstrate that these pathways perform overlapping but complementary subsets of the computations underlying the HRC. A numerical model implementing differential weighting of these operations displays the observed edge preferences. Intriguingly, these pathways are distinguished by their sensitivities to a stimulus correlation that corresponds to an illusory percept, “reverse phi”, that affects many species. Thus, this computational architecture may be widely used to achieve edge selectivity in motion detection. PMID:21689602
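
    The HRC referenced above has a standard textbook form: each input is passed through a delay (low-pass) stage, multiplied with the undelayed neighbouring input, and the two mirror-symmetric products are subtracted. The sketch below implements that generic correlator in Python; it is not the fly circuit or the authors' numerical model.

```python
import numpy as np

def hrc_response(left, right, tau=5.0, dt=1.0):
    """Textbook Hassenstein-Reichardt correlator (illustrative, not the fly circuit).

    left, right : intensity time series from two neighbouring photoreceptors.
    Each input is low-pass filtered (a simple delay stage), multiplied with the
    undelayed neighbour, and the two mirror-symmetric products are subtracted.
    A positive mean response signals left-to-right motion, negative the reverse.
    """
    alpha = dt / (tau + dt)                  # first-order low-pass coefficient
    def lowpass(x):
        y = np.zeros_like(x, dtype=float)
        for t in range(1, len(x)):
            y[t] = y[t - 1] + alpha * (x[t] - y[t - 1])
        return y
    return lowpass(left) * right - lowpass(right) * left

if __name__ == "__main__":
    t = np.arange(500)
    stimulus = lambda phase: 0.5 + 0.5 * np.sin(2 * np.pi * (t - phase) / 50.0)
    # the grating reaches the right receptor later => left-to-right motion
    resp = hrc_response(stimulus(0), stimulus(5))
    print("mean response, rightward motion:", resp.mean())
    resp = hrc_response(stimulus(5), stimulus(0))
    print("mean response, leftward motion:", resp.mean())
```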

  2. Experimental comparison of two quantum computing architectures.

    PubMed

    Linke, Norbert M; Maslov, Dmitri; Roetteler, Martin; Debnath, Shantanu; Figgatt, Caroline; Landsman, Kevin A; Wright, Kenneth; Monroe, Christopher

    2017-03-28

    We run a selection of algorithms on two state-of-the-art 5-qubit quantum computers that are based on different technology platforms. One is a publicly accessible superconducting transmon device (www.ibm.com/ibm-q) with limited connectivity, and the other is a fully connected trapped-ion system. Even though the two systems have different native quantum interactions, both can be programmed in a way that is blind to the underlying hardware, thus allowing a comparison of identical quantum algorithms between different physical systems. We show that quantum algorithms and circuits that use more connectivity clearly benefit from a better-connected system of qubits. Although the quantum systems here are not yet large enough to eclipse classical computers, this experiment exposes critical factors of scaling quantum computers, such as qubit connectivity and gate expressivity. In addition, the results suggest that codesigning particular quantum applications with the hardware itself will be paramount in successfully using quantum computers in the future.
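
    One way to see why connectivity matters is to count how many two-qubit gates would land on non-adjacent qubits and therefore need routing. The sketch below uses a hypothetical line-connected 5-qubit coupling map (not either device's actual topology) and a deliberately connectivity-hungry gate list; real compilers route far more cleverly than this naive estimate.

```python
from itertools import combinations

# Illustrative sketch only: estimate how many extra SWAPs a list of two-qubit gates
# would need on a sparsely connected 5-qubit device versus a fully connected one.

def shortest_distances(n_qubits, edges):
    """All-pairs shortest path lengths on the coupling graph (Floyd-Warshall)."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n_qubits)] for i in range(n_qubits)]
    for a, b in edges:
        d[a][b] = d[b][a] = 1
    for k in range(n_qubits):
        for i in range(n_qubits):
            for j in range(n_qubits):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def extra_swaps(two_qubit_gates, dist):
    """A gate on qubits at distance d needs roughly d-1 SWAPs to bring them together."""
    return sum(dist[a][b] - 1 for a, b in two_qubit_gates)

if __name__ == "__main__":
    n = 5
    line = [(i, i + 1) for i in range(n - 1)]                 # sparse connectivity
    full = list(combinations(range(n), 2))                    # all-to-all connectivity
    gates = [(0, 4), (1, 3), (0, 2), (2, 4)]                  # a connectivity-hungry circuit
    print("extra SWAPs on line device:", extra_swaps(gates, shortest_distances(n, line)))
    print("extra SWAPs on full device:", extra_swaps(gates, shortest_distances(n, full)))
```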

  3. Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1993-01-01

    This is a real-time robotic controller and simulator (RRCS), a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine-grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

  4. Web-Based Architecture to Enable Compute-Intensive CAD Tools and Multi-user Synchronization in Teleradiology

    NASA Astrophysics Data System (ADS)

    Mehta, Neville; Kompalli, Suryaprakash; Chaudhary, Vipin

    Teleradiology is the electronic transmission of radiological patient images, such as x-rays, CT, or MR across multiple locations. The goal could be interpretation, consultation, or medical records keeping. Information technology solutions have enabled electronic records and their associated benefits are evident in health care today. However, salient aspects of collaborative interfaces, and computer assisted diagnostic (CAD) tools are yet to be integrated into workflow designs. The Computer Assisted Diagnostics and Interventions (CADI) group at the University at Buffalo has developed an architecture that facilitates web-enabled use of CAD tools, along with the novel concept of synchronized collaboration. The architecture can support multiple teleradiology applications and case studies are presented here.

  5. Design of a modular digital computer system, CDRL no. D001, final design plan

    NASA Technical Reports Server (NTRS)

    Easton, R. A.

    1975-01-01

    The engineering breadboard implementation of the CDRL no. D001 modular digital computer system, developed during the design of the logic system, is documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault-tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed, 32-bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.

  6. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    NASA Astrophysics Data System (ADS)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.

  7. Architecture for reactive planning of robot actions

    NASA Astrophysics Data System (ADS)

    Riekki, Jukka P.; Roening, Juha

    1995-01-01

    In this article, a reactive system for planning robot actions is described. The described hierarchical control system architecture consists of planning-executing-monitoring-modelling elements (PEMM elements). A PEMM element is a goal-oriented, combined processing and data element. It includes a planner, an executor, a monitor, a modeler, and a local model. The elements form a tree-like structure. An element receives tasks from its ancestor and sends subtasks to its descendants. The model knowledge is distributed into the local models, which are connected to each other. The elements can be synchronized. The PEMM architecture is strictly hierarchical. It integrates planning, sensing, and modelling into a single framework. A PEMM-based control system is reactive, as it can cope with asynchronous events and operate under time constraints. The control system is intended primarily to control mobile robots and robot manipulators in dynamic and partially unknown environments. It is especially suitable for applications consisting of physically separated devices and computing resources.

  8. Multi-processor including data flow accelerator module

    DOEpatents

    Davidson, George S.; Pierce, Paul E.

    1990-01-01

    An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
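
    A rough software picture of the tagged memory cells described in this patent abstract: each cell holds operand slots, tag bits, a pointer into a primitive lookup table, and links for distributing its result, and it is queued for execution once all tags are set. The sketch below is an interpretation for illustration only, not the patented circuitry.

```python
from collections import deque

# Illustrative sketch of intelligent-memory cells for data flow execution.

PRIMITIVES = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

class Cell:
    def __init__(self, primitive, n_operands, links):
        self.primitive = primitive           # index into the primitive lookup table
        self.operands = [None] * n_operands
        self.tags = [False] * n_operands     # tag bits: operand present?
        self.links = links                   # [(destination cell id, slot), ...]

def deliver(cells, ready, cell_id, slot, value):
    cell = cells[cell_id]
    cell.operands[slot], cell.tags[slot] = value, True
    if all(cell.tags):                       # all operands available -> schedule
        ready.append(cell_id)

def run(cells, initial):
    ready = deque()
    for (cell_id, slot), value in initial.items():
        deliver(cells, ready, cell_id, slot, value)
    results = {}
    while ready:
        cid = ready.popleft()
        cell = cells[cid]
        results[cid] = PRIMITIVES[cell.primitive](*cell.operands)
        for dest, slot in cell.links:        # distribute result to consumer cells
            deliver(cells, ready, dest, slot, results[cid])
    return results

if __name__ == "__main__":
    # cell 0 computes a+b and feeds cell 1, which computes (a+b)*c
    cells = {0: Cell("add", 2, [(1, 0)]), 1: Cell("mul", 2, [])}
    print(run(cells, {(0, 0): 2, (0, 1): 3, (1, 1): 10}))   # {0: 5, 1: 50}
```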

  9. An ATR architecture for algorithm development and testing

    NASA Astrophysics Data System (ADS)

    Breivik, Gøril M.; Løkken, Kristin H.; Brattli, Alvin; Palm, Hans C.; Haavardsholm, Trym

    2013-05-01

    A research platform with four cameras in the infrared and visible spectral domains is under development at the Norwegian Defence Research Establishment (FFI). The platform will be mounted on a high-speed jet aircraft and will primarily be used for image acquisition and for development and test of automatic target recognition (ATR) algorithms. The sensors on board produce large amounts of data, the algorithms can be computationally intensive and the data processing is complex. This puts great demands on the system architecture; it has to run in real-time and at the same time be suitable for algorithm development. In this paper we present an architecture for ATR systems that is designed to be flexible, generic and efficient. The architecture is module based so that certain parts, e.g. specific ATR algorithms, can be exchanged without affecting the rest of the system. The modules are generic and can be used in various ATR system configurations. A software framework in C++ that handles large data flows in non-linear pipelines is used for implementation. The framework exploits several levels of parallelism and lets the hardware processing capacity be fully utilised. The ATR system is under development and has reached a first level that can be used for segmentation algorithm development and testing. The implemented system consists of several modules, and although their content is still limited, the segmentation module includes two different segmentation algorithms that can be easily exchanged. We demonstrate the system by applying the two segmentation algorithms to infrared images from sea trial recordings.
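
    The module-exchange idea is language-independent; although the framework described above is written in C++, the following Python sketch (with hypothetical interfaces) shows how pipeline stages sharing one signature let a segmentation algorithm be swapped without touching the rest of the system.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative sketch of the module idea, not FFI's C++ framework.

@dataclass
class Frame:
    pixels: List[List[float]]        # stand-in for an infrared image
    segments: list = None
    detections: list = None

Module = Callable[[Frame], Frame]    # every stage maps Frame -> Frame

def threshold_segmentation(frame: Frame) -> Frame:
    frame.segments = [(r, c) for r, row in enumerate(frame.pixels)
                      for c, v in enumerate(row) if v > 0.8]
    return frame

def edge_segmentation(frame: Frame) -> Frame:
    # a second, exchangeable segmentation algorithm (placeholder logic)
    frame.segments = [(r, c) for r, row in enumerate(frame.pixels)
                      for c, v in enumerate(row) if 0.4 < v <= 0.8]
    return frame

def simple_detector(frame: Frame) -> Frame:
    frame.detections = [frame.segments[:1]] if frame.segments else []
    return frame

def run_pipeline(frame: Frame, stages: List[Module]) -> Frame:
    for stage in stages:             # stages could run in parallel in a real system
        frame = stage(frame)
    return frame

if __name__ == "__main__":
    image = Frame(pixels=[[0.1, 0.9], [0.5, 0.2]])
    for seg in (threshold_segmentation, edge_segmentation):   # swap modules freely
        out = run_pipeline(Frame(pixels=image.pixels), [seg, simple_detector])
        print(seg.__name__, "->", out.segments, out.detections)
```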

  10. Data Telemetry and Acquisition System for Acoustic Signal Processing Investigations.

    DTIC Science & Technology

    1996-02-20

    were VME-based computer systems operating under the VxWorks real-time operating system. Each system shared a common hardware and software... real-time operating system. It interfaces to the Berg PCM Decommutator board, which searches for the embedded synchronization word in the data and re... software were built on top of this architecture. The multi-tasking, message queue and memory management facilities of the VxWorks real-time operating system are

  11. Aero-Structural Assessment of an Inflatable Aerodynamic Decelerator

    NASA Technical Reports Server (NTRS)

    Sheta, Essam F.; Venugopalan, Vinod; Tan, X. G.; Liever, Peter A.; Habchi, Sami D.

    2010-01-01

    NASA is conducting an Entry, Descent and Landing Systems Analysis (EDL-SA) Study to determine the key technology development projects that should be undertaken for enabling the landing of large payloads on Mars for both human and robotic missions. Inflatable Aerodynamic Decelerators (IADs) are one of the candidate technologies. A variety of EDL architectures are under consideration. The current effort is directed toward the development and simulation of a computational framework for inflatable structures.

  12. Examining the architecture of cellular computing through a comparative study with a computer

    PubMed Central

    Wang, Degeng; Gribskov, Michael

    2005-01-01

    The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software–hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's ‘hardware’ equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the ‘bandwidth’ of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed. PMID:16849179

  13. Examining the architecture of cellular computing through a comparative study with a computer.

    PubMed

    Wang, Degeng; Gribskov, Michael

    2005-06-22

    The computer and the cell both use information embedded in simple coding, the binary software code and the quadruple genomic code, respectively, to support system operations. A comparative examination of their system architecture as well as their information storage and utilization schemes is performed. On top of the code, both systems display a modular, multi-layered architecture, which, in the case of a computer, arises from human engineering efforts through a combination of hardware implementation and software abstraction. Using the computer as a reference system, a simplistic mapping of the architectural components between the two is easily detected. This comparison also reveals that a cell abolishes the software-hardware barrier through genomic encoding for the constituents of the biochemical network, a cell's "hardware" equivalent to the computer central processing unit (CPU). The information loading (gene expression) process acts as a major determinant of the encoded constituent's abundance, which, in turn, often determines the "bandwidth" of a biochemical pathway. Cellular processes are implemented in biochemical pathways in parallel manners. In a computer, on the other hand, the software provides only instructions and data for the CPU. A process represents just sequentially ordered actions by the CPU and only virtual parallelism can be implemented through CPU time-sharing. Whereas process management in a computer may simply mean job scheduling, coordinating pathway bandwidth through the gene expression machinery represents a major process management scheme in a cell. In summary, a cell can be viewed as a super-parallel computer, which computes through controlled hardware composition. While we have, at best, a very fragmented understanding of cellular operation, we have a thorough understanding of the computer throughout the engineering process. The potential utilization of this knowledge to the benefit of systems biology is discussed.

  14. SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.

    PubMed

    Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi

    2018-01-01

    The Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high-performance execution and the flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and by analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor the SNN's activity. Our contribution intends to provide a tool that allows SNNs to be prototyped faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
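
    As an illustration of the kind of per-neuron rule such a programmable platform lets users redefine, the sketch below implements a plain leaky integrate-and-fire update in Python; SNAVA's actual PE instruction set and synapse models are not reproduced here.

```python
import numpy as np

# Sketch of the per-neuron update a processing element might iterate each time step,
# here a leaky integrate-and-fire (LIF) model with simple weighted synapses.

def lif_step(v, spikes_in, weights, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One synchronous update of a population of LIF neurons."""
    syn_current = weights @ spikes_in            # weighted presynaptic spikes
    v = v + dt * (-v / tau) + syn_current        # leak plus synaptic drive
    spikes_out = v >= v_thresh                   # threshold crossing -> spike
    v = np.where(spikes_out, v_reset, v)         # reset the neurons that fired
    return v, spikes_out.astype(float)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_pre, n_post = 50, 10
    weights = rng.uniform(0, 0.08, (n_post, n_pre))
    v = np.zeros(n_post)
    total_spikes = 0
    for _ in range(200):                         # 200 steps of Poisson-like input
        spikes_in = (rng.random(n_pre) < 0.1).astype(float)
        v, out = lif_step(v, spikes_in, weights)
        total_spikes += out.sum()
    print("output spikes in 200 steps:", int(total_spikes))
```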

  15. Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing

    PubMed Central

    Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant

    2016-01-01

    Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital (reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90% and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
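
    Independent of the memristive hardware, the basic reservoir computing recipe is a fixed random recurrent network plus a trained linear readout. The sketch below is a minimal software echo state network on a toy delayed-signal task standing in for a biosignal; the toroidal, mixed-signal design of the article is not modelled.

```python
import numpy as np

# Minimal software echo state network sketch: fixed random reservoir, trained
# linear readout. All sizes and the toy task are chosen for illustration only.

rng = np.random.default_rng(0)
N_RES, N_IN = 100, 1

W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # scale spectral radius below 1

def run_reservoir(u):
    """Drive the reservoir with input sequence u, collect the state trajectory."""
    x = np.zeros(N_RES)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u_t))
        states.append(x.copy())
    return np.array(states)

if __name__ == "__main__":
    # toy task: reproduce a delayed copy of a noisy sine (a stand-in for a biosignal)
    t = np.arange(1000)
    u = np.sin(2 * np.pi * t / 60) + 0.05 * rng.standard_normal(len(t))
    y = np.roll(u, 10)                         # target: input delayed by 10 steps
    X = run_reservoir(u)
    W_out = np.linalg.lstsq(X[200:], y[200:], rcond=None)[0]   # fit linear readout
    pred = X[200:] @ W_out
    print("readout RMS error:", float(np.sqrt(np.mean((pred - y[200:]) ** 2))))
```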

  16. New paradigms in internal architecture design and freeform fabrication of tissue engineering porous scaffolds.

    PubMed

    Yoo, Dongjin

    2012-07-01

    Advanced additive manufacture (AM) techniques are now being developed to fabricate scaffolds with controlled internal pore architectures in the field of tissue engineering. In general, these techniques use a hybrid method which combines computer-aided design (CAD) with computer-aided manufacturing (CAM) tools to design and fabricate complicated three-dimensional (3D) scaffold models. The mathematical descriptions of micro-architectures along with the macro-structures of the 3D scaffold models are limited by current CAD technologies as well as by the difficulty of transferring the designed digital models to standard formats for fabrication. To overcome these difficulties, we have developed an efficient internal pore architecture design system based on triply periodic minimal surface (TPMS) unit cell libraries and associated computational methods to assemble TPMS unit cells into an entire scaffold model. In addition, we have developed a process planning technique based on TPMS internal architecture pattern of unit cells to generate tool paths for freeform fabrication of tissue engineering porous scaffolds. Copyright © 2012 IPEM. Published by Elsevier Ltd. All rights reserved.
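
    TPMS unit cells are convenient precisely because they have closed-form implicit descriptions; for example, the gyroid is the level set of sin(x)cos(y) + sin(y)cos(z) + sin(z)cos(x). The sketch below voxelizes a generic gyroid unit cell and shows how the threshold controls the solid fraction; it is not the paper's unit-cell library or process-planning code.

```python
import numpy as np

# Illustrative sketch: voxels where the gyroid function is below a threshold are
# treated as material, the rest as pore space; the threshold tunes porosity.

def gyroid(x, y, z):
    return (np.sin(x) * np.cos(y) +
            np.sin(y) * np.cos(z) +
            np.sin(z) * np.cos(x))

def unit_cell(resolution=64, threshold=0.0):
    """Return a boolean voxel grid of one periodic gyroid unit cell."""
    s = np.linspace(0, 2 * np.pi, resolution, endpoint=False)
    x, y, z = np.meshgrid(s, s, s, indexing="ij")
    return gyroid(x, y, z) <= threshold        # True = solid material

if __name__ == "__main__":
    for thr in (-0.5, 0.0, 0.5):
        solid = unit_cell(threshold=thr)
        print(f"threshold {thr:+.1f}: solid fraction = {solid.mean():.3f}")
    # An entire scaffold model would be assembled by tiling or blending such
    # unit cells over the macro-structure before planning fabrication tool paths.
```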

  17. Embedded Data Processor and Portable Computer Technology testbeds

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Liu, Yuan-Kwei; Goforth, Andre; Fernquist, Alan R.

    1993-01-01

    Attention is given to current activities in the Embedded Data Processor and Portable Computer Technology testbed configurations that are part of the Advanced Data Systems Architectures Testbed at the Information Sciences Division at NASA Ames Research Center. The Embedded Data Processor Testbed evaluates advanced microprocessors for potential use in mission and payload applications within the Space Station Freedom Program. The Portable Computer Technology (PCT) Testbed integrates and demonstrates advanced portable computing devices and data system architectures. The PCT Testbed uses both commercial and custom-developed devices to demonstrate the feasibility of functional expansion and networking for portable computers in flight missions.

  18. Development of Onboard Computer Complex for Russian Segment of ISS

    NASA Technical Reports Server (NTRS)

    Branets, V.; Brand, G.; Vlasov, R.; Graf, I.; Clubb, J.; Mikrin, E.; Samitov, R.

    1998-01-01

    This report presents a description of the Onboard Computer Complex (CC) that was developed during the period 1994-1998 for the Russian Segment of ISS. The system was developed in cooperation with NASA and ESA. ESA developed a new computation system under the RSC Energia Technical Assignment, called DMS-R. The CC also includes elements developed by Russian experts and organizations. A general architecture of the computer system and the characteristics of the primary elements of this system are described. The system was integrated at RSC Energia with the participation of American and European specialists. The report contains information on software simulators and on the verification and debugging facilities which were developed for both stand-alone and integrated tests and verification. This CC serves as the basis for the Russian Segment Onboard Control Complex on ISS.

  19. Prospective Architectures for Onboard vs Cloud-Based Decision Making for Unmanned Aerial Systems

    NASA Technical Reports Server (NTRS)

    Sankararaman, Shankar; Teubert, Christopher

    2017-01-01

    This paper investigates prospective architectures for decision-making in unmanned aerial systems. When these unmanned vehicles operate in urban environments, there are several sources of uncertainty that affect their behavior, and decision-making algorithms need to be robust to account for these different sources of uncertainty. It is important to account for the several risk factors that affect the flight of these unmanned systems and to facilitate decision-making that takes these risk factors into consideration. In addition, there are several technical challenges related to autonomous flight of unmanned aerial systems; these challenges include sensing, obstacle detection, path planning and navigation, trajectory generation and selection, etc. Many of these activities require significant computational power and, in many situations, all of these activities need to be performed in real-time. In order to efficiently integrate these activities, it is important to develop a systematic architecture that can facilitate real-time decision-making. Four prospective architectures are discussed in this paper; on one end of the spectrum, the first architecture considers all activities/computations being performed onboard the vehicle, whereas on the other end of the spectrum, the fourth and final architecture considers all activities/computations being performed in the cloud, using a new service known as Prognostics as a Service that is being developed at NASA Ames Research Center. The four different architectures are compared, their advantages and disadvantages are explained, and conclusions are presented.

  20. Survey of Intelligent Computer-Aided Training

    NASA Technical Reports Server (NTRS)

    Loftin, R. B.; Savely, Robert T.

    1992-01-01

    Intelligent Computer-Aided Training (ICAT) systems integrate artificial intelligence and simulation technologies to deliver training for complex, procedural tasks in a distributed, workstation-based environment. Such systems embody both the knowledge of how to perform a task and how to train someone to perform that task. This paper briefly reviews the antecedents of ICAT systems and describes the approach to their creation developed at the NASA Lyndon B. Johnson Space Center. In addition to the general ICAT architecture, specific ICAT applications that have been or are currently under development are discussed. ICAT systems can offer effective solutions to a number of training problems of interest to the aerospace community.

  1. Automated quantitative muscle biopsy analysis system

    NASA Technical Reports Server (NTRS)

    Castleman, Kenneth R. (Inventor)

    1980-01-01

    An automated system to aid the diagnosis of neuromuscular diseases by producing fiber size histograms utilizing histochemically stained muscle biopsy tissue. Televised images of the microscopic fibers are processed electronically by a multi-microprocessor computer, which isolates, measures, and classifies the fibers and displays the fiber size distribution. The architecture of the multi-microprocessor computer, which is iterated to any required degree of complexity, features a series of individual microprocessors P_n, each receiving data from a shared memory M_(n-1) and outputting processed data to a separate shared memory M_(n+1) under the control of a program stored in dedicated memory M_n.

  2. NASA Space Engineering Research Center for VLSI systems design

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This annual review reports the center's activities and findings on very large scale integration (VLSI) systems design for 1990, including project status, financial support, publications, the NASA Space Engineering Research Center (SERC) Symposium on VLSI Design, research results, and outreach programs. Processor chips completed or under development are listed. Research results summarized include a design technique to harden complementary metal oxide semiconductors (CMOS) memory circuits against single event upset (SEU); improved circuit design procedures; and advances in computer aided design (CAD), communications, computer architectures, and reliability design. Also described is a high school teacher program that exposes teachers to the fundamentals of digital logic design.

  3. Learning of embodied interaction dynamics with recurrent neural networks: some exploratory experiments.

    PubMed

    Oubbati, Mohamed; Kord, Bahram; Koprinkova-Hristova, Petia; Palm, Günther

    2014-04-01

    A recent tendency in artificial intelligence suggests that intelligence must be seen as a result of the interaction between brains, bodies and environments. This view implies that designing sophisticated behaviour requires a primary focus on how agents are functionally coupled to their environments. Under this perspective, we present early results with the application of reservoir computing as an efficient tool to understand how behaviour emerges from interaction. Specifically, we present reservoir computing models, inspired by imitation learning designs, that extract the essential components of behaviour resulting from agent-environment interaction dynamics. Experimental results using a mobile robot are reported to validate the learning architectures.

  4. Learning of embodied interaction dynamics with recurrent neural networks: some exploratory experiments

    NASA Astrophysics Data System (ADS)

    Oubbati, Mohamed; Kord, Bahram; Koprinkova-Hristova, Petia; Palm, Günther

    2014-04-01

    A recent tendency in artificial intelligence suggests that intelligence must be seen as a result of the interaction between brains, bodies and environments. This view implies that designing sophisticated behaviour requires a primary focus on how agents are functionally coupled to their environments. Under this perspective, we present early results with the application of reservoir computing as an efficient tool to understand how behaviour emerges from interaction. Specifically, we present reservoir computing models, inspired by imitation learning designs, that extract the essential components of behaviour resulting from agent-environment interaction dynamics. Experimental results using a mobile robot are reported to validate the learning architectures.

  5. Livermore Big Artificial Neural Network Toolkit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Essen, Brian Van; Jacobs, Sam; Kim, Hyojin

    2016-07-01

    LBANN is a toolkit designed to train artificial neural networks efficiently on high performance computing architectures. It is optimized to take advantage of key High Performance Computing features to accelerate neural network training; specifically, it is optimized for low-latency, high-bandwidth interconnects, node-local NVRAM, node-local GPU accelerators, and high-bandwidth parallel file systems. It is built on top of the open source Elemental distributed-memory dense and sparse-direct linear algebra and optimization library that is released under the BSD license. The algorithms contained within LBANN are drawn from the academic literature and implemented to work within a distributed-memory framework.

  6. CMOL/CMOS hardware architectures and performance/price for Bayesian memory - The building block of intelligent systems

    NASA Astrophysics Data System (ADS)

    Zaveri, Mazad Shaheriar

    The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered as an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired by the neuro/cognitive sciences. Consequently, in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves: analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology - CMOL, and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm2 (TPM) obtained for CMOL based architectures is 32 to 40 times better than the TPM for a CMOS based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC implementation. We later use this methodology to investigate the hardware implementations of a cortex-scale spiking neural system, which is an approximate neural equivalent of a BICM-based cortex-scale system. The results of this investigation also suggest that CMOL is a promising candidate to implement such large-scale neuromorphic systems. In general, the assessment of such hypothetical baseline hardware architectures provides the prospects for building large-scale (mammalian cortex-scale) implementations of neuromorphic/Bayesian/intelligent systems using state-of-the-art and beyond state-of-the-art silicon structures.

  7. Implementation theory of distortion-invariant pattern recognition for optical and digital signal processing systems

    NASA Astrophysics Data System (ADS)

    Lhamon, Michael Earl

    A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single real-valued filters. This introduces increased computation burden but also introduces a higher level of parallelism that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as vector inner product operators, that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible, and it has the advantage of time-averaging the entire filter bank at real-time rates. Time-averaged optical filtering is computationally comparable to billions of digital operations per second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.

  8. Super and parallel computers and their impact on civil engineering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamat, M.P.

    1986-01-01

    This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.

  9. p88110: A Graphical Simulator for Computer Architecture and Organization Courses

    ERIC Educational Resources Information Center

    Garcia, M. I.; Rodriguez, S.; Perez, A.; Garcia, A.

    2009-01-01

    Studying fundamental Computer Architecture and Organization topics requires a significant amount of practical work if students are to acquire a good grasp of the theoretical concepts presented in classroom lectures or textbooks. The use of simulators is commonly adopted in order to reach this objective. However, as most of the available…

  10. Component architecture in drug discovery informatics.

    PubMed

    Smith, Peter M

    2002-05-01

    This paper reviews the characteristics of a new model of computing that has been spurred on by the Internet, known as Netcentric computing. Developments in this model led to distributed component architectures, which, although not new ideas, are now realizable with modern tools such as Enterprise Java. The application of this approach to scientific computing, particularly in pharmaceutical discovery research, is discussed and highlighted by a particular case involving the management of biological assay data.

  11. FPGA-based architecture for motion recovering in real-time

    NASA Astrophysics Data System (ADS)

    Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar

    2002-03-01

    A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher-level goals in different applications. Motion estimation algorithms pose a significant computational load for sequential processors, limiting their use in practical applications. In this work we propose a hardware architecture for real-time motion estimation based on FPGA technology. The technique used for motion estimation is optical flow, due to its accuracy and the density of its velocity estimates; however, other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results are presented, and the real-time performance is discussed and analyzed. The architecture is prototyped on an FPGA board with a Virtex device interfaced to a digital imager.
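
    For context, a dense optical flow estimate such as Lucas-Kanade reduces, per pixel, to building a small least-squares system from image gradients, which is exactly the kind of regular arithmetic that pipelined FPGA modules handle well. The Python sketch below computes the flow at a single pixel of a synthetic frame pair; it is illustrative only and unrelated to the authors' implementation.

```python
import numpy as np

# Minimal Lucas-Kanade optical flow for one pixel neighbourhood (illustrative;
# a hardware design would pipeline this over the whole frame).

def lucas_kanade_at(prev, curr, r, c, win=2):
    """Estimate (vx, vy) at pixel (r, c) from two consecutive frames."""
    Iy, Ix = np.gradient(prev.astype(float))           # spatial gradients
    It = curr.astype(float) - prev.astype(float)       # temporal gradient
    sl = (slice(r - win, r + win + 1), slice(c - win, c + win + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)           # solve A v = b in least squares
    return v                                            # [vx, vy]

if __name__ == "__main__":
    # synthetic test: a bright blob shifted one pixel to the right between frames
    y, x = np.mgrid[0:32, 0:32]
    blob = lambda cx: np.exp(-((x - cx) ** 2 + (y - 16) ** 2) / 20.0)
    frame0, frame1 = blob(15), blob(16)
    print("estimated flow at the blob centre:", lucas_kanade_at(frame0, frame1, 16, 15))
```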

  12. Application of CHAD hydrodynamics to shock-wave problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trease, H.E.; O`Rourke, P.J.; Sahota, M.S.

    1997-12-31

    CHAD is the latest in a sequence of continually evolving computer codes written to effectively utilize massively parallel computer architectures and the latest grid generators for unstructured meshes. Its applications range from automotive design issues such as in-cylinder and manifold flows of internal combustion engines, vehicle aerodynamics, underhood cooling and passenger compartment heating, ventilation, and air conditioning to shock hydrodynamics and materials modeling. CHAD solves the full unsteady Navier-Stokes equations with the k-epsilon turbulence model in three space dimensions. The code has four major features that distinguish it from the earlier KIVA code, also developed at Los Alamos. First, it is based on a node-centered, finite-volume method in which, like finite element methods, all fluid variables are located at computational nodes. The computational mesh efficiently and accurately handles all element shapes ranging from tetrahedra to hexahedra. Second, it is written in standard Fortran 90 and relies on automatic domain decomposition and a universal communication library written in standard C and MPI for unstructured grids to effectively exploit distributed-memory parallel architectures. Thus the code is fully portable to a variety of computing platforms such as uniprocessor workstations, symmetric multiprocessors, clusters of workstations, and massively parallel platforms. Third, CHAD utilizes a variable explicit/implicit upwind method for convection that improves computational efficiency in flows that have large velocity Courant number variations due to velocity or mesh size variations. Fourth, CHAD is designed to also simulate shock hydrodynamics involving multimaterial anisotropic behavior under high shear. The authors will discuss CHAD capabilities and show several sample calculations showing the strengths and weaknesses of CHAD.

  13. Parallel, stochastic measurement of molecular surface area.

    PubMed

    Juba, Derek; Varshney, Amitabh

    2008-08-01

    Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
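
    The stochastic approach described above can be sketched directly: sample points on each atom's sphere, keep the fraction not buried inside any neighbouring atom, and scale by the sphere area, with more samples progressively refining the estimate. The toy two-atom system and the plain NumPy code below are illustrative, not the authors' GPU implementation.

```python
import numpy as np

# Monte Carlo surface-area sketch: the estimate is progressive, improving as more
# samples are drawn, and each atom's samples are independent (hence parallelizable).

def surface_area(centers, radii, samples_per_atom=2000, seed=0):
    rng = np.random.default_rng(seed)
    centers, radii = np.asarray(centers, float), np.asarray(radii, float)
    total = 0.0
    for i, (c, r) in enumerate(zip(centers, radii)):
        pts = rng.standard_normal((samples_per_atom, 3))
        pts = c + r * pts / np.linalg.norm(pts, axis=1, keepdims=True)  # on sphere i
        # a sample is exposed if it lies outside every other atom's sphere
        exposed = np.ones(samples_per_atom, dtype=bool)
        for j, (cj, rj) in enumerate(zip(centers, radii)):
            if j != i:
                exposed &= np.linalg.norm(pts - cj, axis=1) > rj
        total += exposed.mean() * 4.0 * np.pi * r ** 2
    return total

if __name__ == "__main__":
    # two overlapping unit spheres 1.5 apart: the exact exposed area is
    # 2*(4*pi - 2*pi*h) with cap height h = 1 - 0.75 = 0.25, about 21.99
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    print("Monte Carlo estimate:", surface_area(atoms, [1.0, 1.0]))
```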

  14. Selecting an Architecture for a Safety-Critical Distributed Computer System with Power, Weight and Cost Considerations

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    2014-01-01

    This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In the example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
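
    A common way to realize such a multi-criteria decision is a weighted-sum score over normalized criteria, with the weights varied to test how sensitive the ranking is. The alternatives, raw values, and weights in the sketch below are invented for illustration and are not the report's data.

```python
# Illustrative weighted-sum multi-criteria scoring of candidate architectures.

CRITERIA = ("power", "weight", "cost")           # column order; lower raw values are better

ALTERNATIVES = {                                 # raw (power [W], weight [kg], cost [$k])
    "dual-channel federated": (120.0, 18.0, 950.0),
    "triplex integrated":     (150.0, 22.0, 800.0),
    "quad self-checking":     (180.0, 25.0, 700.0),
}

def normalise(values):
    """Map raw criterion values to [0, 1] benefit scores (lower raw value -> higher score)."""
    lo, hi = min(values), max(values)
    return [1.0 if hi == lo else (hi - v) / (hi - lo) for v in values]

def rank(weights):
    columns = list(zip(*ALTERNATIVES.values()))
    scores_by_criterion = [normalise(col) for col in columns]
    totals = {}
    for idx, name in enumerate(ALTERNATIVES):
        totals[name] = sum(w * scores_by_criterion[c][idx]
                           for c, w in enumerate(weights))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # vary the relative importance of power/weight/cost and see if the ranking changes
    for weights in [(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)]:
        print(weights, "->", rank(weights))
```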

  15. Fault tolerant architectures for integrated aircraft electronics systems, task 2

    NASA Technical Reports Server (NTRS)

    Levitt, K. N.; Melliar-Smith, P. M.; Schwartz, R. L.

    1984-01-01

    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made, is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand-driven data-flow architecture, which appears to have possible application in fault tolerant systems, is described, and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported.

  16. Experimental Demonstration of a Self-organized Architecture for Emerging Grid Computing Applications on OBS Testbed

    NASA Astrophysics Data System (ADS)

    Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong

    As Grid computing continues to gain popularity in the industry and research community, it also attracts more attention at the customer level. The large number of users and the high frequency of job requests in the consumer market make it challenging. Clearly, current Client/Server (C/S)-based architectures will become unfeasible for supporting large-scale Grid applications due to their poor scalability and poor fault-tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.

  17. A cognitive computational model inspired by the immune system response.

    PubMed

    Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim

    2014-01-01

    The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that is oscillating between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have always stood out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases, setting out a computational architectural diagram for each phase from a functional perspective (input, process, and output) together with its consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of the main theoretical immunological perspectives (classic, cognitive, and danger theory) as related to computer science terminologies. The paper presents a descriptive model of the immune system, to figure out the nature of the response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, meanwhile illustrating our proposed architecture perspective.

  18. A Cognitive Computational Model Inspired by the Immune System Response

    PubMed Central

    Abdo Abd Al-Hady, Mohamed; Badr, Amr Ahmed; Mostafa, Mostafa Abd Al-Azim

    2014-01-01

    The immune system has a cognitive ability to differentiate between healthy and unhealthy cells. The immune system response (ISR) is stimulated by a disorder in the temporary fuzzy state that oscillates between the healthy and unhealthy states. However, modeling the immune system is an enormous challenge; the paper introduces an extensive summary of how the immune system response functions, as an overview of a complex topic, to present the immune system as a cognitive intelligent agent. The homogeneity and perfection of the natural immune system have always stood out as the sought-after model we attempted to imitate while building our proposed model of cognitive architecture. The paper divides the ISR into four logical phases: setting a computational architectural diagram for each phase, proceeding from functional perspectives (input, process, and output), and their consequences. The proposed architecture components are defined by matching biological operations with computational functions and hence with the framework of the paper. On the other hand, the architecture focuses on the interoperability of the main theoretical immunological perspectives (classic, cognitive, and danger theory), as related to computer science terminologies. The paper presents a descriptive model of the immune system to characterize the nature of the response, deemed to be intrinsic for building a hybrid computational model based on a cognitive intelligent agent perspective and inspired by natural biology. To that end, this paper highlights the ISR phases as applied to a case study on hepatitis C virus, while illustrating our proposed architecture perspective. PMID:25003131

  19. From Petascale to Exascale: Eight Focus Areas of R&D Challenges for HPC Simulation Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Springmeyer, R; Still, C; Schulz, M

    2011-03-17

    Programming models bridge the gap between the underlying hardware architecture and the supporting layers of software available to applications. Programming models are different from both programming languages and application programming interfaces (APIs). Specifically, a programming model is an abstraction of the underlying computer system that allows for the expression of both algorithms and data structures. In comparison, languages and APIs provide implementations of these abstractions and allow the algorithms and data structures to be put into practice - a programming model exists independently of the choice of both the programming language and the supporting APIs. Programming models are typically focused on achieving increased developer productivity, performance, and portability to other system designs. The rapidly changing nature of processor architectures and the complexity of designing an exascale platform provide significant challenges for these goals. Several other factors are likely to impact the design of future programming models. In particular, the representation and management of increasing levels of parallelism, concurrency and memory hierarchies, combined with the ability to maintain a progressive level of interoperability with today's applications are of significant concern. Overall the design of a programming model is inherently tied not only to the underlying hardware architecture, but also to the requirements of applications and libraries including data analysis, visualization, and uncertainty quantification. Furthermore, the successful implementation of a programming model is dependent on exposed features of the runtime software layers and features of the operating system. Successful use of a programming model also requires effective presentation to the software developer within the context of traditional and new software development tools. Consideration must also be given to the impact of programming models on both languages and the associated compiler infrastructure. Exascale programming models must reflect several, often competing, design goals. These design goals include desirable features such as abstraction and separation of concerns. However, some aspects are unique to large-scale computing. For example, interoperability and composability with existing implementations will prove critical. In particular, performance is the essential underlying goal for large-scale systems. A key evaluation metric for exascale models will be the extent to which they support these goals rather than merely enable them.

  20. Advanced and secure architectural EHR approaches.

    PubMed

    Blobel, Bernd

    2006-01-01

    Electronic Health Records (EHRs), provided as lifelong patient records, advance towards core applications of distributed and co-operating health information systems and health networks. For meeting the challenge of scalable, flexible, portable, secure EHR systems, the underlying EHR architecture must be based on the component paradigm and be model driven, separating platform-independent and platform-specific models. To allow manageable models, real systems must be decomposed and simplified. The resulting modelling approach has to follow the ISO Reference Model - Open Distributed Processing (RM-ODP). The ISO RM-ODP describes any system component from different perspectives. Platform-independent perspectives comprise the enterprise view (business processes, policies, scenarios, use cases), the information view (classes and associations) and the computational view (composition and decomposition), whereas platform-specific perspectives concern the engineering view (physical distribution and realisation) and the technology view (implementation details from protocols up to education and training) on system components. Those views have to be established for components reflecting aspects of all domains involved in healthcare environments, including administrative, legal, medical, technical, etc. Thus, security-related component models reflecting all of the views mentioned have to be established for enabling both application and communication security services as an integral part of the system's architecture. Besides decomposition and simplification of systems regarding the different viewpoints on their components, different levels of granularity can be defined, hiding internals or focusing on properties of basic components to form a more complex structure. The resulting models describe both structure and behaviour of component-based systems. The described approach has been deployed in different projects defining EHR systems and their underlying architectural principles. In that context, the Australian GEHR project, the openEHR initiative, and the revision of CEN ENV 13606 "Electronic Health Record communication", all based on Archetypes, but also the HL7 version 3 activities, are discussed in some detail. The latter include the HL7 RIM, the HL7 Development Framework, HL7's Clinical Document Architecture (CDA), as well as the set of models from use cases, activity diagrams and sequence diagrams up to Domain Information Models (DMIMs) and their building blocks, Common Message Element Types (CMETs), constraining models to their underlying concepts. The future-proof EHR architecture, as an open, user-centric, user-friendly, flexible, scalable, portable core application in health information systems and health networks, has to follow advanced architectural paradigms.

  1. Experimental Evaluation and Workload Characterization for High-Performance Computer Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek A.

    1995-01-01

    This research is conducted in the context of the Joint NSF/NASA Initiative on Evaluation (JNNIE). JNNIE is an inter-agency research program that goes beyond typical benchmarking to provide an in-depth evaluation and understanding of the factors that limit the scalability of high-performance computing systems. Many NSF and NASA centers have participated in the effort. Our research effort was an integral part of implementing JNNIE in the context of the NASA ESS grand challenge applications. Our research work under this program was composed of three distinct but related activities: the evaluation of NASA ESS high-performance computing testbeds using the wavelet decomposition application; the evaluation of NASA ESS testbeds using astrophysical simulation applications; and the development of an experimental model for workload characterization for understanding workload requirements. In this report, we provide a summary of findings that covers all three parts, a list of the publications that resulted from this effort, and three appendices with the details of each of the studies, using a key publication developed under the respective work.

  2. Generic Divide and Conquer Internet-Based Computing

    NASA Technical Reports Server (NTRS)

    Follen, Gregory J. (Technical Monitor); Radenski, Atanas

    2003-01-01

    The growth of Internet-based applications and the proliferation of networking technologies have been transforming traditional commercial application areas as well as computer and computational sciences and engineering. This growth stimulates the exploration of Peer-to-Peer (P2P) software technologies that can open new research and application opportunities not only for the commercial world, but also for the scientific and high-performance computing applications community. The general goal of this project is to achieve a better understanding of the transition to Internet-based high-performance computing and to develop solutions for some of the technical challenges of this transition. In particular, we are interested in creating long-term motivation for end users to provide their idle processor time to support computationally intensive tasks. We believe that a practical P2P architecture should provide useful service to both clients with high-performance computing needs and contributors of lower-end computing resources. To achieve this, we are designing a dual-service architecture for P2P high-performance divide-and-conquer computing; we are also experimenting with a prototype implementation. Our proposed architecture incorporates a master server, utilizes dual satellite servers, and operates on the Internet in a dynamically changing large configuration of lower-end nodes provided by volunteer contributors. A dual satellite server comprises a high-performance computing engine and a lower-end contributor service engine. The computing engine provides generic support for divide-and-conquer computations. The service engine is intended to provide free useful HTTP-based services to contributors of lower-end computing resources. Our proposed architecture is complementary to and accessible from computational grids, such as Globus, Legion, and Condor. Grids provide remote access to existing higher-end computing resources; in contrast, our goal is to utilize idle processor time of lower-end Internet nodes. Our project is focused on a generic divide-and-conquer paradigm and on mobile applications of this paradigm that can operate on a loose and ever-changing pool of lower-end Internet nodes.
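
    A minimal sketch of the generic divide-and-conquer pattern that such a dual-service architecture would distribute is given below; the function names, the threshold policy, and the use of a local thread pool in place of contributor nodes are illustrative assumptions, not the project's actual API.

    ```python
    # Generic divide-and-conquer skeleton; a thread pool stands in for the pool
    # of lower-end contributor nodes that the architecture would use.
    from concurrent.futures import ThreadPoolExecutor

    def divide_and_conquer(task, split, solve, merge, is_small, pool=None):
        """Recursively split `task`, solve small pieces, and merge partial results."""
        if is_small(task):
            return solve(task)
        subtasks = split(task)
        if pool is not None:
            # Top-level subtasks are farmed out; deeper recursion stays local.
            partials = list(pool.map(
                lambda t: divide_and_conquer(t, split, solve, merge, is_small),
                subtasks))
        else:
            partials = [divide_and_conquer(t, split, solve, merge, is_small)
                        for t in subtasks]
        return merge(partials)

    # Example: summing a large list by recursive halving.
    data = list(range(1_000_000))
    with ThreadPoolExecutor(max_workers=4) as pool:
        total = divide_and_conquer(
            data,
            split=lambda t: (t[:len(t) // 2], t[len(t) // 2:]),
            solve=sum,
            merge=sum,
            is_small=lambda t: len(t) <= 10_000,
            pool=pool)
    print(total)  # 499999500000
    ```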

  3. Data Compression for Maskless Lithography Systems: Architecture, Algorithms and Implementation

    DTIC Science & Technology

    2008-05-19

    Data Compression for Maskless Lithography Systems: Architecture, Algorithms and Implementation. Vito Dai, Electrical Engineering and Computer Sciences. Copyright 2008 by Vito Dai. Abstract: Data Compression for Maskless...

  4. Intelligent Robotic Systems Study (IRSS), phase 4

    NASA Technical Reports Server (NTRS)

    1991-01-01

    Under the Intelligent Robotics Systems Study (IRSS), a generalized robotic control architecture was developed for use with the ProtoFlight Manipulator Arm (PFMA). Based upon the NASREM system design concept, the controller built for the PFMA provides localized position based force control, teleoperation, and advanced path recording and playback capabilities. The PFMA has six computer controllable degrees of freedom (DOF) plus a 7th manually indexable DOF, making the manipulator a pseudo 7 DOF mechanism. Joints on the PFMA are driven via 7 pulse width modulated amplifiers. Digital control of the PFMA is implemented using a variety of single board computers. There were two major activities under the IRSS phase 4 study: (1) enhancement of the PFMA control system software functionality; and (2) evaluation of operating modes via a teleoperation performance study. These activities are described and results are given.

  5. Autonomic Computing for Spacecraft Ground Systems

    NASA Technical Reports Server (NTRS)

    Li, Zhenping; Savkli, Cetin; Jones, Lori

    2007-01-01

    Autonomic computing for spacecraft ground systems increases the system reliability and reduces the cost of spacecraft operations and software maintenance. In this paper, we present an autonomic computing solution for spacecraft ground systems at NASA Goddard Space Flight Center (GSFC), which consists of an open standard for a message-oriented architecture referred to as the GMSEC architecture (Goddard Mission Services Evolution Center), and an autonomic computing tool, the Criteria Action Table (CAT). This solution has been used in many upgraded ground systems for NASA's missions, and provides a framework for developing solutions with higher autonomic maturity.

  6. Architecture of Allosteric Materials and Edge Modes

    NASA Astrophysics Data System (ADS)

    Yan, Le; Ravasio, Riccardo; Brito, Carolina; Wyart, Matthieu

    Allostery, a long-range elasticity-mediated interaction, remains the biggest mystery decades after its discovery in proteins. We introduce a numerical scheme to evolve functional materials that can accomplish a specified mechanical task. In this scheme, the number of solutions, their spatial architectures and the correlations among them can be computed. As an example, we consider an ``allosteric'' task, which requires the material to respond specifically to a stimulus at a distant active site. We find that functioning materials evolve a less-constrained trumpet-shaped region connecting the stimulus and active sites and that the amplitude of the elastic response varies non-monotonically along the trumpet. As previously shown for some proteins, we find that correlations appearing during evolution alone are sufficient to identify key aspects of this design. Finally, we show that the success of this architecture stems from the emergence of soft edge modes recently found to appear near the surface of marginally connected materials. Overall, our in silico evolution experiment offers a new window to study the relationship between structure, function, and correlations emerging during evolution. L.Y. was supported in part by the National Science Foundation under Grant No. NSF PHY11-25915. M.W. thanks the Swiss National Science Foundation for support under Grant No. 200021-165509 and the Simons Foundation Grant (#454953 Matthieu Wyart).

  7. Deep learning based classification of breast tumors with shear-wave elastography.

    PubMed

    Zhang, Qi; Xiao, Yang; Dai, Wei; Suo, Jingfeng; Wang, Congzhi; Shi, Jun; Zheng, Hairong

    2016-12-01

    This study aims to build a deep learning (DL) architecture for automated extraction of learned-from-data image features from the shear-wave elastography (SWE), and to evaluate the DL architecture in differentiation between benign and malignant breast tumors. We construct a two-layer DL architecture for SWE feature extraction, comprised of the point-wise gated Boltzmann machine (PGBM) and the restricted Boltzmann machine (RBM). The PGBM contains task-relevant and task-irrelevant hidden units, and the task-relevant units are connected to the RBM. Experimental evaluation was performed with five-fold cross validation on a set of 227 SWE images, 135 of benign tumors and 92 of malignant tumors, from 121 patients. The features learned with our DL architecture were compared with the statistical features quantifying image intensity and texture. Results showed that the DL features achieved better classification performance with an accuracy of 93.4%, a sensitivity of 88.6%, a specificity of 97.1%, and an area under the receiver operating characteristic curve of 0.947. The DL-based method integrates feature learning with feature selection on SWE. It may be potentially used in clinical computer-aided diagnosis of breast cancer. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. The why, what, where, when and how of goal-directed choice: neuronal and computational principles

    PubMed Central

    Verschure, Paul F. M. J.; Pennartz, Cyriel M. A.; Pezzulo, Giovanni

    2014-01-01

    The central problems that goal-directed animals must solve are: ‘What do I need and Why, Where and When can this be obtained, and How do I get it?' or the H4W problem. Here, we elucidate the principles underlying the neuronal solutions to H4W using a combination of neurobiological and neurorobotic approaches. First, we analyse H4W from a system-level perspective by mapping its objectives onto the Distributed Adaptive Control embodied cognitive architecture which sees the generation of adaptive action in the real world as the primary task of the brain rather than optimally solving abstract problems. We next map this functional decomposition to the architecture of the rodent brain to test its consistency. Following this approach, we propose that the mammalian brain solves the H4W problem on the basis of multiple kinds of outcome predictions, integrating central representations of needs and drives (e.g. hypothalamus), valence (e.g. amygdala), world, self and task state spaces (e.g. neocortex, hippocampus and prefrontal cortex, respectively) combined with multi-modal selection (e.g. basal ganglia). In our analysis, goal-directed behaviour results from a well-structured architecture in which goals are bootstrapped on the basis of predefined needs, valence and multiple learning, memory and planning mechanisms rather than being generated by a singular computation. PMID:25267825

  9. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    NASA Astrophysics Data System (ADS)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
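
    The two performance ideas named above can be illustrated with a small sketch: independent "tracks" are pulled dynamically by worker threads for load balancing, and the innermost work is expressed as wide whole-array operations over energy groups so that it vectorizes. This is an illustrative toy under placeholder sizes and physics, not the authors' optimized proxy applications.

    ```python
    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    N_TRACKS, N_SEGMENTS, N_GROUPS = 64, 200, 128
    rng = np.random.default_rng(0)
    sigma_t = rng.uniform(0.1, 1.0, N_GROUPS)            # total cross section per group
    source = rng.uniform(0.0, 1.0, (N_SEGMENTS, N_GROUPS))
    lengths = rng.uniform(0.01, 0.5, (N_TRACKS, N_SEGMENTS))

    def sweep_track(t):
        """Attenuate an angular-flux vector along one track; each inner step is a
        whole-array (SIMD-friendly) operation over all energy groups at once."""
        psi = np.zeros(N_GROUPS)
        for s in range(N_SEGMENTS):
            tau = sigma_t * lengths[t, s]                # optical thickness, all groups
            decay = np.exp(-tau)
            psi = psi * decay + source[s] * (1.0 - decay) / sigma_t
        return psi

    # Task-based parallelism: threads grab tracks dynamically as they become free.
    with ThreadPoolExecutor(max_workers=8) as pool:
        fluxes = list(pool.map(sweep_track, range(N_TRACKS)))
    print(np.mean(fluxes))
    ```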

  10. Root water uptake and lateral interactions among root systems in a temperate forest

    NASA Astrophysics Data System (ADS)

    Agee, E.; He, L.; Bisht, G.; Gough, C. M.; Couvreur, V.; Matheny, A. M.; Bohrer, G.; Ivanov, V. Y.

    2016-12-01

    A growing body of research has highlighted the importance of root architecture and hydraulic properties to the maintenance of the transpiration stream under water limitation and drought. Detailed studies of single plant systems have shown the ability of root systems to adjust zones of uptake due to the redistribution of local water potential gradients, thereby delaying the onset of stress under drying conditions. An open question is how lateral interactions and competition among neighboring plants impact individual and community resilience to water stress. While computational complexity has previously hindered the implementation of microscopic root system structure and function in larger scale hydrological models, newer hybrid approaches allow for the resolution of these properties at the plot scale. Using a modified version of the PFLOTRAN model, which represents the 3-D physics of variably saturated soil, we model root water uptake in a one-hectare temperate forest plot under natural and synthetic forcings. Two characteristic hydraulic architectures, tap roots and laterally sprawling roots, are implemented in an ensemble of simulations. Variations of root architecture, their hydraulic properties, and degree of system interactions produce variable local response to water limitation and provide insights on individual and community response to changing meteorological conditions. Results demonstrate the ability of interacting systems to shift areas of active uptake based on local gradients, allowing individuals to meet water demands despite competition from their peers. These results further illustrate how inter- and intra-species variations in root properties may influence not only individual response to water stress, but also help quantify the margins of resilience for forest ecosystems under changing climate.

  11. Dynamic array processing for computationally intensive expert systems in CLIPS

    NASA Technical Reports Server (NTRS)

    Athavale, N. N.; Ragade, R. K.; Fenske, T. E.; Cassaro, M. A.

    1990-01-01

    This paper puts forth an architecture for implementing loops over advanced array data structures in CLIPS. An attempt is made to use multi-field variables in such an architecture to process a set of data during the decision-making cycle. Current limitations of expert system shells are also briefly discussed. The resulting architecture is designed to circumvent the limitations set by the expert system shell and by the operating environment. Such advanced data structures are needed for tightly coupling symbolic and numeric computation modules.

  12. A model for architectural comparison

    NASA Astrophysics Data System (ADS)

    Ho, Sam; Snyder, Larry

    1988-04-01

    Recently, architectures for sequential computers have become a topic of much discussion and controversy. At the center of this storm is the Reduced Instruction Set Computer, or RISC, first described at Berkeley in 1980. While the merits of the RISC architecture cannot be ignored, its opponents have tried to do just that, while its proponents have expanded upon and frequently exaggerated those merits. This state of affairs has persisted to this day. No attempt is made here to settle the controversy, since there is likely no single answer; instead, a qualitative framework is provided for a rational discussion of the issues.

  13. Dual-scale topology optoelectronic processor.

    PubMed

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
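
    The generalization mentioned above, replacing the multiply and sum of ordinary linear algebra with other operations, can be sketched as a matrix-vector product over pluggable operators. The example below only illustrates the concept in software; it is not the D-STOP hardware mapping.

    ```python
    from functools import reduce

    def generalized_matvec(A, x, mul, add, identity):
        """y[i] = add-reduction over j of mul(A[i][j], x[j])."""
        return [reduce(add, (mul(a_ij, x_j) for a_ij, x_j in zip(row, x)), identity)
                for row in A]

    # Ordinary matrix-vector multiplication.
    A = [[1, 2], [3, 4]]
    x = [5, 6]
    print(generalized_matvec(A, x, mul=lambda a, b: a * b,
                             add=lambda a, b: a + b, identity=0))   # [17, 39]

    # Min-plus ("tropical") variant: one relaxation step of single-source shortest paths.
    INF = float("inf")
    W = [[0, 4, INF], [4, 0, 1], [INF, 1, 0]]
    d = [0, INF, INF]
    print(generalized_matvec(W, d, mul=lambda a, b: a + b,
                             add=min, identity=INF))                # [0, 4, inf]
    ```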

  14. Lightness computation by the human visual system

    NASA Astrophysics Data System (ADS)

    Rudd, Michael E.

    2017-05-01

    A model of achromatic color computation by the human visual system is presented, which is shown to account in an exact quantitative way for a large body of appearance matching data collected with simple visual displays. The model equations are closely related to those of the original Retinex model of Land and McCann. However, the present model differs in important ways from Land and McCann's theory in that it invokes additional biological and perceptual mechanisms, including contrast gain control, different inherent neural gains for incremental and decremental luminance steps, and two types of top-down influence on the perceptual weights applied to local luminance steps in the display: edge classification and attentional windowing of spatial integration. Arguments are presented to support the claim that these various visual processes must be instantiated by a particular underlying neural architecture. By pointing to correspondences between the architecture of the model and findings from visual neurophysiology, this paper suggests that edge classification involves a top-down gating of neural edge responses in early visual cortex (cortical areas V1 and/or V2) while spatial integration windowing occurs in cortical area V4 or beyond.
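
    The style of edge-based computation described above can be caricatured in a few lines: lightness is estimated by summing weighted log-luminance steps encountered along a path from the surround to the target, with a different gain for incremental versus decremental steps. The weights and the path below are placeholders; this toy only illustrates the style of computation, not the paper's actual model equations.

    ```python
    import math

    def lightness_estimate(luminances, w_increment=1.0, w_decrement=1.3):
        """`luminances` is an ordered sequence of patch luminances from the
        surround to the target; returns a log-lightness value for the target."""
        total = 0.0
        for l_prev, l_next in zip(luminances, luminances[1:]):
            step = math.log(l_next / l_prev)          # local luminance step (edge)
            gain = w_increment if step > 0 else w_decrement
            total += gain * step
        return total

    # Same target luminance reached across a dark versus a bright intermediate patch.
    print(lightness_estimate([100.0, 20.0, 50.0]))    # decremental then incremental edge
    print(lightness_estimate([100.0, 80.0, 50.0]))    # two decremental edges
    ```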

  15. Organically Grown Architectures: Creating Decentralized, Autonomous Systems by Embryomorphic Engineering

    NASA Astrophysics Data System (ADS)

    Doursat, René

    Exploding growth in computational systems forces us to gradually replace rigid design and control with decentralization and autonomy. Information technologies will progress, instead, by "meta-designing" mechanisms of system self-assembly, self-regulation and evolution. Nature offers a great variety of efficient complex systems, in which numerous small elements form large-scale, adaptive patterns. The new engineering challenge is to recreate this self-organization and let it freely generate innovative designs under guidance. This article presents an original model of artificial system growth inspired by embryogenesis. A virtual organism is a lattice of cells that proliferate, migrate and self-pattern into differentiated domains. Each cell's fate is controlled by an internal gene regulatory network. Embryomorphic engineering emphasizes hyperdistributed architectures, and their development as a prerequisite of evolutionary design.

  16. Characterizing Attention with Predictive Network Models.

    PubMed

    Rosenberg, M D; Finn, E S; Scheinost, D; Constable, R T; Chun, M M

    2017-04-01

    Recent work shows that models based on functional connectivity in large-scale brain networks can predict individuals' attentional abilities. While being some of the first generalizable neuromarkers of cognitive function, these models also inform our basic understanding of attention, providing empirical evidence that: (i) attention is a network property of brain computation; (ii) the functional architecture that underlies attention can be measured while people are not engaged in any explicit task; and (iii) this architecture supports a general attentional ability that is common to several laboratory-based tasks and is impaired in attention deficit hyperactivity disorder (ADHD). Looking ahead, connectivity-based predictive models of attention and other cognitive abilities and behaviors may potentially improve the assessment, diagnosis, and treatment of clinical dysfunction. Copyright © 2017 Elsevier Ltd. All rights reserved.

  17. A novel anti-windup framework for cascade control systems: an application to underactuated mechanical systems.

    PubMed

    Mehdi, Niaz; Rehan, Muhammad; Malik, Fahad Mumtaz; Bhatti, Aamer Iqbal; Tufail, Muhammad

    2014-05-01

    This paper describes anti-windup compensator (AWC) design methodologies for stable and unstable cascade plants with cascade controllers facing actuator saturation. Two novel full-order decoupling AWC architectures, based on equivalence of the overall closed-loop system, are developed to deal with windup effects. The decoupled architectures have been developed to formulate the AWC synthesis problem by assuring equivalence of the coupled and the decoupled architectures, instead of using an analogy, for cascade control systems. A comparison of both AWC architectures from an application point of view is provided to consolidate their utility: one architecture is better in terms of the computational complexity of implementation, while the other is suitable for unstable cascade systems. On the basis of these architectures for cascade systems facing stability and performance degradation problems in the event of actuator saturation, global AWC design methodologies utilizing linear matrix inequalities (LMIs) are developed. These LMIs are synthesized by application of Lyapunov theory, the global sector condition, and the ℒ2 gain reduction of the uncertain decoupled nonlinear component of the decoupled architecture. Further, an LMI-based local AWC design methodology is derived by utilizing a local sector condition by means of a quadratic Lyapunov function to resolve the windup problem for unstable cascade plants under saturation. To demonstrate the effectiveness of the proposed AWC schemes, an underactuated mechanical system, the ball-and-beam system, is considered, and details of the simulation and practical implementation results are described. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  18. Combining Self-Explaining with Computer Architecture Diagrams to Enhance the Learning of Assembly Language Programming

    ERIC Educational Resources Information Center

    Hung, Y.-C.

    2012-01-01

    This paper investigates the impact of combining self-explaining (SE) with computer architecture diagrams to help novice students learn assembly language programming. Pre- and post-test scores for the experimental and control groups were compared and subjected to analysis of covariance (ANCOVA). Results indicate that the SE-plus-diagram…

  19. MIT CSAIL and Lincoln Laboratory Task Force Report

    DTIC Science & Technology

    2016-08-01

    Projects have been very diverse, spanning several areas of CSAIL concentration, including robotics, big data analytics, wireless communications, computing architectures and... to machine learning systems and algorithms, such as recommender systems, and "Big Data" analytics. Advanced computing architectures broadly refer to...

  20. A Model for Minimizing Numeric Function Generator Complexity and Delay

    DTIC Science & Technology

    2007-12-01

    Numeric function generators (NFGs) allow computation of difficult mathematical functions in less time and with less hardware than commonly employed methods. They compute piecewise... Programmable Gate Arrays (FPGAs). The algorithms and estimation techniques apply to various NFG architectures and mathematical functions. This... thesis compares hardware utilization and propagation delay for various NFG architectures, mathematical functions, word widths, and segmentation methods.

  1. Usage of Thin-Client/Server Architecture in Computer Aided Education

    ERIC Educational Resources Information Center

    Cimen, Caghan; Kavurucu, Yusuf; Aydin, Halit

    2014-01-01

    With the advances of technology, the thin-client/server architecture has become popular in multi-user/single-network environments. A thin client is a user terminal from which the user can log in to a domain and run programs by connecting to a remote server. Recent developments in network and hardware technologies (cloud computing, virtualization, etc.)…

  2. Optical Computing Based on Neuronal Models

    DTIC Science & Technology

    1988-05-01

    ...walking, and cognition are far too complex for existing sequential digital computers. Therefore new architectures, hardware, and algorithms modeled... collective behavior, and iterative processing into optical processing and artificial neurodynamical systems. Another intriguing promise of neural nets is... with architectures, implementations, and programming; and material research is called for. Our future research in neurodynamics will continue to...

  3. Using SPEEDES to simulate the blue gene interconnect network

    NASA Technical Reports Server (NTRS)

    Springer, P.; Upchurch, E.

    2003-01-01

    JPL and the Center for Advanced Computer Architecture (CACR) are conducting application and simulation analyses of BG/L in order to establish a range of effectiveness for the Blue Gene/L MPP architecture in performing important classes of computations and to determine the design sensitivity of the global interconnect network in support of real-world ASCI application execution.

  4. The Use of Metaphors as a Parametric Design Teaching Model: A Case Study

    ERIC Educational Resources Information Center

    Agirbas, Asli

    2018-01-01

    Teaching methodologies for parametric design are being researched all over the world, since there is a growing demand for computer programming logic and its fabrication process in architectural education. The computer programming courses in architectural education are usually done in a very short period of time, and so students have no chance to…

  5. Implementation of a parallel unstructured Euler solver on shared and distributed memory architectures

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Das, Raja; Saltz, Joel; Vermeland, R. E.

    1992-01-01

    An efficient three dimensional unstructured Euler solver is parallelized on a Cray Y-MP C90 shared memory computer and on an Intel Touchstone Delta distributed memory computer. This paper relates the experiences gained and describes the software tools and hardware used in this study. Performance comparisons between two differing architectures are made.

  6. Flight Model of the `Flying Laptop' OBC and Reconfiguration Unit

    NASA Astrophysics Data System (ADS)

    Eickhoff, Jens; Stratton, Sam; Butz, Pius; Cook, Barry; Walker, Paul; Uryu, Alexander; Lengowski, Michael; Roser, Hans-Peter

    2012-08-01

    As already published in papers at the DASIA conferences 2010 in Budapest [1] and 2011 in Malta [2], the University of Stuttgart, Germany, is developing an advanced 3-axis stabilized small satellite applying industry standards for command/control techniques, onboard software design and onboard computer components. The satellite has a launch mass of approximately 120 kg. One of the main challenges was the development of an ultra-compact, high-performance onboard computer (OBC) intended to support an RTEMS operating system, a PUS-standard-based onboard software (OBSW) and CCSDS-standard-based ground/space communication. The developed architecture is based on four main elements (see [1, 2] and Figure 3), which are developed in cooperation with industrial partners: the OBC core board based on the LEON3 FT architecture; an I/O board for all OBC digital interfaces to S/C equipment; a CCSDS TC/TM decoder/encoder board; and a reconfiguration unit embedded in the satellite power control and distribution unit (PCDU). In the meantime, the EM/breadboard units of the computer have been tested intensively, including first HW/SW integration tests in a satellite testbench (see Figure 2). The FM hardware elements from the co-authoring suppliers are under assembly in Stuttgart.

  7. An Open Architecture to Support Social and Health Services in a Smart TV Environment.

    PubMed

    Costa, Carlos Rivas; Anido-Rifon, Luis E; Fernandez-Iglesias, Manuel J

    2017-03-01

    The objective was to design, implement, and test a solution that provides social and health services for the elderly at home based on smart TV technologies, with access to all services through the TV set. The architecture proposed is based on an open software platform and standard personal computing hardware. This provides great flexibility to develop new applications over the underlying infrastructure or to integrate new devices, for instance to monitor a broad range of vital signs in those cases where home monitoring is required. An actual system was designed, implemented, and deployed as a proof of concept. Applications range from social network clients to vital signs monitoring, and from interactive TV contests to conventional online care applications such as medication reminders or telemedicine. In both cases, the results have been very positive, confirming the initial perception of the TV as a convenient, easy-to-use technology to provide social and health care. The TV set is a much more familiar computing interface for most senior users, and as a consequence, smart TVs become a most convenient solution for the design and implementation of applications and services targeted to this user group. This proposal has been tested in a real setting with 62 senior users at their homes. Users included both individuals with experience using computers and others reluctant to use them.

  8. Service-Oriented Architecture for NVO and TeraGrid Computing

    NASA Technical Reports Server (NTRS)

    Jacob, Joseph; Miller, Craig; Williams, Roy; Steenberg, Conrad; Graham, Matthew

    2008-01-01

    The National Virtual Observatory (NVO) Extensible Secure Scalable Service Infrastructure (NESSSI) is a Web service architecture and software framework that enables Web-based astronomical data publishing and processing on grid computers such as the National Science Foundation's TeraGrid. Characteristics of this architecture include the following: (1) Services are created, managed, and upgraded by their developers, who are trusted users of computing platforms on which the services are deployed. (2) Service jobs can be initiated by means of Java or Python client programs run on a command line or with Web portals. (3) Access is granted within a graduated security scheme in which the size of a job that can be initiated depends on the level of authentication of the user.

  9. A multitasking finite state architecture for computer control of an electric powertrain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burba, J.C.

    1984-01-01

    Finite state techniques provide a common design language between the control engineer and the computer engineer for event driven computer control systems. They simplify communication and provide a highly maintainable control system understandable by both. This paper describes the development of a control system for an electric vehicle powertrain utilizing finite state concepts. The basics of finite state automata are provided as a framework to discuss a unique multitasking software architecture developed for this application. The architecture employs conventional time-sliced techniques with task scheduling controlled by a finite state machine representation of the control strategy of the powertrain. The complexities of excitation variable sampling in this environment are also considered.
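
    A toy sketch of the idea follows, assuming placeholder states, events, and task names (it is not the original powertrain code): a finite state machine representation of the control strategy determines which tasks the time-sliced scheduler dispatches in each slice, and events drive the state transitions.

    ```python
    STATES = {
        # state: (tasks run every time slice, {event: next_state})
        "IDLE":    (["monitor_battery"],                  {"accel_pressed": "DRIVE"}),
        "DRIVE":   (["monitor_battery", "motor_control"], {"brake_pressed": "BRAKING",
                                                           "accel_released": "IDLE"}),
        "BRAKING": (["monitor_battery", "regen_braking"], {"brake_released": "IDLE"}),
    }

    def run_scheduler(event_stream, state="IDLE"):
        for tick, event in enumerate(event_stream):
            tasks, transitions = STATES[state]
            for task in tasks:                        # one time slice: run scheduled tasks
                print(f"t={tick} state={state} run {task}")
            state = transitions.get(event, state)     # event-driven state transition

    run_scheduler(["accel_pressed", None, "brake_pressed", "brake_released"])
    ```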

  10. Architecture and data processing alternatives for Tse computer. Volume 1: Tse logic design concepts and the development of image processing machine architectures

    NASA Technical Reports Server (NTRS)

    Rickard, D. A.; Bodenheimer, R. E.

    1976-01-01

    Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.

  11. FPGA-based real-time phase measuring profilometry algorithm design and implementation

    NASA Astrophysics Data System (ADS)

    Zhan, Guomin; Tang, Hongwei; Zhong, Kai; Li, Zhongwei; Shi, Yusheng

    2016-11-01

    Phase measuring profilometry (PMP) has been widely used in many fields, such as Computer Aided Verification (CAV) and Flexible Manufacturing Systems (FMS). High frame-rate (HFR) real-time vision-based feedback control will be a common demand in the near future. However, the instruction time delay in the computer caused by numerous repetitive operations greatly limits the efficiency of data processing. FPGAs have the advantages of a pipeline architecture and parallel execution, and they are well suited to the PMP algorithm. In this paper, we design a fully pipelined hardware architecture for PMP. The functions of the hardware architecture include rectification, phase calculation, phase shifting, and stereo matching. Experiments verified the performance of this method, and the factors that may influence the computation accuracy were analyzed.
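
    The phase-calculation stage referred to above typically recovers a wrapped phase map from N phase-shifted fringe images; a software sketch of the standard N-step formula is given below for reference (illustrative NumPy code, not the authors' FPGA pipeline).

    ```python
    import numpy as np

    def wrapped_phase(images):
        """images: array of shape (N, H, W) holding N phase-shifted fringe images
        with shifts 2*pi*k/N; returns the wrapped phase per pixel."""
        n = images.shape[0]
        deltas = 2.0 * np.pi * np.arange(n) / n
        num = np.tensordot(np.sin(deltas), images, axes=1)
        den = np.tensordot(np.cos(deltas), images, axes=1)
        return -np.arctan2(num, den)

    # Synthetic check: four fringe images generated from a known phase ramp.
    H, W, N = 4, 8, 4
    true_phase = np.tile(np.linspace(-np.pi / 2, np.pi / 2, W), (H, 1))
    imgs = np.stack([100 + 50 * np.cos(true_phase + 2 * np.pi * k / N) for k in range(N)])
    print(np.max(np.abs(wrapped_phase(imgs) - true_phase)))   # ~0 up to numerical error
    ```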

  12. The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication.

    NASA Astrophysics Data System (ADS)

    Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.

    2014-12-01

    IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, that supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.

  13. OS friendly microprocessor architecture: Hardware level computer security

    NASA Astrophysics Data System (ADS)

    Jungwirth, Patrick; La Fratta, Patrick

    2016-05-01

    We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We invite cyber security, information technology (IT), and SCADA control professionals to review the hardware-level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline allow for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permission bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
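
    A toy software model of the permission idea described above is sketched below, with illustrative names (this is not the OSFA hardware design): each cache bank carries per-address permission bits, and every access is checked against them before being served.

    ```python
    READ, WRITE, EXEC = 0b100, 0b010, 0b001

    class CacheBank:
        def __init__(self, size, default_perms=READ | WRITE):
            self.data = [0] * size
            self.perms = [default_perms] * size       # per-address permission bits

        def access(self, addr, kind, value=None):
            needed = {"read": READ, "write": WRITE, "exec": EXEC}[kind]
            if not self.perms[addr] & needed:
                raise PermissionError(f"{kind} denied at address {addr}")
            if kind == "write":
                self.data[addr] = value
            return self.data[addr]

    bank = CacheBank(16)
    bank.access(3, "write", 42)
    print(bank.access(3, "read"))         # 42
    bank.perms[3] = READ                  # revoke write permission for this address
    try:
        bank.access(3, "write", 7)
    except PermissionError as err:
        print(err)                        # write denied at address 3
    ```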

  14. Internet Architecture: Lessons Learned and Looking Forward

    DTIC Science & Technology

    2006-12-01

    Internet Architecture: Lessons Learned and Looking Forward. Geoffrey G. Xie, Department of Computer Science, Naval Postgraduate School, April 2006. ...Internet architecture. ...readers are referred there for more information about a specific protocol or concept. 2. Origin of Internet Architecture: The Internet is easily...

  15. Developing a New Framework for Integration and Teaching of Computer Aided Architectural Design (CAAD) in Nigerian Schools of Architecture

    ERIC Educational Resources Information Center

    Uwakonye, Obioha; Alagbe, Oluwole; Oluwatayo, Adedapo; Alagbe, Taiye; Alalade, Gbenga

    2015-01-01

    As a result of globalization of digital technology, intellectual discourse on what constitutes the basic body of architectural knowledge to be imparted to future professionals has been on the increase. This digital revolution has brought to the fore the need to review the already overloaded architectural education curriculum of Nigerian schools of…

  16. Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures

    DOE PAGES

    Sabne, Amit J.; Sakdhnagool, Putt; Lee, Seyong; ...

    2015-07-13

    Accelerator-based heterogeneous computing is gaining momentum in the high-performance computing arena. However, the increased complexity of heterogeneous architectures demands more generic, high-level programming models. OpenACC is one such attempt to tackle this problem. Although the abstraction provided by OpenACC offers productivity, it raises questions concerning both functional and performance portability. In this article, the authors propose HeteroIR, a high-level, architecture-independent intermediate representation, to map high-level programming models, such as OpenACC, to heterogeneous architectures. They present a compiler approach that translates OpenACC programs into HeteroIR and accelerator kernels to obtain OpenACC functional portability. They then evaluate the performance portability obtained by OpenACC with their approach on 12 OpenACC programs on Nvidia CUDA, AMD GCN, and Intel Xeon Phi architectures. They study the effects of various compiler optimizations and OpenACC program settings on these architectures to provide insights into the achieved performance portability.

  17. The future of computing--new architectures and new technologies.

    PubMed

    Warren, P

    2004-02-01

    All modern computers are designed using the 'von Neumann' architecture and built using silicon transistor technology. Both architecture and technology have been remarkably successful. Yet there are a range of problems for which this conventional architecture is not particularly well adapted, and new architectures are being proposed to solve these problems, in particular based on insight from nature. Transistor technology has enjoyed 50 years of continuing progress. However, the laws of physics dictate that within a relatively short time period this progress will come to an end. New technologies, based on molecular and biological sciences as well as quantum physics, are vying to replace silicon, or at least coexist with it and extend its capability. The paper describes these novel architectures and technologies, places them in the context of the kinds of problems they might help to solve, and predicts their possible manner and time of adoption. Finally it describes some key questions and research problems associated with their use.

  18. Scalable service architecture for providing strong service guarantees

    NASA Astrophysics Data System (ADS)

    Christin, Nicolas; Liebeherr, Joerg

    2002-07-01

    For the past decade, much Internet research has been devoted to providing different levels of service to applications. Initial proposals for service differentiation provided strong service guarantees, with strict bounds on delays, loss rates, and throughput, but required high overhead in terms of computational complexity and memory, both of which raise scalability concerns. Recently, interest has shifted to service architectures with low overhead. However, these newer service architectures only provide weak service guarantees, which do not always address the needs of applications. In this paper, we describe a service architecture that supports strong service guarantees, can be implemented with low computational complexity, and requires only a small amount of state information to be maintained. A key mechanism of the proposed service architecture is that it addresses scheduling and buffer management in a single algorithm. The presented architecture offers no solution for controlling the amount of traffic that enters the network. Instead, we plan on exploiting feedback mechanisms of TCP congestion control algorithms for the purpose of regulating the traffic entering the network.

  19. GPU-computing in econophysics and statistical physics

    NASA Astrophysics Data System (ADS)

    Preis, T.

    2011-03-01

    A recent trend in computer science and related fields is general-purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples. In particular, computationally expensive analyses employed in a financial market context are coded on a graphics card architecture, which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
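
    A CPU sketch of the kind of Ising-model update that gets ported to a graphics card is shown below; on a GPU, many non-interacting (checkerboard) sites would be updated in parallel instead of one site at a time, which is where the speedup comes from. Lattice size and temperature are arbitrary.

    ```python
    import numpy as np

    def metropolis_sweep(spins, beta, rng):
        """One Monte Carlo sweep over an L x L lattice of +/-1 spins (2D Ising model)."""
        L = spins.shape[0]
        for _ in range(L * L):
            i, j = rng.integers(0, L, size=2)
            neighbors = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
                         spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2.0 * spins[i, j] * neighbors        # energy cost of flipping spin (i, j)
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                spins[i, j] *= -1
        return spins

    rng = np.random.default_rng(1)
    spins = rng.choice([-1, 1], size=(32, 32))
    for _ in range(200):
        metropolis_sweep(spins, beta=0.6, rng=rng)    # beta above critical: ordered phase
    print(abs(spins.mean()))                          # magnetization per spin
    ```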

  20. Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

    NASA Astrophysics Data System (ADS)

    Ford, Eric B.; Dindar, Saleh; Peters, Jorg

    2015-08-01

    The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than an order of magnitude speed-up and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.

  1. Power Efficient Hardware Architecture of SHA-1 Algorithm for Trusted Mobile Computing

    NASA Astrophysics Data System (ADS)

    Kim, Mooseop; Ryou, Jaecheol

    The Trusted Mobile Platform (TMP) is developed and promoted by the Trusted Computing Group (TCG), an industry standard body working to enhance the security of the mobile computing environment. The built-in SHA-1 engine in TMP is one of the most important circuit blocks and contributes to the performance of the whole platform because it is used as a key primitive supporting platform integrity and command authentication. Mobile platforms have very stringent limitations with respect to available power, physical circuit area, and cost. Therefore, special architecture and design methods for a low power SHA-1 circuit are required. In this paper, we present a novel and efficient hardware architecture of a low power SHA-1 design for TMP. Our low power SHA-1 hardware can compute a 512-bit data block using fewer than 7,000 gates and draws about 1.1 mA on a 0.25 μm CMOS process.
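
    For orientation, the digest such an engine produces can be reproduced in software; the snippet below computes SHA-1 over a single 512-bit (64-byte) message with Python's hashlib, which is the kind of reference value a hardware implementation would be validated against (the hardware itself processes padded message blocks directly in logic).

    ```python
    import hashlib

    block = bytes(range(64))                   # one 512-bit (64-byte) message
    digest = hashlib.sha1(block).hexdigest()
    print(len(block) * 8, "bits ->", digest)   # 512 bits -> 160-bit digest (40 hex chars)
    ```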

  2. An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1996-01-01

    We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  3. Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

    NASA Technical Reports Server (NTRS)

    Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

    1997-01-01

    We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.

  4. Subtlenoise: sonification of distributed computing operations

    NASA Astrophysics Data System (ADS)

    Love, P. A.

    2015-12-01

    The operation of distributed computing systems requires comprehensive monitoring to ensure reliability and robustness. Two components are found in most monitoring systems: visually rich time-series graphs and notification systems that alert operators under certain pre-defined conditions. In this paper the sonification of monitoring messages is explored using an architecture that fits easily within existing infrastructures based on mature open-source technologies such as ZeroMQ, Logstash, and SuperCollider (a synthesis engine). Message attributes are mapped onto audio attributes based on a broad classification of the message (continuous or discrete metrics) while keeping the audio stream subtle in nature. The benefits of audio rendering are described in the context of distributed computing operations and may provide a less intrusive way to understand the operational health of these systems.
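
    Purely as an illustration of the style of attribute mapping described (not the Subtlenoise implementation; message fields, ranges, and frequencies below are invented for the example), such a mapping might look like this:

```python
# Illustrative sketch of mapping monitoring messages onto audio parameters.
# Not the Subtlenoise code; message fields and value ranges are assumed for the example.

def message_to_audio(msg):
    """Map a monitoring message (dict) onto pitch, amplitude, and pan."""
    if msg.get("kind") == "continuous":          # e.g. a transfer rate or queue-depth metric
        value = float(msg["value"])
        nominal = float(msg.get("nominal", value))
        # Scale the metric into a pitch range; louder the further it sits from nominal.
        pitch_hz = 220.0 + 440.0 * min(value / (nominal + 1e-9), 2.0)
        amplitude = min(abs(value - nominal) / (nominal + 1e-9), 1.0)
    else:                                        # discrete events (job finished, error, ...)
        pitch_hz = 880.0 if msg.get("severity") == "error" else 440.0
        amplitude = 0.2                          # keep discrete events subtle
    pan = hash(msg.get("source", "")) % 3 - 1    # crude left/centre/right placement by source
    return {"pitch_hz": pitch_hz, "amplitude": amplitude, "pan": pan}

print(message_to_audio({"kind": "continuous", "value": 120.0, "nominal": 100.0}))
```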

  5. The visual system’s internal model of the world

    PubMed Central

    Lee, Tai Sing

    2015-01-01

    The Bayesian paradigm has provided a useful conceptual theory for understanding perceptual computation in the brain. While the detailed neural mechanisms of Bayesian inference are not fully understood, recent computational and neurophysiological work has illuminated the underlying computational principles and representational architecture. The fundamental insights are that the visual system is organized as a modular hierarchy to encode an internal model of the world, and that perception is realized by statistical inference based on such an internal model. In this paper, I will discuss and analyze the varieties of representational schemes of these internal models and how they might be used to perform learning and inference. I will argue for a unified theoretical framework for relating the internal models to the observed neural phenomena and mechanisms in the visual cortex. PMID:26566294
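
    As a toy reminder of the inference step the Bayesian paradigm posits (the "states" and numerical values below are invented for illustration, not taken from the paper):

```python
# Toy illustration of Bayesian inference over an internal model:
# posterior(state | image) is proportional to likelihood(image | state) * prior(state).
priors = {"edge": 0.3, "no_edge": 0.7}        # internal model's prior beliefs (assumed numbers)
likelihood = {"edge": 0.8, "no_edge": 0.2}    # P(observed pattern | state), assumed numbers

unnormalized = {s: likelihood[s] * priors[s] for s in priors}
z = sum(unnormalized.values())                # normalizing constant
posterior = {s: p / z for s, p in unnormalized.items()}
print(posterior)                              # {'edge': 0.63..., 'no_edge': 0.36...}
```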

  6. Basic research planning in mathematical pattern recognition and image analysis

    NASA Technical Reports Server (NTRS)

    Bryant, J.; Guseman, L. F., Jr.

    1981-01-01

    Fundamental problems encountered while attempting to develop automated techniques for applications of remote sensing are discussed under the following categories: (1) geometric and radiometric preprocessing; (2) spatial, spectral, temporal, syntactic, and ancillary digital image representation; (3) image partitioning, proportion estimation, and error models in object/scene inference; (4) parallel processing and image data structures; and (5) continuing studies in polarization; computer architectures and parallel processing; and the applicability of "expert systems" to interactive analysis.

  7. Goal Reconstruction: How Teton Blends Situated Action and Planned Action

    DTIC Science & Technology

    1989-11-03

    Goal Reconstruction: How Teton Blends Situated Action and Planned Action. Technical Report AIP 125, Kurt VanLehn and William Ball. Distribution unlimited. ...Architectures for Intelligence. This is the final report on the research supported by the Computer Science Division, Office of Naval Research, under...

  8. Rational adaptation under task and processing constraints: implications for testing theories of cognition and action.

    PubMed

    Howes, Andrew; Lewis, Richard L; Vera, Alonso

    2009-10-01

    The authors assume that individuals adapt rationally to a utility function given constraints imposed by their cognitive architecture and the local task environment. This assumption underlies a new approach to modeling and understanding cognition-cognitively bounded rational analysis-that sharpens the predictive acuity of general, integrated theories of cognition and action. Such theories provide the necessary computational means to explain the flexible nature of human behavior but in doing so introduce extreme degrees of freedom in accounting for data. The new approach narrows the space of predicted behaviors through analysis of the payoff achieved by alternative strategies, rather than through fitting strategies and theoretical parameters to data. It extends and complements established approaches, including computational cognitive architectures, rational analysis, optimal motor control, bounded rationality, and signal detection theory. The authors illustrate the approach with a reanalysis of an existing account of psychological refractory period (PRP) dual-task performance and the development and analysis of a new theory of ordered dual-task responses. These analyses yield several novel results, including a new understanding of the role of strategic variation in existing accounts of PRP and the first predictive, quantitative account showing how the details of ordered dual-task phenomena emerge from the rational control of a cognitive system subject to the combined constraints of internal variance, motor interference, and a response selection bottleneck.

  9. Performance Analysis of Distributed Object-Oriented Applications

    NASA Technical Reports Server (NTRS)

    Schoeffler, James D.

    1998-01-01

    The purpose of this research was to evaluate the efficiency of a distributed simulation architecture that creates individual modules made self-scheduling through a message-based communication system used to request input data from the module that is the source of those data. To make the architecture as general as possible, the message-based communication architecture was implemented using standard remote object architectures (Common Object Request Broker Architecture (CORBA) and/or Distributed Component Object Model (DCOM)). A series of experiments was run in which different systems were distributed in a variety of ways across multiple computers and the performance was evaluated. The experiments were duplicated in each case so that the overhead due to message communication and data transmission could be separated from the time required to actually perform the computational update of a module each iteration. The software used to distribute the modules across multiple computers was developed in the first year of the current grant and was modified considerably to add a message-based communication scheme supported by the DCOM distributed object architecture. The resulting performance was analyzed using a model created during the first year of this grant that predicts the overhead due to CORBA and DCOM remote procedure calls and includes the effects of data passed to and from the remote objects. A report covering the distributed simulation software and the results of the performance experiments has been submitted separately. That report also discusses possible future work to apply the methodology to dynamically distribute the simulation modules so as to minimize overall computation time.

  10. Parallel algorithms for mapping pipelined and parallel computations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
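
    To make the flavor of this mapping problem concrete, the following is a generic sketch of assigning a chain of m pipeline modules to n processors by contiguous blocks so that the busiest processor is as lightly loaded as possible. It is a plain O(n m^2) dynamic program for illustration only and does not reproduce the paper's improved O(nm log m) algorithm.

```python
# Sketch: map a chain of pipeline modules onto processors as contiguous blocks,
# minimizing the maximum per-processor load (simple O(n * m^2) dynamic program).
# Generic illustration only; the paper's improved algorithms are more sophisticated.

def map_chain(weights, n_proc):
    m = len(weights)
    prefix = [0]
    for w in weights:                     # prefix sums give block loads in O(1)
        prefix.append(prefix[-1] + w)
    INF = float("inf")
    # best[p][i] = minimal bottleneck when the first i modules use p processors
    best = [[INF] * (m + 1) for _ in range(n_proc + 1)]
    best[0][0] = 0
    for p in range(1, n_proc + 1):
        for i in range(1, m + 1):
            for j in range(i):            # modules j..i-1 go on processor p
                load = prefix[i] - prefix[j]
                best[p][i] = min(best[p][i], max(best[p - 1][j], load))
    return best[n_proc][m]

print(map_chain([4, 2, 7, 1, 3, 5], 3))   # -> 8, e.g. blocks [4,2] [7,1] [3,5]
```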

  11. Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob

    2003-01-01

    The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.

  12. Parallel compression/decompression-based datapath architecture for multibeam mask writers

    NASA Astrophysics Data System (ADS)

    Chaudhary, Narendra; Savari, Serap A.

    2017-06-01

    Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.

  13. Parallel compression/decompression-based datapath architecture for multibeam mask writers

    NASA Astrophysics Data System (ADS)

    Chaudhary, Narendra; Savari, Serap A.

    2017-10-01

    Multibeam electron beam systems will be used in the future for mask writing and for complementary lithography. The major challenges of the multibeam systems are in meeting throughput requirements and in handling the large data volumes associated with writing grayscale data on the wafer. In terms of future communications and computational requirements, Amdahl's law suggests that a simple increase of computation power and parallelism may not be a sustainable solution. We propose a parallel data compression algorithm to exploit the sparsity of mask data and a grayscale video-like representation of data. To improve the communication and computational efficiency of these systems at the write time, we propose an alternate datapath architecture partly motivated by multibeam direct-write lithography and partly motivated by the circuit testing literature, where parallel decompression reduces clock cycles. We explain a deflection plate architecture inspired by NuFlare Technology's multibeam mask writing system and how our datapath architecture can be easily added to it to improve performance.
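
    As a generic illustration of the kind of sparsity a mask-data compressor can exploit (this is not the parallel algorithm proposed in these papers, and the data layout is assumed), a simple run-length encoding of a mostly-zero grayscale row looks like this:

```python
# Generic run-length encoding of a sparse grayscale row (values mostly zero).
# Illustration of exploitable sparsity only; NOT the papers' parallel compression algorithm.

def rle_encode(row):
    """Encode a list of gray values as (value, run_length) pairs."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

row = [0, 0, 0, 15, 15, 0, 0, 0, 0, 7, 0, 0]
encoded = rle_encode(row)
assert rle_decode(encoded) == row
print(encoded)   # [(0, 3), (15, 2), (0, 4), (7, 1), (0, 2)]
```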

  14. Real-time simulation of large-scale neural architectures for visual features computation based on GPU.

    PubMed

    Chessa, Manuela; Bianchi, Valentina; Zampetti, Massimo; Sabatini, Silvio P; Solari, Fabio

    2012-01-01

    The intrinsic parallelism of visual neural architectures based on distributed hierarchical layers is well suited to implementation on the multi-core architectures of modern graphics cards. We propose design strategies that take optimal advantage of such parallelism in order to efficiently map the hierarchy of layers and the canonical neural computations onto the GPU. Specifically, the advantages of a cortical map-like representation of the data are exploited. Moreover, a GPU implementation of a novel neural architecture for the computation of binocular disparity from stereo image pairs, based on populations of binocular energy neurons, is presented. The implemented neural model achieves good performance in terms of reliability of the disparity estimates and near real-time execution speed, thus demonstrating the effectiveness of the devised design strategies. The proposed approach is valid in general, since the neural building blocks we implemented are a common basis for the modeling of visual neural functionalities.

  15. Virtual Business Operating Environment in the Cloud: Conceptual Architecture and Challenges

    NASA Astrophysics Data System (ADS)

    Nezhad, Hamid R. Motahari; Stephenson, Bryan; Singhal, Sharad; Castellanos, Malu

    Advances in service oriented architecture (SOA) have brought us close to the once imaginary vision of establishing and running a virtual business, a business in which most or all of its business functions are outsourced to online services. Cloud computing offers a realization of SOA in which IT resources are offered as services that are more affordable, flexible and attractive to businesses. In this paper, we briefly study advances in cloud computing, and discuss the benefits of using cloud services for businesses and trade-offs that they have to consider. We then present 1) a layered architecture for the virtual business, and 2) a conceptual architecture for a virtual business operating environment. We discuss the opportunities and research challenges that are ahead of us in realizing the technical components of this conceptual architecture. We conclude by giving the outlook and impact of cloud services on both large and small businesses.

  16. Strategies for concurrent processing of complex algorithms in data driven architectures

    NASA Technical Reports Server (NTRS)

    Stoughton, John W.; Mielke, Roland R.

    1987-01-01

    The results of ongoing research directed at developing a graph-theoretical model for describing the data and control flow associated with the execution of large-grained algorithms in a spatially distributed computing environment are presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM-based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
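
    A minimal token-based sketch of the style of data-flow firing rule such graph models describe is shown below; the example graph and names are invented, and this is not the ATAMM specification itself.

```python
# Toy data-flow firing rule: a node may fire once every input edge holds a token.
# Invented example graph; illustrates the style of execution ATAMM-like models describe.
graph = {                      # node -> list of successor nodes
    "read": ["filter"],
    "filter": ["fuse"],
    "sensor": ["fuse"],
    "fuse": [],
}
inputs = {"read": 0, "filter": 1, "sensor": 0, "fuse": 2}   # input-edge count per node
tokens = {"read": 0, "filter": 0, "sensor": 0, "fuse": 0}   # tokens waiting on input edges

def fireable(node):
    return tokens[node] >= inputs[node]

# Source nodes (no inputs) fire first; each firing puts one token on every output edge.
ready = [n for n in graph if fireable(n)]
fired = []
while ready:
    node = ready.pop(0)
    fired.append(node)
    tokens[node] -= inputs[node]
    for succ in graph[node]:
        tokens[succ] += 1
        if fireable(succ) and succ not in fired and succ not in ready:
            ready.append(succ)

print(fired)   # ['read', 'sensor', 'filter', 'fuse']
```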

  17. A "Language Lab" for Architectural Design.

    ERIC Educational Resources Information Center

    Mackenzie, Arch; And Others

    This paper discusses a "language lab" strategy in which traditional studio learning may be supplemented by language lessons using computer graphics techniques to teach architectural grammar, a body of elements and principles that govern the design of buildings belonging to a particular architectural theory or style. Two methods of…

  18. Geocomputation over Hybrid Computer Architecture and Systems: Prior Works and On-going Initiatives at UARK

    NASA Astrophysics Data System (ADS)

    Shi, X.

    2015-12-01

    As NSF has indicated, "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and the geosciences. With the exponential growth of geodata, scalable and high-performance computing for big-data analytics has become an urgent challenge, because many research activities are constrained by software or tools that cannot even complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that is not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as demonstrated by our prior work, while the potential of such advanced infrastructure remains unexplored in this domain. In this presentation, our prior and on-going initiatives are summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.

  19. PNNL's Data Intensive Computing research battles Homeland Security threats

    ScienceCinema

    David Thurman; Joe Kielman; Katherine Wolf; David Atkinson

    2018-05-11

    The Pacific Northwest National Laboratory's (PNNL's) approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architecture, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.

  20. Design and Training of Limited-Interconnect Architectures

    DTIC Science & Technology

    1991-07-16

    ...and signal processing. Neuromorphic (brain-like) models allow an alternative for achieving real-time operation for such tasks, while having a ...compact and robust architecture. Neuromorphic models consist of interconnections of simple computational nodes. In this approach, each node computes a ...operational performance. II. Research Objectives. The research objectives were: 1. Development of on-chip local training rules specifically designed for...

  1. Pipelined CPU Design with FPGA in Teaching Computer Architecture

    ERIC Educational Resources Information Center

    Lee, Jong Hyuk; Lee, Seung Eun; Yu, Heon Chang; Suh, Taeweon

    2012-01-01

    This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on…

  2. PNNL pushing scientific discovery through data intensive computing breakthroughs

    ScienceCinema

    Deborah Gracio; David Koppenaal; Ruby Leung

    2018-05-18

    The Pacific Northwest National Laboratory's approach to data intensive computing (DIC) is focused on three key research areas: hybrid hardware architectures, software architectures, and analytic algorithms. Advancements in these areas will help to address, and solve, DIC issues associated with capturing, managing, analyzing and understanding, in near real time, data at volumes and rates that push the frontiers of current technologies.

  3. Fault tolerant computer control for a Maglev transportation system

    NASA Technical Reports Server (NTRS)

    Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George

    1994-01-01

    Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements on the maglev. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.

  4. Quantum error correction in crossbar architectures

    NASA Astrophysics Data System (ADS)

    Helsen, Jonas; Steudtner, Mark; Veldhorst, Menno; Wehner, Stephanie

    2018-07-01

    A central challenge for the scaling of quantum computing systems is the need to control all qubits in the system without a large overhead. A solution for this problem in classical computing comes in the form of so-called crossbar architectures. Recently we made a proposal for a large-scale quantum processor (Li et al arXiv:1711.03807 (2017)) to be implemented in silicon quantum dots. This system features a crossbar control architecture which limits parallel single-qubit control, but allows the scheme to overcome control scaling issues that form a major hurdle to large-scale quantum computing systems. In this work, we develop a language that makes it possible to easily map quantum circuits to crossbar systems, taking into account their architecture and control limitations. Using this language we show how to map well known quantum error correction codes such as the planar surface and color codes in this limited control setting with only a small overhead in time. We analyze the logical error behavior of this surface code mapping for estimated experimental parameters of the crossbar system and conclude that logical error suppression to a level useful for real quantum computation is feasible.

  5. Robust Software Architecture for Robots

    NASA Technical Reports Server (NTRS)

    Aghazanian, Hrand; Baumgartner, Eric; Garrett, Michael

    2009-01-01

    Robust Real-Time Reconfigurable Robotics Software Architecture (R4SA) is the name of both a software architecture and the software that embodies the architecture. The architecture was conceived in the spirit of current practice in designing modular, hard real-time aerospace systems. The architecture facilitates the integration of new sensory, motor, and control software modules into the software of a given robotic system. R4SA was developed for initial application aboard exploratory mobile robots on Mars, but is adaptable to terrestrial robotic systems, real-time embedded computing systems in general, and robotic toys.

  6. Parallel computing works

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    An account of the Caltech Concurrent Computation Program (C^3P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high-performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  7. MINDS: Architecture & Design

    DTIC Science & Technology

    2006-07-14

    MINDS: Architecture & Design. Technical Report TR 06-022, Department of Computer Science and Engineering, University of Minnesota, 4-192 EECS Building, 200 Union Street SE, Minneapolis, MN 55455-0159, USA. Varun Chandola, Eric Eilertson, Levent Ertoz, Gyorgy Simon, and Vipin... Report date: 14 July 2006; dates covered: July 2006.

  8. VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures

    DOE PAGES

    Moreland, Kenneth; Sewell, Christopher; Usher, William; ...

    2016-05-09

    One of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architectures.

  9. VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures

    DOE PAGES

    Moreland, Kenneth; Sewell, Christopher; Usher, William; ...

    2016-05-09

    Execution on massively threaded processors is one of the most critical challenges for high-performance computing (HPC) scientific visualization. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. However, our current production scientific visualization software is not designed for these new types of architectures. In order to address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architectures.

  10. The Architecture of Information at Plateau Beaubourg

    ERIC Educational Resources Information Center

    Branda, Ewan Edward

    2012-01-01

    During the course of the 1960s, computers and information networks made their appearance in the public imagination. To architects on the cusp of architecture's postmodern turn, information technology offered new forms, metaphors, and techniques by which modern architecture's technological and utopian basis could be reasserted. Yet by the end of…

  11. 75 FR 2433 - Special Conditions: Boeing Model 747-8/-8F Airplanes, Systems and Data Networks Security...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-15

    ... design features associated with the architecture and connectivity capabilities of the airplane's computer... novel or unusual design features: digital systems architecture composed of several connected networks. The architecture and network configuration may be used for, or interfaced with, a diverse set of...

  12. Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS): Phase 1 Test and Evaluation Development Guide

    DTIC Science & Technology

    2014-11-01

    Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS): Phase 1 Test and Evaluation Development Guide. Craig... [table-of-contents fragments omitted: Self-initiated sensemaking; Feature Vector Format: Tasks...] The Integrated Cognitive-neuroscience Architectures for Understanding Sensemaking (ICArUS) Program aimed to build computational cognitive...

  13. Computational Design of Metal Ion Sequestering Agents

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hay, Benjamin P.; Rapko, Brian M.

    Organic ligands that exhibit a high degree of metal ion recognition are essential precursors for developing separation processes and sensors for metal ions. Since the beginning of the nuclear era, much research has focused on discovering ligands that target specific radionuclides. Members of the Group 1A and 2A cations (e.g., Cs, Sr, Ra) and the f-block metals (actinides and lanthanides) are of primary concern to DOE. Although there has been some success in identifying ligand architectures that exhibit a degree of metal ion recognition, the ability to control binding affinity and selectivity remains a significant challenge. The traditional approach for discovering such ligands has involved lengthy programs of organic synthesis and testing that, in the absence of reliable methods for screening compounds before synthesis, have resulted in much wasted research effort. This project seeks to enhance and strengthen the traditional approach through computer-aided design of new and improved host molecules. Accurate electronic structure calculations are coupled with experimental data to provide fundamental information about ligand structure and the nature of metal-donor group interactions (design criteria). This fundamental information then is used in a molecular mechanics model (MM) that helps us rapidly screen proposed ligand architectures and select the best members from a set of potential candidates. By using combinatorial methods, molecule building software has been developed that generates large numbers of candidate architectures for a given set of donor groups. The specific goals of this project are: • further understand the structural and energetic aspects of individual donor group-metal ion interactions and incorporate this information within the MM framework • further develop and evaluate approaches for correlating ligand structure with reactivity toward metal ions, in other words, screening capability • use molecule structure building software to generate large numbers of candidate ligand architectures for given sets of donor groups • screen candidates and identify ligand architectures that will exhibit enhanced metal ion recognition. These new capabilities are being applied to ligand systems identified under other DOE-sponsored projects where studies have suggested that modifying existing architectures will lead to dramatic enhancements in metal ion binding affinity and selectivity. With this in mind, we are collaborating with Professors R. T. Paine (University of New Mexico), K. N. Raymond (University of California, Berkeley), and J. E. Hutchison (University of Oregon), and Dr. B. A. Moyer (Oak Ridge National Laboratory) to obtain experimental validation of the predicted new ligand structures. Successful completion of this study will yield molecular-level insight into the role that ligand architecture plays in controlling metal ion complexation and will provide a computational approach to ligand design.

  14. Multi-level Hierarchical Poly Tree computer architectures

    NASA Technical Reports Server (NTRS)

    Padovan, Joe; Gute, Doug

    1990-01-01

    Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.

  15. FFT Computation with Systolic Arrays, A New Architecture

    NASA Technical Reports Server (NTRS)

    Boriakoff, Valentin

    1994-01-01

    The use of the Cooley-Tukey algorithm for computing the 1-D FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
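
    For reference, the Cooley-Tukey factorization that the systolic array embodies corresponds to the recursion in this minimal software sketch (a radix-2 decimation-in-time variant for brevity, power-of-two length assumed); the hardware maps the same butterflies onto linearly connected processors rather than using recursion.

```python
# Minimal radix-2 decimation-in-time Cooley-Tukey FFT (length must be a power of two).
# Software reference for the factorization the systolic architecture implements in hardware.
import cmath

def fft(x):
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])                          # recurse on even-indexed samples
    odd = fft(x[1::2])                           # recurse on odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):                      # combine with twiddle factors (butterflies)
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

print([round(abs(v), 6) for v in fft([1, 1, 1, 1, 0, 0, 0, 0])])
```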

  16. Organization of the Drosophila larval visual circuit

    PubMed Central

    Gendre, Nanae; Neagu-Maier, G Larisa; Fetter, Richard D; Schneider-Mizell, Casey M; Truman, James W; Zlatic, Marta; Cardona, Albert

    2017-01-01

    Visual systems transduce, process and transmit light-dependent environmental cues. Computation of visual features depends on photoreceptor neuron types (PR) present, organization of the eye and wiring of the underlying neural circuit. Here, we describe the circuit architecture of the visual system of Drosophila larvae by mapping the synaptic wiring diagram and neurotransmitters. By contacting different targets, the two larval PR-subtypes create two converging pathways potentially underlying the computation of ambient light intensity and temporal light changes already within this first visual processing center. Locally processed visual information then signals via dedicated projection interneurons to higher brain areas including the lateral horn and mushroom body. The stratified structure of the larval optic neuropil (LON) suggests common organizational principles with the adult fly and vertebrate visual systems. The complete synaptic wiring diagram of the LON paves the way to understanding how circuits with reduced numerical complexity control wide ranges of behaviors.

  17. The influence of fiber orientation on the equilibrium properties of neutral and charged biphasic tissues.

    PubMed

    Nagel, Thomas; Kelly, Daniel J

    2010-11-01

    Constitutive models facilitate investigation into load bearing mechanisms of biological tissues and may aid attempts to engineer tissue replacements. In soft tissue models, a commonly made assumption is that collagen fibers can only bear tensile loads. Previous computational studies have demonstrated that radially aligned fibers stiffen a material in unconfined compression most by limiting lateral expansion while vertically aligned fibers buckle under the compressive loads. In this short communication, we show that in conjunction with swelling, these intuitive statements can be violated at small strains. Under such conditions, a tissue with fibers aligned parallel to the direction of load initially provides the greatest resistance to compression. The results are further put into the context of a Benninghoff architecture for articular cartilage. The predictions of this computational study demonstrate the effects of varying fiber orientations and an initial tare strain on the apparent material parameters obtained from unconfined compression tests of charged tissues.

  18. Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

    PubMed

    Aprà, E; Kowalski, K

    2016-03-08

    In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.

  19. Building a robust vehicle detection and classification module

    NASA Astrophysics Data System (ADS)

    Grigoryev, Anton; Khanipov, Timur; Koptelov, Ivan; Bocharov, Dmitry; Postnikov, Vassily; Nikolaev, Dmitry

    2015-12-01

    The growing adoption of intelligent transportation systems (ITS) and autonomous driving requires robust real-time solutions for various event and object detection problems. Most real-world systems still cannot rely on computer vision algorithms alone and employ a wide range of costly additional hardware such as LIDARs. In this paper we explore engineering challenges encountered in building a highly robust visual vehicle detection and classification module that works under a broad range of environmental and road conditions. The resulting technology is competitive with traditional non-visual means of traffic monitoring. The main focus of the paper is on software and hardware architecture, algorithm selection, and domain-specific heuristics that help the computer vision system avoid implausible answers.

  20. Prevention of Malicious Nodes Communication in MANETs by Using Authorized Tokens

    NASA Astrophysics Data System (ADS)

    Chandrakant, N.; Shenoy, P. Deepa; Venugopal, K. R.; Patnaik, L. M.

    A rapid increase in wireless networks and mobile computing applications has changed the landscape of network security. A MANET is more susceptible to attacks than a wired network. As a result, attacks with malicious intent have been and will be devised to take advantage of these vulnerabilities and to cripple MANET operation. Hence we need to search for new architectures and mechanisms to protect wireless networks and mobile computing applications. In this paper, we examine the nodes that come within the vicinity of the base node and the members of the network, and communication is provided to genuine nodes only. The proposed algorithm is found to be an effective algorithm for security in MANETs.

  1. Exact computation of the maximum-entropy potential of spiking neural-network models.

    PubMed

    Cofré, R; Cessac, B

    2014-05-01

    Understanding how stimuli and synaptic connectivity influence the statistics of spike patterns in neural networks is a central question in computational neuroscience. The maximum-entropy approach has been successfully used to characterize the statistical response of simultaneously recorded spiking neurons responding to stimuli. However, in spite of good performance in terms of prediction, the fitting parameters do not explain the underlying mechanistic causes of the observed correlations. On the other hand, mathematical models of spiking neurons (neuromimetic models) provide a probabilistic mapping between the stimulus, network architecture, and spike patterns in terms of conditional probabilities. In this paper we build an exact analytical mapping between neuromimetic and maximum-entropy models.
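
    As a hedged illustration of the maximum-entropy models referred to here (the common pairwise form, with biases and couplings invented for the example), the probability assigned to each spike pattern can be evaluated by brute force for a small network:

```python
# Pairwise maximum-entropy (Ising-like) model over binary spike patterns,
# evaluated by brute force for a tiny network. Parameters are invented for illustration.
from itertools import product
import math

h = [0.2, -0.1, 0.4]                                # per-neuron biases (assumed values)
J = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.1}        # pairwise couplings (assumed values)

def potential(sigma):
    """Maximum-entropy potential: sum of bias terms plus pairwise interaction terms."""
    u = sum(h[i] * sigma[i] for i in range(len(sigma)))
    u += sum(Jij * sigma[i] * sigma[j] for (i, j), Jij in J.items())
    return u

patterns = list(product([0, 1], repeat=3))
weights = [math.exp(potential(s)) for s in patterns]
Z = sum(weights)                                     # partition function
probs = {s: w / Z for s, w in zip(patterns, weights)}
print(max(probs, key=probs.get), round(probs[(1, 1, 1)], 4))
```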

  2. High speed cylindrical roller bearing analysis, SKF computer program CYBEAN. Volume 2: User's manual

    NASA Technical Reports Server (NTRS)

    Kleckner, R. J.; Pirvics, J.

    1978-01-01

    The CYBEAN (Cylindrical Bearing Analysis) was created to detail radially loaded, aligned and misaligned cylindrical roller bearing performance under a variety of operating conditions. Emphasis was placed on detailing the effects of high speed, preload and system thermal coupling. Roller tilt, skew, radial, circumferential and axial displacement as well as flange contact were considered. Variable housing and flexible out-of-round outer ring geometries, and both steady state and time transient temperature calculations were enabled. The complete range of elastohydrodynamic contact considerations, employing full and partial film conditions were treated in the computation of raceway and flange contacts. Input and output architectures containing guidelines for use and a sample execution are detailed.

  3. Optical computing, optical memory, and SBIRs at Foster-Miller

    NASA Astrophysics Data System (ADS)

    Domash, Lawrence H.

    1994-03-01

    A desktop design and manufacturing system for binary diffractive elements, MacBEEP, was developed with the optical researcher in mind. Optical processing systems for specialized tasks such as cellular automaton computation and fractal measurement were constructed. A new family of switchable holograms has enabled several applications for control of laser beams in optical memories. New spatial light modulators and optical logic elements have been demonstrated based on a more manufacturable semiconductor technology. Novel synthetic and polymeric nonlinear materials for optical storage are under development in an integrated memory architecture. SBIR programs enable creative contributions from smaller companies, both product oriented and technology oriented, and support advances that might not otherwise be developed.

  4. Ultra-low-power and robust digital-signal-processing hardware for implantable neural interface microsystems.

    PubMed

    Narasimhan, S; Chiel, H J; Bhunia, S

    2011-04-01

    Implantable microsystems for monitoring or manipulating brain activity typically require on-chip real-time processing of multichannel neural data using ultra low-power, miniaturized electronics. In this paper, we propose an integrated-circuit/architecture-level hardware design framework for neural signal processing that exploits the nature of the signal-processing algorithm. First, we consider different power reduction techniques and compare the energy efficiency between the ultra-low frequency subthreshold and conventional superthreshold design. We show that the superthreshold design operating at a much higher frequency can achieve comparable energy dissipation by taking advantage of extensive power gating. It also provides significantly higher robustness of operation and yield under large process variations. Next, we propose an architecture level preferential design approach for further energy reduction by isolating the critical computation blocks (with respect to the quality of the output signal) and assigning them higher delay margins compared to the noncritical ones. Possible delay failures under parameter variations are confined to the noncritical components, allowing graceful degradation in quality under voltage scaling. Simulation results using prerecorded neural data from the sea-slug (Aplysia californica) show that the application of the proposed design approach can lead to significant improvement in total energy, without compromising the output signal quality under process variations, compared to conventional design approaches.

  5. The Nebula Standard Computer Architecture,

    DTIC Science & Technology

    ...good target for high-level languages, the designers also adopted a visibility approach in architecture design that provides more freedom for the hardware implementor while still maintaining software portability. (Author)

  6. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
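
    For orientation, the per-particle density summation at the heart of most SPH codes is sketched below in a generic 1-D textbook form with a cubic-spline kernel; this is an illustration of the method, not the parallel implementation benchmarked in the paper, and the particle positions and masses are assumed.

```python
# Generic 1-D SPH density summation with a cubic-spline smoothing kernel.
# Textbook form for orientation only; not the parallel implementation benchmarked above.

def cubic_spline_w(r, h):
    """Standard 1-D cubic-spline smoothing kernel W(r, h)."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)                  # 1-D normalization constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q)**3
    return 0.0

def densities(x, m, h):
    """rho_i = sum_j m_j * W(x_i - x_j, h), the basic SPH density estimate."""
    return [sum(mj * cubic_spline_w(xi - xj, h) for xj, mj in zip(x, m)) for xi in x]

x = [0.0, 0.1, 0.2, 0.3, 0.4]                # particle positions (assumed)
m = [0.1] * len(x)                           # particle masses (assumed)
print([round(rho, 4) for rho in densities(x, m, h=0.15)])
```

    The O(N^2) all-pairs loop above is exactly what production codes replace with neighbor lists and the shared-memory parallelization strategies compared in the paper.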

  7. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    NASA Astrophysics Data System (ADS)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.

  8. Optical systolic solutions of linear algebraic equations

    NASA Technical Reports Server (NTRS)

    Neuman, C. P.; Casasent, D.

    1984-01-01

    The philosophy of, and the data encoding possible in, the systolic array optical processor (SAOP) are reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as matrix decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.

  9. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-01-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  10. Memory access in shared virtual memory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berrendorf, R.

    1992-09-01

    Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.

  11. Will machines ever think

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Artificial Intelligence research has come under fire for failing to fulfill its promises. A growing number of AI researchers are reexamining the bases of AI research and are challenging the assumption that intelligent behavior can be fully explained as manipulation of symbols by algorithms. Three recent books -- Mind over Machine (H. Dreyfus and S. Dreyfus), Understanding Computers and Cognition (T. Winograd and F. Flores), and Brains, Behavior, and Robots (J. Albus) -- explore alternatives and open the door to new architectures that may be able to learn skills.

  12. Laboratory for Computer Science Progress Report 19, 1 July 1981-30 June 1982.

    DTIC Science & Technology

    1984-05-01

    [Table-of-contents fragments omitted: Multiprocessor Architectures; TRIX Operating System; VLSI Tools; Systematic Program Development: Introduction, Specification...] ...exploring distributed operating systems and the architecture of powerful single-user computers that are interconnected by communication networks. ...In particular, we expect to experiment with languages, operating systems, and applications that establish the feasibility of distributed...

  13. T and D-Bench--Innovative Combined Support for Education and Research in Computer Architecture and Embedded Systems

    ERIC Educational Resources Information Center

    Soares, S. N.; Wagner, F. R.

    2011-01-01

    Teaching and Design Workbench (T&D-Bench) is a framework aimed at education and research in the areas of computer architecture and embedded systems. It includes a set of features not found in other educational environments. This set of features is the result of an original combination of design requirements for T&D-Bench: that the…

  14. High-Speed Systolic Array Testbed.

    DTIC Science & Technology

    1987-10-01

    ...applications since the concept was introduced by H. T. Kung in 1978. This highly parallel architecture of nearest-neighbor data communication and ...must be addressed. For instance, should bit-serial or bit-parallel computation be utilized? Does the dynamic range of the candidate applications or ...numerical stability of the algorithms used require computations in fixed-point and integer format, or the architecturally more complex and slower floating...

  15. Framework for a clinical information system.

    PubMed

    Van de Velde, R

    2000-01-01

    The current status of our work towards the design and implementation of a reference architecture for a Clinical Information System is presented. This architecture has been developed and implemented based on components following a strong underlying conceptual and technological model. Common Object Request Broker and n-tier technology featuring centralised and departmental clinical information systems as the back-end store for all clinical data are used. Servers located in the 'middle' tier apply the clinical (business) model and application rules to communicate with so-called 'thin client' workstations. The main characteristics are the focus on modelling and reuse of both data and business logic as there is a shift away from data and functional modelling towards object modelling. Scalability as well as adaptability to constantly changing requirements via component driven computing are the main reasons for that approach.

  16. Robotic disaster recovery efforts with ad-hoc deployable cloud computing

    NASA Astrophysics Data System (ADS)

    Straub, Jeremy; Marsh, Ronald; Mohammad, Atif F.

    2013-06-01

    Autonomous operation of search and rescue (SaR) robots is an ill-posed problem, which is further complicated by the dynamic disaster-recovery environment. In a typical SaR response scenario, responder robots will require different levels of processing capability during various parts of the response effort and will need to utilize multiple algorithms. Placing these capabilities onboard the robot is a poor solution that precludes algorithm-specific performance optimization and results in mediocre performance. An architecture for an ad-hoc, deployable cloud environment suitable for use in a disaster response scenario is presented. Under this model, each service provider is optimized for its task and maintains a database of situation-relevant information. This service-oriented architecture (SOA 3.0) compliant framework also serves as an example of the efficient use of SOA 3.0 in an actual cloud application.

  17. Defining the computational structure of the motion detector in Drosophila.

    PubMed

    Clark, Damon A; Bursztyn, Limor; Horowitz, Mark A; Schnitzer, Mark J; Clandinin, Thomas R

    2011-06-23

    Many animals rely on visual motion detection for survival. Motion information is extracted from spatiotemporal intensity patterns on the retina, a paradigmatic neural computation. A phenomenological model, the Hassenstein-Reichardt correlator (HRC), relates visual inputs to neural activity and behavioral responses to motion, but the circuits that implement this computation remain unknown. By using cell-type specific genetic silencing, minimal motion stimuli, and in vivo calcium imaging, we examine two critical HRC inputs. These two pathways respond preferentially to light and dark moving edges. We demonstrate that these pathways perform overlapping but complementary subsets of the computations underlying the HRC. A numerical model implementing differential weighting of these operations displays the observed edge preferences. Intriguingly, these pathways are distinguished by their sensitivities to a stimulus correlation that corresponds to an illusory percept, "reverse phi," that affects many species. Thus, this computational architecture may be widely used to achieve edge selectivity in motion detection. Copyright © 2011 Elsevier Inc. All rights reserved.
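
    The Hassenstein-Reichardt correlator referenced above has a simple mathematical form: the signal from one photoreceptor is delayed and multiplied with the undelayed signal from its neighbor, and the two mirror-symmetric products are subtracted. A minimal discrete-time sketch is shown below; the one-sample delay and the test stimulus are simplifications for illustration, not the stimuli or model parameters used in the paper.

```python
# Minimal discrete-time Hassenstein-Reichardt correlator (HRC):
# output(t) = A(t-1)*B(t) - B(t-1)*A(t), positive for motion from A toward B.
# One-sample delay and the test stimulus are simplifications for illustration.

def hrc(a, b):
    out = []
    for t in range(1, len(a)):
        out.append(a[t - 1] * b[t] - b[t - 1] * a[t])
    return out

# A bright edge passing first over photoreceptor A, then over B (motion from A to B).
a = [0, 1, 1, 0, 0, 0]
b = [0, 0, 1, 1, 0, 0]
print(hrc(a, b))         # non-negative responses: [0, 1, 1, 0, 0]
```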

  18. Experimental comparison of two quantum computing architectures

    PubMed Central

    Linke, Norbert M.; Maslov, Dmitri; Roetteler, Martin; Debnath, Shantanu; Figgatt, Caroline; Landsman, Kevin A.; Wright, Kenneth; Monroe, Christopher

    2017-01-01

    We run a selection of algorithms on two state-of-the-art 5-qubit quantum computers that are based on different technology platforms. One is a publicly accessible superconducting transmon device (www.research.ibm.com/ibm-q) with limited connectivity, and the other is a fully connected trapped-ion system. Even though the two systems have different native quantum interactions, both can be programed in a way that is blind to the underlying hardware, thus allowing a comparison of identical quantum algorithms between different physical systems. We show that quantum algorithms and circuits that use more connectivity clearly benefit from a better-connected system of qubits. Although the quantum systems here are not yet large enough to eclipse classical computers, this experiment exposes critical factors of scaling quantum computers, such as qubit connectivity and gate expressivity. In addition, the results suggest that codesigning particular quantum applications with the hardware itself will be paramount in successfully using quantum computers in the future. PMID:28325879

  19. Scheduling based on a dynamic resource connection

    NASA Astrophysics Data System (ADS)

    Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.

    2017-02-01

    The practical use of distributed computing systems is associated with many problems, including the organization of effective interaction between the agents located at the nodes of the system, the configuration of each node to perform a certain task, the effective distribution of the system's available information and computational resources, the control of the multithreading that implements the logic of the research problems being solved, and so on. The article describes a method of computing load balancing in distributed automatic systems, focused on multi-agent and multi-threaded data processing. A scheme for controlling the processing of requests from terminal devices, providing effective dynamic scaling of computing power under peak load, is offered. The results of model experiments on the developed load scheduling algorithm are set out. These results show the effectiveness of the algorithm even with a significant expansion in the number of connected nodes and in the scale of the distributed computing system architecture.
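
    The record does not spell out the scheduling algorithm itself; as a hedged sketch of the general idea of load-based request dispatch with nodes joining the pool at run time, a minimal least-loaded scheduler might look like the following (the Node and Scheduler types are invented for the example).

        #include <algorithm>
        #include <iostream>
        #include <string>
        #include <vector>

        // Illustrative dynamic dispatcher: requests from terminal devices are routed to
        // the least-loaded node; nodes can join or leave the pool at run time, which is
        // the "dynamic scaling under peak load" behavior described in the abstract.
        struct Node { std::string name; int activeRequests = 0; };

        class Scheduler {
        public:
            void addNode(const std::string& name) { nodes_.push_back({name, 0}); }
            void removeNode(const std::string& name) {
                nodes_.erase(std::remove_if(nodes_.begin(), nodes_.end(),
                             [&](const Node& n) { return n.name == name; }), nodes_.end());
            }
            // Pick the node with the fewest in-flight requests (simple load metric).
            Node* dispatch() {
                if (nodes_.empty()) return nullptr;
                auto it = std::min_element(nodes_.begin(), nodes_.end(),
                          [](const Node& a, const Node& b) { return a.activeRequests < b.activeRequests; });
                ++it->activeRequests;
                return &*it;
            }
        private:
            std::vector<Node> nodes_;
        };

        int main() {
            Scheduler s;
            s.addNode("node-1");
            s.addNode("node-2");
            for (int r = 0; r < 5; ++r)
                if (Node* n = s.dispatch()) std::cout << "request " << r << " -> " << n->name << '\n';
            s.addNode("node-3");                  // scale out under load
            if (Node* n = s.dispatch()) std::cout << "request 5 -> " << n->name << '\n';
            return 0;
        }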

  20. A real-time control system for the control of suspended interferometers based on hybrid computing techniques

    NASA Astrophysics Data System (ADS)

    Acernese, Fausto; Barone, Fabrizio; De Rosa, Rosario; Eleuteri, Antonio; Milano, Leopoldo; Pardi, Silvio; Ricciardi, Iolanda; Russo, Guido

    2004-09-01

    One of the main requirements of a digital system for the control of interferometric detectors of gravitational waves is computing power, a direct consequence of the increasing complexity of the digital algorithms necessary for control signal generation. For this specific task many specialized, non-standard real-time architectures have been developed, often very expensive and difficult to upgrade. On the other hand, such computing power is generally fully available for off-line applications on standard PC-based systems. Therefore, a possible and obvious solution is to integrate the real-time and off-line architectures into a hybrid control system built from standard, available components, combining the perfect data synchronization provided by real-time systems with the large computing power available on PC-based systems. Such integration may be provided by linking the two architectures through a standard Ethernet network, whose data transfer speed has been increasing rapidly in recent years, using the TCP/IP, UDP and raw Ethernet protocols. In this paper we describe the architecture of a hybrid Ethernet-based real-time control system prototype we implemented in Napoli, discussing its characteristics and performance. Finally we discuss a possible application to the real-time control of a suspended mass of the mode cleaner of the 3m prototype optical interferometer for gravitational wave detection (IDGW-3P) operational in Napoli.
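
    The record does not give the protocol details of the Napoli prototype; the following is only a generic POSIX sketch of the kind of low-overhead UDP datagram transfer the abstract mentions for linking the real-time front end with PC-based processing nodes (the address, port, and payload layout are placeholders).

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>
        #include <unistd.h>
        #include <cstdint>
        #include <cstdio>
        #include <iostream>

        // Generic UDP sender: pushes one fixed-size block of control/DAQ samples to a
        // processing node over Ethernet. UDP keeps per-packet overhead and latency low
        // at the cost of delivery guarantees.
        int main() {
            const char* host = "192.168.1.10";    // placeholder processing-node address
            const uint16_t port = 5000;           // placeholder port

            int sock = socket(AF_INET, SOCK_DGRAM, 0);
            if (sock < 0) { perror("socket"); return 1; }

            sockaddr_in dest{};
            dest.sin_family = AF_INET;
            dest.sin_port = htons(port);
            inet_pton(AF_INET, host, &dest.sin_addr);

            float samples[256] = {};              // one block of sensor/error-signal samples
            ssize_t sent = sendto(sock, samples, sizeof(samples), 0,
                                  reinterpret_cast<sockaddr*>(&dest), sizeof(dest));
            if (sent < 0) perror("sendto");
            else std::cout << "sent " << sent << " bytes\n";

            close(sock);
            return 0;
        }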

  1. Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

    DOE PAGES

    Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

    2014-07-22

    The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diverse manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.
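
    A minimal usage sketch of the Kokkos abstractions described above, assuming a standard Kokkos installation; the AXPY kernel is an invented example rather than one of the paper's mini-applications. The same View and parallel_for/parallel_reduce code compiles for OpenMP, CUDA, or other backends, with the library selecting a device-appropriate memory layout.

        #include <Kokkos_Core.hpp>
        #include <cstdio>

        // Device-portable AXPY: Kokkos::View picks a device-appropriate layout and
        // memory space, and parallel_for maps the loop onto the chosen backend
        // (OpenMP threads, CUDA threads, ...), so the source does not change per device.
        int main(int argc, char* argv[]) {
            Kokkos::initialize(argc, argv);
            {
                const int n = 1 << 20;
                Kokkos::View<double*> x("x", n), y("y", n);

                Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
                    x(i) = 1.0; y(i) = 2.0;
                });
                Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
                    y(i) = 3.0 * x(i) + y(i);
                });

                double sum = 0.0;   // simple device reduction to check the result
                Kokkos::parallel_reduce("sum", n, KOKKOS_LAMBDA(const int i, double& s) {
                    s += y(i);
                }, sum);
                Kokkos::fence();
                std::printf("sum = %f\n", sum);
            }
            Kokkos::finalize();
            return 0;
        }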

  2. Recent Performance Results of VPIC on Trinity

    NASA Astrophysics Data System (ADS)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Le, A.; Li, H.; Nam, H.; Pang, X.; Stark, D. J.; Rust, W. N., III; Yin, L.; Albright, B. J.

    2017-10-01

    Trinity is a new DOE compute resource now in production at Los Alamos National Laboratory. Trinity has several new and unique features, including two compute partitions, one with dual-socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes; use of on-package high-bandwidth memory (HBM) for the KNL nodes; the ability to configure KNL nodes with respect to HBM mode and on-die network topology in a variety of operational modes at run time; and use of solid-state storage via burst buffer technology to reduce the time required to perform I/O. An effort is in progress to optimize VPIC on Trinity by taking advantage of these new architectural features. Results will be presented on the performance of VPIC on the Haswell and KNL partitions for single-node runs and runs at scale. Results include the use of burst buffers at scale to optimize I/O, a comparison of strategies for using MPI and threads, the performance benefits of using HBM, and the effectiveness of using intrinsics for vectorization. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  3. Message Passing vs. Shared Address Space on a Cluster of SMPs

    NASA Technical Reports Server (NTRS)

    Shan, Hongzhang; Singh, Jaswinder Pal; Oliker, Leonid; Biswas, Rupak

    2000-01-01

    The convergence of scalable computer architectures using clusters of PCs (or PC-SMPs) with commodity networking has made such clusters an attractive platform for high-end scientific computing. Currently, message passing and shared address space (SAS) are the two leading programming paradigms for these systems. Message passing has been standardized with MPI, and is the most common and mature programming approach. However, message-passing code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of, and programming effort required for, six applications under both programming models on a 32-CPU PC-SMP cluster. Our application suite consists of codes that typically do not exhibit high efficiency under shared-memory programming, due to their high communication-to-computation ratios and complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications; however, on certain classes of problems, SAS performance is competitive with MPI. We also present new algorithms for improving the PC-cluster performance of MPI collective operations.
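
    As a schematic contrast (not one of the six applications studied in the paper), the same global sum can be expressed under the two models roughly as follows; OpenMP is used here only as a stand-in for a shared-address-space runtime, and the data sizes are placeholders.

        #include <mpi.h>
        #include <omp.h>
        #include <cstdio>
        #include <vector>

        // Message-passing version: every rank owns a slice of the data and the
        // partial sums are combined explicitly with a collective.
        double mpiSum(const std::vector<double>& localSlice) {
            double local = 0.0, global = 0.0;
            for (double v : localSlice) local += v;
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            return global;
        }

        // Shared-address-space stand-in (OpenMP here): all threads see the whole
        // array and the runtime/hardware handles data movement and the reduction.
        double sharedSum(const std::vector<double>& data) {
            double sum = 0.0;
            #pragma omp parallel for reduction(+ : sum)
            for (long i = 0; i < static_cast<long>(data.size()); ++i) sum += data[i];
            return sum;
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank = 0;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            std::vector<double> slice(1000, 1.0);      // this rank's portion of the data
            double viaMpi = mpiSum(slice);
            double viaShared = sharedSum(slice);       // node-local shared-memory sum
            if (rank == 0) std::printf("MPI total: %f, node-local shared sum: %f\n", viaMpi, viaShared);

            MPI_Finalize();
            return 0;
        }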

  4. Heterogeneous compute in computer vision: OpenCL in OpenCV

    NASA Astrophysics Data System (ADS)

    Gasparakis, Harris

    2014-02-01

    We explore the relevance of Heterogeneous System Architecture (HSA) in Computer Vision, both as a long-term vision, and as a near-term emerging reality via the recently ratified OpenCL 2.0 Khronos standard. After a brief review of OpenCL 1.2 and 2.0, including HSA features such as Shared Virtual Memory (SVM) and platform atomics, we identify what genres of Computer Vision workloads stand to benefit by leveraging those features, and we suggest a new mental framework that replaces GPU compute with hybrid HSA APU compute. As a case in point, we discuss, in some detail, popular object recognition algorithms (part-based models), emphasizing the interplay and concurrent collaboration between the GPU and CPU. We conclude by describing how OpenCL has been incorporated in OpenCV, a popular open-source computer vision library, emphasizing recent work on the Transparent API, to appear in OpenCV 3.0, which unifies the native CPU and OpenCL execution paths under a single API, allowing the same code to execute either on the CPU or on an OpenCL-enabled device, without even recompiling.
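
    A small sketch of the Transparent API behavior described above, assuming OpenCV 3.x or later is available; the input file name is a placeholder. Switching the working images from cv::Mat to cv::UMat lets the same filtering calls run through OpenCL when a device is present, and on the CPU otherwise.

        #include <opencv2/core/ocl.hpp>
        #include <opencv2/imgcodecs.hpp>
        #include <opencv2/imgproc.hpp>
        #include <iostream>

        // OpenCV Transparent API sketch: the only change from a CPU-only version is
        // that the working images are cv::UMat, so cv::GaussianBlur and cv::Canny can
        // be dispatched to an OpenCL device if one is available, without recompiling.
        int main() {
            std::cout << "OpenCL available: " << std::boolalpha
                      << cv::ocl::haveOpenCL() << '\n';

            cv::UMat src, blurred, edges;
            cv::imread("input.png", cv::IMREAD_GRAYSCALE).copyTo(src);  // placeholder file
            if (src.empty()) { std::cerr << "could not read input.png\n"; return 1; }

            cv::GaussianBlur(src, blurred, cv::Size(5, 5), 1.5);
            cv::Canny(blurred, edges, 50.0, 150.0);

            cv::imwrite("edges.png", edges);   // UMat is accepted wherever InputArray is
            return 0;
        }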

  5. A Real-Time High Performance Computation Architecture for Multiple Moving Target Tracking Based on Wide-Area Motion Imagery via Cloud and Graphic Processing Units

    PubMed Central

    Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik

    2017-01-01

    This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions. PMID:28208684

  6. A Real-Time High Performance Computation Architecture for Multiple Moving Target Tracking Based on Wide-Area Motion Imagery via Cloud and Graphic Processing Units.

    PubMed

    Liu, Kui; Wei, Sixiao; Chen, Zhijiang; Jia, Bin; Chen, Genshe; Ling, Haibin; Sheaff, Carolyn; Blasch, Erik

    2017-02-12

    This paper presents the first attempt at combining Cloud with Graphic Processing Units (GPUs) in a complementary manner within the framework of a real-time high performance computation architecture for the application of detecting and tracking multiple moving targets based on Wide Area Motion Imagery (WAMI). More specifically, the GPU and Cloud Moving Target Tracking (GC-MTT) system applied a front-end web based server to perform the interaction with Hadoop and highly parallelized computation functions based on the Compute Unified Device Architecture (CUDA©). The introduced multiple moving target detection and tracking method can be extended to other applications such as pedestrian tracking, group tracking, and Patterns of Life (PoL) analysis. The cloud and GPUs based computing provides an efficient real-time target recognition and tracking approach as compared to methods when the work flow is applied using only central processing units (CPUs). The simultaneous tracking and recognition results demonstrate that a GC-MTT based approach provides drastically improved tracking with low frame rates over realistic conditions.

  7. Unit cell-based computer-aided manufacturing system for tissue engineering.

    PubMed

    Kang, Hyun-Wook; Park, Jeong Hun; Kang, Tae-Yun; Seol, Young-Joon; Cho, Dong-Woo

    2012-03-01

    Scaffolds play an important role in the regeneration of artificial tissues or organs. A scaffold is a porous structure with a micro-scale inner architecture in the range of several to several hundreds of micrometers. Therefore, computer-aided construction of scaffolds should provide sophisticated functionality for porous structure design and a tool path generation strategy that can achieve micro-scale architecture. In this study, a new unit cell-based computer-aided manufacturing (CAM) system was developed for the automated design and fabrication of a porous structure with micro-scale inner architecture that can be applied to composite tissue regeneration. The CAM system was developed by first defining a data structure for the computing process of a unit cell representing a single pore structure. Next, an algorithm and software were developed and applied to construct porous structures with a single or multiple pore design using solid freeform fabrication technology and a 3D tooth/spine computer-aided design model. We showed that this system is quite feasible for the design and fabrication of a scaffold for tissue engineering.
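
    As a hedged illustration of the unit-cell idea, and not the authors' CAM data structure, a single pore could be described once and tiled into a lattice roughly as follows; the field names and dimensions are invented for the example.

        #include <cstddef>
        #include <iostream>
        #include <vector>

        // Illustrative unit-cell data structure: one pore is described by its outer
        // size and strut thickness (micrometers), and a scaffold is generated by
        // tiling that cell on a regular grid, one placement per lattice position.
        struct UnitCell {
            double sizeUm;    // edge length of the cubic cell
            double strutUm;   // strut (wall) thickness; pore = size - strut
        };

        struct Placement { double x, y, z; const UnitCell* cell; };

        std::vector<Placement> buildLattice(const UnitCell& cell, int nx, int ny, int nz) {
            std::vector<Placement> lattice;
            lattice.reserve(static_cast<std::size_t>(nx) * ny * nz);
            for (int k = 0; k < nz; ++k)
                for (int j = 0; j < ny; ++j)
                    for (int i = 0; i < nx; ++i)
                        lattice.push_back({i * cell.sizeUm, j * cell.sizeUm, k * cell.sizeUm, &cell});
            return lattice;
        }

        int main() {
            UnitCell pore{300.0, 60.0};                     // 300 um cell, 60 um struts (example values)
            auto scaffold = buildLattice(pore, 10, 10, 5);  // 3 x 3 x 1.5 mm block of pores
            std::cout << scaffold.size() << " unit cells, pore size "
                      << (pore.sizeUm - pore.strutUm) << " um\n";
            return 0;
        }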

  8. The neurobiology of syntax: beyond string sets.

    PubMed

    Petersson, Karl Magnus; Hagoort, Peter

    2012-07-19

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty.

  9. The neurobiology of syntax: beyond string sets

    PubMed Central

    Petersson, Karl Magnus; Hagoort, Peter

    2012-01-01

    The human capacity to acquire language is an outstanding scientific challenge to understand. Somehow our language capacities arise from the way the human brain processes, develops and learns in interaction with its environment. To set the stage, we begin with a summary of what is known about the neural organization of language and what our artificial grammar learning (AGL) studies have revealed. We then review the Chomsky hierarchy in the context of the theory of computation and formal learning theory. Finally, we outline a neurobiological model of language acquisition and processing based on an adaptive, recurrent, spiking network architecture. This architecture implements an asynchronous, event-driven, parallel system for recursive processing. We conclude that the brain represents grammars (or more precisely, the parser/generator) in its connectivity, and its ability for syntax is based on neurobiological infrastructure for structured sequence processing. The acquisition of this ability is accounted for in an adaptive dynamical systems framework. Artificial language learning (ALL) paradigms might be used to study the acquisition process within such a framework, as well as the processing properties of the underlying neurobiological infrastructure. However, it is necessary to combine and constrain the interpretation of ALL results by theoretical models and empirical studies on natural language processing. Given that the faculty of language is captured by classical computational models to a significant extent, and that these can be embedded in dynamic network architectures, there is hope that significant progress can be made in understanding the neurobiology of the language faculty. PMID:22688633

  10. Architecture and Initial Development of a Digital Library Platform for Computable Knowledge Objects for Health.

    PubMed

    Flynn, Allen J; Bahulekar, Namita; Boisvert, Peter; Lagoze, Carl; Meng, George; Rampton, James; Friedman, Charles P

    2017-01-01

    Throughout the world, biomedical knowledge is routinely generated and shared through primary and secondary scientific publications. However, there is too much latency between publication of knowledge and its routine use in practice. To address this latency, what is actionable in scientific publications can be encoded to make it computable. We have created a purpose-built digital library platform to hold, manage, and share actionable, computable knowledge for health called the Knowledge Grid Library. Here we present it with its system architecture.

  11. Concepts and Relations in Neurally Inspired In Situ Concept-Based Computing

    PubMed Central

    van der Velde, Frank

    2016-01-01

    In situ concept-based computing is based on the notion that conceptual representations in the human brain are “in situ.” In this way, they are grounded in perception and action. Examples are neuronal assemblies, whose connection structures develop over time and are distributed over different brain areas. In situ concept representations cannot be copied or duplicated because that will disrupt their connection structure, and thus the meaning of these concepts. Higher-level cognitive processes, as found in language and reasoning, can be performed with in situ concepts by embedding them in specialized neurally inspired “blackboards.” The interactions between the in situ concepts and the blackboards form the basis for in situ concept computing architectures. In these architectures, memory (concepts) and processing are interwoven, in contrast with the separation between memory and processing found in Von Neumann architectures. Because the further development of Von Neumann computing (more, faster, yet power limited) is questionable, in situ concept computing might be an alternative for concept-based computing. In situ concept computing will be illustrated with a recently developed BABI reasoning task. Neurorobotics can play an important role in the development of in situ concept computing because of the development of in situ concept representations derived in scenarios as needed for reasoning tasks. Neurorobotics would also benefit from power limited and in situ concept computing. PMID:27242504

  12. Concepts and Relations in Neurally Inspired In Situ Concept-Based Computing.

    PubMed

    van der Velde, Frank

    2016-01-01

    In situ concept-based computing is based on the notion that conceptual representations in the human brain are "in situ." In this way, they are grounded in perception and action. Examples are neuronal assemblies, whose connection structures develop over time and are distributed over different brain areas. In situ concept representations cannot be copied or duplicated because that will disrupt their connection structure, and thus the meaning of these concepts. Higher-level cognitive processes, as found in language and reasoning, can be performed with in situ concepts by embedding them in specialized neurally inspired "blackboards." The interactions between the in situ concepts and the blackboards form the basis for in situ concept computing architectures. In these architectures, memory (concepts) and processing are interwoven, in contrast with the separation between memory and processing found in Von Neumann architectures. Because the further development of Von Neumann computing (more, faster, yet power limited) is questionable, in situ concept computing might be an alternative for concept-based computing. In situ concept computing will be illustrated with a recently developed BABI reasoning task. Neurorobotics can play an important role in the development of in situ concept computing because of the development of in situ concept representations derived in scenarios as needed for reasoning tasks. Neurorobotics would also benefit from power limited and in situ concept computing.

  13. Heterogeneous real-time computing in radio astronomy

    NASA Astrophysics Data System (ADS)

    Ford, John M.; Demorest, Paul; Ransom, Scott

    2010-07-01

    Modern computer architectures suited for general purpose computing are often not the best choice for either I/O-bound or compute-bound problems. Sometimes the best choice is not to choose a single architecture, but to take advantage of the best characteristics of different computer architectures to solve your problems. This paper examines the tradeoffs between using computer systems based on the ubiquitous X86 Central Processing Units (CPU's), Field Programmable Gate Array (FPGA) based signal processors, and Graphical Processing Units (GPU's). We will show how a heterogeneous system can be produced that blends the best of each of these technologies into a real-time signal processing system. FPGA's tightly coupled to analog-to-digital converters connect the instrument to the telescope and supply the first level of computing to the system. These FPGA's are coupled to other FPGA's to continue to provide highly efficient processing power. Data is then packaged up and shipped over fast networks to a cluster of general purpose computers equipped with GPU's, which are used for floating-point intensive computation. Finally, the data is handled by the CPU and written to disk, or further processed. Each of the elements in the system has been chosen for its specific characteristics and the role it can play in creating a system that does the most for the least, in terms of power, space, and money.

  14. DNS of Flow in a Low-Pressure Turbine Cascade Using a Discontinuous-Galerkin Spectral-Element Method

    NASA Technical Reports Server (NTRS)

    Garai, Anirban; Diosady, Laslo Tibor; Murman, Scott; Madavan, Nateri

    2015-01-01

    A new computational capability under development for accurate and efficient high-fidelity direct numerical simulation (DNS) and large eddy simulation (LES) of turbomachinery is described. This capability is based on an entropy-stable Discontinuous-Galerkin spectral-element approach that extends to arbitrarily high orders of spatial and temporal accuracy and is implemented in a computationally efficient manner on a modern high-performance computer architecture. A validation study using this method to perform DNS of flow in a low-pressure turbine airfoil cascade is presented. Preliminary results indicate that the method captures the main features of the flow. Discrepancies between the predicted results and the experiments are likely due to the effects of freestream turbulence not being included in the simulation and will be addressed in the final paper.

  15. Parallel processing for scientific computations

    NASA Technical Reports Server (NTRS)

    Alkhatib, Hasan S.

    1991-01-01

    The main contribution of the effort in the last two years is the introduction of the MOPPS system. After doing an extensive literature search, we introduced the system, which is described next. MOPPS employs a new solution to the problem of managing programs that solve scientific and engineering applications in a distributed processing environment. Autonomous computers cooperate efficiently in solving large scientific problems with this solution. MOPPS has the advantage of not assuming the presence of any particular network topology or configuration, computer architecture, or operating system. It imposes little overhead on network and processor resources while efficiently managing programs concurrently. The core of MOPPS is an intelligent program manager that builds a knowledge base of the execution performance of the parallel programs it is managing under various conditions. The manager applies this knowledge to improve the performance of future runs. The program manager learns from experience.

  16. Algorithms and software used in selecting structure of machine-training cluster based on neurocomputers

    NASA Astrophysics Data System (ADS)

    Romanchuk, V. A.; Lukashenko, V. V.

    2018-05-01

    A technique for operating the control system of a computing cluster based on neurocomputers is proposed. Particular attention is paid to the method of choosing the structure of the computing cluster, because existing methods are not effective for this specialized hardware base: neurocomputers are highly parallel computing devices with an architecture different from the von Neumann architecture. The developed algorithm for choosing the computational structure of a cloud cluster is described, starting from the direction of data transfer in the flow-control graph of the program and its adjacency matrix.
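
    A hedged sketch of the structures mentioned at the end of the record: a program's flow-control graph stored as an adjacency matrix of data-transfer edges, from which per-stage fan-out can be read off when choosing a cluster structure; the example graph itself is invented.

        #include <iostream>
        #include <utility>
        #include <vector>

        // Build an adjacency matrix for a program's flow-control graph, where an edge
        // (i, j) means "stage i sends data to stage j". Fan-outs computed from the
        // matrix can then inform how stages are mapped onto cluster nodes.
        int main() {
            const int stages = 4;
            std::vector<std::pair<int, int>> transfers = {{0, 1}, {0, 2}, {1, 3}, {2, 3}};  // example graph

            std::vector<std::vector<int>> adj(stages, std::vector<int>(stages, 0));
            for (auto [src, dst] : transfers) adj[src][dst] = 1;

            for (int i = 0; i < stages; ++i) {
                int fanOut = 0;
                for (int j = 0; j < stages; ++j) fanOut += adj[i][j];
                std::cout << "stage " << i << " fan-out " << fanOut << '\n';
            }
            return 0;
        }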

  17. SharP

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Venkata, Manjunath Gorentla; Aderholdt, William F

    Pre-exascale systems are expected to have a significant amount of hierarchical and heterogeneous on-node memory, and this trend in system architecture is expected to continue into the exascale era. Along with hierarchical-heterogeneous memory, such systems typically have a high-performing network and a compute accelerator. This system architecture is effective not only for running traditional High Performance Computing (HPC) applications (Big-Compute), but also for running data-intensive HPC applications and Big-Data applications. As a consequence, there is a growing desire to have a single system serve the needs of both Big-Compute and Big-Data applications. Though the system architecture supports the convergence of Big-Compute and Big-Data, the programming models and software layer have yet to evolve to support either hierarchical-heterogeneous memory systems or this convergence. SharP is a programming abstraction to address this problem. The programming abstraction is implemented as a software library and runs on pre-exascale and exascale systems supporting current and emerging system architectures. Using distributed data structures as a central concept, it provides (1) a simple, usable, and portable abstraction for hierarchical-heterogeneous memory and (2) a unified programming abstraction for Big-Compute and Big-Data applications.

  18. Importance of balanced architectures in the design of high-performance imaging systems

    NASA Astrophysics Data System (ADS)

    Sgro, Joseph A.; Stanton, Paul C.

    1999-03-01

    Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize the available memory bandwidth.
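
    As a brief worked illustration of the balance argument (the hardware numbers and kernel costs below are assumed, not taken from the paper), comparing a kernel's arithmetic intensity with the machine's compute-to-bandwidth ratio indicates whether adding processors on a shared memory bus can pay off.

        #include <iostream>

        // Back-of-the-envelope balance check: a kernel is memory-bound when its
        // arithmetic intensity (FLOPs per byte moved) is below the machine balance
        // (peak FLOP/s divided by memory bandwidth); adding cores on the same bus
        // then no longer improves throughput.
        int main() {
            const double peakGflops = 4.0 * 8.0;     // assumed: 4 cores x 8 GFLOP/s each
            const double memBandwidthGBs = 10.0;     // assumed shared-bus bandwidth, GB/s

            // Example kernel: 3x3 convolution on floats, ~9 multiply-adds (18 FLOPs)
            // per output pixel, moving roughly 8 bytes per pixel when cached well.
            const double flopsPerPixel = 18.0;
            const double bytesPerPixel = 8.0;

            const double machineBalance = peakGflops / memBandwidthGBs;    // FLOPs per byte sustainable
            const double kernelIntensity = flopsPerPixel / bytesPerPixel;  // FLOPs per byte required

            std::cout << "machine balance:  " << machineBalance << " FLOP/byte\n"
                      << "kernel intensity: " << kernelIntensity << " FLOP/byte\n"
                      << (kernelIntensity < machineBalance
                              ? "memory-bound: more processors will not scale\n"
                              : "compute-bound: extra processors can help\n");
            return 0;
        }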

  19. Advanced Launch System Multi-Path Redundant Avionics Architecture Analysis and Characterization

    NASA Technical Reports Server (NTRS)

    Baker, Robert L.

    1993-01-01

    The objective of the Multi-Path Redundant Avionics Suite (MPRAS) program is the development of a set of avionic architectural modules which will be applicable to the family of launch vehicles required to support the Advanced Launch System (ALS). To enable ALS cost/performance requirements to be met, the MPRAS must support autonomy, maintenance, and testability capabilities which exceed those present in conventional launch vehicles. The multi-path redundant or fault tolerance characteristics of the MPRAS are necessary to offset a reduction in avionics reliability due to the increased complexity needed to support these new cost reduction and performance capabilities and to meet avionics reliability requirements which will provide cost-effective reductions in overall ALS recurring costs. A complex, real-time distributed computing system is needed to meet the ALS avionics system requirements. General Dynamics, Boeing Aerospace, and C.S. Draper Laboratory have proposed system architectures as candidates for the ALS MPRAS. The purpose of this document is to report the results of independent performance and reliability characterization and assessment analyses of each proposed candidate architecture and qualitative assessments of testability, maintainability, and fault tolerance mechanisms. These independent analyses were conducted as part of the MPRAS Part 2 program and were carried under NASA Langley Research Contract NAS1-17964, Task Assignment 28.

  20. Experimental demonstration of multi-dimensional resources integration for service provisioning in cloud radio over fiber network

    NASA Astrophysics Data System (ADS)

    Yang, Hui; Zhang, Jie; Ji, Yuefeng; He, Yongqi; Lee, Young

    2016-07-01

    Cloud radio access network (C-RAN) has become a promising scenario for accommodating high-performance services with ubiquitous user coverage and real-time cloud computing in the 5G era. However, the radio network, optical network and processing-unit cloud have been decoupled from each other, so that their resources are controlled independently. With the growing number of mobile internet users, traditional architectures cannot implement the resource optimization and scheduling needed for high-level service guarantees, because of the communication obstacles among these domains. In this paper, we report a study on multi-dimensional resources integration (MDRI) for service provisioning in a cloud radio over fiber network (C-RoFN). A resources integrated provisioning (RIP) scheme using an auxiliary graph is introduced based on the proposed architecture. MDRI can enhance the responsiveness to dynamic end-to-end user demands and globally optimize radio frequency, optical network and processing resources effectively to maximize radio coverage. The feasibility of the proposed architecture is experimentally verified on an OpenFlow-based enhanced SDN testbed. The performance of the RIP scheme under a heavy traffic load scenario is also quantitatively evaluated to demonstrate the efficiency of the proposal based on the MDRI architecture, in terms of resource utilization, path blocking probability, network cost and path provisioning latency, compared with other provisioning schemes.
