A Linux Workstation for High Performance Graphics
NASA Technical Reports Server (NTRS)
Geist, Robert; Westall, James
2000-01-01
The primary goal of this effort was to provide a low-cost method of obtaining high-performance 3-D graphics using an industry-standard library (OpenGL) on PC-class computers. Previously, users interested in doing substantial visualization or graphical manipulation were constrained to using specialized, custom hardware most often found in computers from Silicon Graphics (SGI). We provided an alternative to expensive SGI hardware by taking advantage of third-party 3-D graphics accelerators that have now become available at very affordable prices. To make use of this hardware, our goal was to provide a free, redistributable, and fully compatible OpenGL work-alike library so that existing bodies of code could simply be recompiled for PC-class machines running a free version of Unix. This should allow substantial cost savings while greatly expanding the population of people with access to a serious graphics development and viewing environment. It should also offer NASA a means of providing a spectrum of graphics performance to its scientists: high-end specialized SGI hardware for high-performance visualization, and generic, off-the-shelf components for medium- and lower-performance applications, while maintaining compatibility between the two.
Eastman, Peter; Friedrichs, Mark S; Chodera, John D; Radmer, Randall J; Bruns, Christopher M; Ku, Joy P; Beauchamp, Kyle A; Lane, Thomas J; Wang, Lee-Ping; Shukla, Diwakar; Tye, Tony; Houston, Mike; Stich, Timo; Klein, Christoph; Shirts, Michael R; Pande, Vijay S
2013-01-08
OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added.
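As an illustration of the hardware-hiding API described above, here is a minimal sketch using OpenMM's Python application layer (module names assume a recent OpenMM release; versions contemporary with this paper used the simtk.openmm namespace). The input file name is a placeholder; only the platform name changes between the CPU, OpenCL, and CUDA back ends.

```python
from openmm import LangevinMiddleIntegrator, Platform
from openmm.app import PDBFile, ForceField, Simulation, PME, HBonds
from openmm.unit import kelvin, picosecond, picoseconds, nanometer

# Load a solvated structure and build a System from a standard force field.
pdb = PDBFile('input.pdb')                     # hypothetical input file
forcefield = ForceField('amber14-all.xml', 'amber14/tip3p.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
                                 nonbondedCutoff=1.0 * nanometer,
                                 constraints=HBonds)

integrator = LangevinMiddleIntegrator(300 * kelvin, 1 / picosecond,
                                      0.002 * picoseconds)

# The only hardware-specific line: pick 'CPU', 'OpenCL', or 'CUDA'.
platform = Platform.getPlatformByName('CUDA')
simulation = Simulation(pdb.topology, system, integrator, platform)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()
simulation.step(1000)                          # run 1000 MD steps
```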
Eastman, Peter; Friedrichs, Mark S.; Chodera, John D.; Radmer, Randall J.; Bruns, Christopher M.; Ku, Joy P.; Beauchamp, Kyle A.; Lane, Thomas J.; Wang, Lee-Ping; Shukla, Diwakar; Tye, Tony; Houston, Mike; Stich, Timo; Klein, Christoph; Shirts, Michael R.; Pande, Vijay S.
2012-01-01
OpenMM is a software toolkit for performing molecular simulations on a range of high performance computing architectures. It is based on a layered architecture: the lower layers function as a reusable library that can be invoked by any application, while the upper layers form a complete environment for running molecular simulations. The library API hides all hardware-specific dependencies and optimizations from the users and developers of simulation programs: they can be run without modification on any hardware on which the API has been implemented. The current implementations of OpenMM include support for graphics processing units using the OpenCL and CUDA frameworks. In addition, OpenMM was designed to be extensible, so new hardware architectures can be accommodated and new functionality (e.g., energy terms and integrators) can be easily added. PMID:23316124
Efficient architecture for spike sorting in reconfigurable hardware.
Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying
2013-11-01
This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and the fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The use of GHA allows efficient computation of principal components for the subsequent clustering operations. FCM is able to achieve near-optimal clustering for spike sorting, and its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of the different weight vectors shares the same circuit to lower the area cost. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroids are merged into a single updating process to avoid a large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented on a field-programmable gate array (FPGA) and embedded in a system-on-chip (SoC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike-sorting design that attains a high classification correct rate and high-speed computation.
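To make the feature-extraction stage concrete, the following is a minimal NumPy sketch of the generalized Hebbian algorithm (Sanger's rule) extracting principal components from spike waveforms; the learning rate, component count, and synthetic data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gha(spikes, n_components=3, lr=1e-3, epochs=20, seed=0):
    """Generalized Hebbian algorithm (Sanger's rule).
    spikes: (n_samples, n_features) array of aligned spike waveforms."""
    rng = np.random.default_rng(seed)
    x_mean = spikes.mean(axis=0)
    W = rng.standard_normal((n_components, spikes.shape[1])) * 0.01
    for _ in range(epochs):
        for x in spikes - x_mean:          # centred waveform
            y = W @ x                      # projections onto current components
            # Sanger's rule: Hebbian term minus lower-triangular deflation term
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W                               # rows approximate the leading PCs

# Toy usage: 500 synthetic 32-sample spikes; the resulting projections
# would be handed to the FCM clustering stage.
waveforms = np.random.default_rng(1).standard_normal((500, 32))
components = gha(waveforms)
features = (waveforms - waveforms.mean(axis=0)) @ components.T
```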
Gálvez, Sergio; Ferusic, Adis; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan A; Dorado, Gabriel
2016-10-01
The Smith-Waterman algorithm has great sensitivity when used for biological sequence-database searches, but at the expense of high computing-power requirements. To overcome this problem, implementations in the literature exploit the different hardware architectures available in a standard PC, such as the GPU, the CPU, and coprocessors. We introduce an application that splits the original database-search problem into smaller parts, resolves each of them by executing the most efficient implementation of the Smith-Waterman algorithm for the corresponding hardware architecture, and finally unifies the generated results. Using non-overlapping hardware allows simultaneous execution and up to a 2.58-fold performance gain compared with any single algorithm used to search sequence databases. Even the performance of the popular BLAST heuristic is exceeded in 78% of the tests. The application has been tested with standard hardware: an Intel i7-4820K CPU, Intel Xeon Phi 31S1P coprocessors, and nVidia GeForce GTX 960 graphics cards. An important increase in performance has been obtained in a wide range of situations, effectively exploiting the available hardware.
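A hedged, pure-Python sketch of the splitting idea follows: the database is partitioned into chunks, each chunk is scored by a separate worker (standing in for the CPU, GPU, and coprocessor implementations), and the per-chunk hits are unified into one ranked list. The toy scorer and scoring parameters are illustrative, not the tuned kernels used in the paper.

```python
from concurrent.futures import ProcessPoolExecutor

def sw_score(query, target, match=2, mismatch=-1, gap=-2):
    """Plain Smith-Waterman local-alignment score, O(len(query) * len(target))."""
    cols = len(target) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            s = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def search_chunk(args):
    query, chunk = args
    return [(name, sw_score(query, seq)) for name, seq in chunk]

def split_search(query, database, n_workers=3):
    """Split the database, score chunks in parallel, unify the results."""
    chunks = [database[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(search_chunk, [(query, c) for c in chunks])
    merged = [hit for part in parts for hit in part]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)

if __name__ == "__main__":
    db = [("seq1", "ACGTACGTGG"), ("seq2", "TTTTACGTAA"), ("seq3", "GGGGCCCC")]
    print(split_search("ACGT", db))
```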
NASA Astrophysics Data System (ADS)
Benkrid, K.; Belkacemi, S.; Sukhsawas, S.
2005-06-01
This paper proposes an integrated framework for the high-level design of high-performance implementations of signal processing algorithms on FPGAs. The framework emerged from a constant need to rapidly implement increasingly complicated algorithms on FPGAs while maintaining the high performance needed in many real-time digital signal processing applications. This is particularly important for application developers who often rely on iterative and interactive development methodologies. The central idea behind the proposed framework is to dynamically integrate high-performance structural hardware description languages with higher-level hardware languages in order to help satisfy the dual requirements of high-level design and high-performance implementation. The paper illustrates this by integrating two environments: Celoxica's Handel-C language and HIDE, a structural hardware environment developed at Queen's University Belfast. On the one hand, Handel-C has been proven to be very useful in the rapid design and prototyping of FPGA circuits, especially control-intensive ones. On the other hand, HIDE has been used extensively, and successfully, in the generation of highly optimised parameterisable FPGA cores. In this paper, this is illustrated by the construction of a scalable and fully parameterisable core for image algebra's five core neighbourhood operations, where fully floorplanned, efficient FPGA configurations, in the form of EDIF netlists, are generated automatically for instances of the core. In the proposed combined framework, highly optimised data paths are invoked dynamically from within Handel-C and are synthesised using HIDE. Although the idea might seem simple prima facie, it could have serious implications for the design of future generations of hardware description languages.
Integrating Reconfigurable Hardware-Based Grid for High Performance Computing
Dondo Gazzano, Julio; Sanchez Molina, Francisco; Rincon, Fernando; López, Juan Carlos
2015-01-01
FPGAs have shown several characteristics that make them very attractive for high performance computing (HPC). The impressive speed-up factors that they are able to achieve, their reduced power consumption, and the ease and flexibility of the design process, with fast iterations between consecutive versions, are examples of the benefits obtained with their use. However, some difficulties remain when using reconfigurable platforms as accelerators: the need for an in-depth application study to identify potential acceleration, the lack of tools for deploying computational problems on distributed hardware platforms, and the low portability of components, among others. This work proposes a complete grid infrastructure for distributed high performance computing based on dynamically reconfigurable FPGAs. In addition, a set of services designed to facilitate application deployment is described. An example application and a comparison with other hardware and software implementations are shown. Experimental results show that the proposed architecture offers encouraging advantages for the deployment of high performance distributed applications while simplifying the development process. PMID:25874241
Scout: high-performance heterogeneous computing made simple
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jablin, James; Mc Cormick, Patrick; Herlihy, Maurice
2011-01-26
Researchers must often write their own simulation and analysis software. During this process they simultaneously confront both computational and scientific problems. Current strategies for aiding the generation of performance-oriented programs do not abstract the software development from the science. Furthermore, the problem is becoming increasingly complex and pressing with the continued development of many-core and heterogeneous (CPU-GPU) architectures. To achieve high performance, scientists must expertly navigate both software and hardware. Co-design between computer scientists and research scientists can alleviate but not solve this problem. The science community requires better tools for developing, optimizing, and future-proofing codes, allowing scientists to focus on their research while still achieving high computational performance. Scout is a parallel programming language and extensible compiler framework targeting heterogeneous architectures. It provides the abstraction required to buffer scientists from the constantly shifting details of hardware while still realizing high performance by encapsulating software and hardware optimization within a compiler framework.
Targeting multiple heterogeneous hardware platforms with OpenCL
NASA Astrophysics Data System (ADS)
Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.
2014-06-01
The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware-specific optimizations as necessary.
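A minimal sketch of the preprocessor/JIT technique mentioned above, written against the pyopencl bindings (an assumption; any OpenCL host language works the same way): a hardware-specific option is injected at build time so the platform compiler can specialize the kernel. Running it requires pyopencl and an available OpenCL device.

```python
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

KERNEL_SRC = """
__kernel void saxpy(__global float *y, __global const float *x, const float a)
{
    int i = get_global_id(0);
    #ifdef USE_FMA                 /* optional hardware-specific path */
    y[i] = fma(a, x[i], y[i]);
    #else
    y[i] = a * x[i] + y[i];
    #endif
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# JIT-compile with build options chosen by the host code; in practice the host
# would query device properties (vendor, vector width, local memory size, ...)
# before deciding which optional paths to enable. Here one is hard-coded.
opts = ["-DUSE_FMA"]
prg = cl.Program(ctx, KERNEL_SRC).build(options=opts)

x = cl_array.to_device(queue, np.arange(1024, dtype=np.float32))
y = cl_array.to_device(queue, np.ones(1024, dtype=np.float32))
prg.saxpy(queue, x.shape, None, y.data, x.data, np.float32(2.0))
print(y.get()[:4])   # [1. 3. 5. 7.]
```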
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter D.; Dawson, Andrew
2017-03-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelization to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. In this paper, we present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform model simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13 % for the shallow water model.
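A schematic NumPy illustration of the backup idea, under assumed parameters (a 2-D field, fourfold coarsening by block averaging, and a simple tolerance test): the coarse backup is compared against a coarsened copy of the model field, and fine-grid points inside any implausible coarse box are restored from the backup.

```python
import numpy as np

COARSEN = 4          # backup grid is 4x coarser in each direction (assumption)
TOL = 1.0            # plausibility tolerance for the coarse check (assumption)

def coarsen(field):
    """Block-average a 2-D field onto the coarse backup grid."""
    ny, nx = field.shape
    return field.reshape(ny // COARSEN, COARSEN, nx // COARSEN, COARSEN).mean(axis=(1, 3))

def check_and_restore(field, backup):
    """Detect grid boxes corrupted by a hardware fault and restore them."""
    bad = np.abs(coarsen(field) - backup) > TOL        # coarse boxes failing the check
    if bad.any():
        # Restore every fine-grid point inside a failed coarse box from the backup value.
        patch = np.kron(np.where(bad, backup, 0.0), np.ones((COARSEN, COARSEN)))
        mask = np.kron(bad, np.ones((COARSEN, COARSEN), dtype=bool))
        field = np.where(mask, patch, field)
    return field, int(bad.sum())

# Toy timestep: a smooth field, a backup copy, then a bit flip corrupting one point.
h = np.sin(np.linspace(0, 2 * np.pi, 64))[:, None] * np.ones((64, 64))
backup = coarsen(h)
h[10, 17] = 1e6                                        # simulated hardware fault
h, n_bad = check_and_restore(h, backup)
print("restored coarse boxes:", n_bad)
```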
An approach to secure weather and climate models against hardware faults
NASA Astrophysics Data System (ADS)
Düben, Peter; Dawson, Andrew
2017-04-01
Enabling Earth System models to run efficiently on future supercomputers is a serious challenge for model development. Many publications study efficient parallelisation to allow better scaling of performance on an increasing number of computing cores. However, one of the most alarming threats for weather and climate predictions on future high performance computing architectures is widely ignored: the presence of hardware faults that will frequently hit large applications as we approach exascale supercomputing. Changes in the structure of weather and climate models that would allow them to be resilient against hardware faults are hardly discussed in the model development community. We present an approach to secure the dynamical core of weather and climate models against hardware faults using a backup system that stores coarse resolution copies of prognostic variables. Frequent checks of the model fields on the backup grid allow the detection of severe hardware faults, and prognostic variables that are changed by hardware faults on the model grid can be restored from the backup grid to continue model simulations with no significant delay. To justify the approach, we perform simulations with a C-grid shallow water model in the presence of frequent hardware faults. As long as the backup system is used, simulations do not crash and a high level of model quality can be maintained. The overhead due to the backup system is reasonable and additional storage requirements are small. Runtime is increased by only 13% for the shallow water model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pais Pitta de Lacerda Ruivo, Tiago; Bernabeu Altayo, Gerard; Garzoglio, Gabriele
2014-11-11
It has been widely accepted that software virtualization has a large negative impact on high-performance computing (HPC) application performance. This work explores the potential use of InfiniBand hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an InfiniBand network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization and minimize the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SR-IOV). Our solution spanned modifications to the Linux hypervisor as well as to the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR InfiniBand network links, with micro-benchmarks (latency and bandwidth) as well as with an MPI-intensive application (the HPL Linpack benchmark).
Onboard FPGA-based SAR processing for future spaceborne systems
NASA Technical Reports Server (NTRS)
Le, Charles; Chan, Samuel; Cheng, Frank; Fang, Winston; Fischman, Mark; Hensley, Scott; Johnson, Robert; Jourdan, Michael; Marina, Miguel; Parham, Bruce;
2004-01-01
We present a real-time, high-performance, and fault-tolerant FPGA-based hardware architecture for the processing of synthetic aperture radar (SAR) images in future spaceborne systems. In particular, we discuss the integrated design approach, from top-level algorithm specifications and system requirements, design methodology, functional verification, and performance validation, down to hardware design and implementation.
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Dongarra, Jack; Gates, Mark; Haidar, Azzam; ...
2015-01-01
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore CPUs with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open-source, high-performance library that incorporates the developments presented here and, more broadly, provides the DLA functionality equivalent to that of the popular LAPACK library while targeting heterogeneous architectures that feature a mix of multicore CPUs and coprocessors. The LAPACK compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through the use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology whereby we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
Media processors using a new microsystem architecture designed for the Internet era
NASA Astrophysics Data System (ADS)
Wyland, David C.
1999-12-01
The demands of digital image processing, communications and multimedia applications are growing more rapidly than traditional design methods can fulfill them. Previously, only custom hardware designs could provide the performance required to meet the demands of these applications. However, hardware design has reached a crisis point. Hardware design can no longer deliver a product with the required performance and cost in a reasonable time for a reasonable risk. Software based designs running on conventional processors can deliver working designs in a reasonable time and with low risk but cannot meet the performance requirements. What is needed is a media processing approach that combines very high performance, a simple programming model, complete programmability, short time to market and scalability. The Universal Micro System (UMS) is a solution to these problems. The UMS is a completely programmable (including I/O) system on a chip that combines hardware performance with the fast time to market, low cost and low risk of software designs.
Exploring Cloud Computing for Large-scale Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Guang; Han, Binh; Yin, Jian
This paper explores cloud computing for large-scale data-intensive scientific applications. Cloud computing is attractive because it provides hardware and software resources on demand, which relieves the burden of acquiring and maintaining a huge amount of resources that may be used only once by a scientific application. However, unlike typical commercial applications that often just require a moderate amount of ordinary resources, large-scale scientific applications often need to process enormous amounts of data in the terabyte or even petabyte range and require special high performance hardware with low-latency connections to complete computation in a reasonable amount of time. To address these challenges, we build an infrastructure that can dynamically select high performance computing hardware across institutions and dynamically adapt the computation to the selected resources to achieve high performance. We have also demonstrated the effectiveness of our infrastructure by building a system biology application and an uncertainty quantification application for carbon sequestration, which can efficiently utilize data and computation resources across several institutions.
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan R.
2015-05-01
New radar applications need to perform complex algorithms and process large quantities of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power, high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementations of adaptive pulse compression for real-time transceiver optimization are presented; they are based on a system-on-chip architecture for Xilinx devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such as matrix multiplication and matrix inversion, which are essential units for solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance AXI buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.
Automatic Parameter Tuning for the Morpheus Vehicle Using Particle Swarm Optimization
NASA Technical Reports Server (NTRS)
Birge, B.
2013-01-01
A high-fidelity simulation using a PC-based Trick framework has been developed for Johnson Space Center's Morpheus test bed flight vehicle. There is an iterative development loop of refining and testing the hardware, refining the software, comparing the software simulation to hardware performance, and adjusting either or both the hardware and the simulation to extract the best performance from the hardware as well as the most realistic representation of the hardware from the software. A Particle Swarm Optimization (PSO) based technique has been developed that increases the speed and accuracy of this iterative development cycle. Parameters in software can be automatically tuned to make the simulation match real-world subsystem data from test flights. Special considerations for scale, linearity, and discontinuities can be all but ignored with this technique, allowing fast turnaround both for simulation tune-up to match hardware changes and during the test and validation phase to help identify hardware issues. Software models with insufficient control authority to match hardware test data can be immediately identified, and using this technique requires little to no specialized knowledge of optimization, freeing model developers to concentrate on spacecraft engineering. Integration of the PSO into the Morpheus development cycle is discussed, as well as a case study highlighting the tool's effectiveness.
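A compact, hedged sketch of such a tuning loop: a global-best PSO searches for the parameter vector that minimizes the mismatch between simulated output and recorded data. The stand-in model, data, bounds, and PSO constants are placeholders, not Morpheus values.

```python
import numpy as np

def pso_tune(cost, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization over box-bounded parameters."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))     # positions
    v = np.zeros_like(x)                                      # velocities
    pbest, pbest_cost = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        costs = np.array([cost(p) for p in x])
        better = costs < pbest_cost
        pbest[better], pbest_cost[better] = x[better], costs[better]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest, pbest_cost.min()

# Toy stand-in for "simulation vs. flight data": fit two model parameters.
t = np.linspace(0.0, 5.0, 200)
flight_data = 3.0 * np.exp(-0.8 * t)                          # pretend telemetry
simulate = lambda p: p[0] * np.exp(-p[1] * t)                 # pretend simulation model
cost = lambda p: np.sum((simulate(p) - flight_data) ** 2)
best, err = pso_tune(cost, np.array([[0.0, 10.0], [0.0, 5.0]]))
print(best, err)   # best should approach [3.0, 0.8]
```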
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine
NASA Astrophysics Data System (ADS)
Zaitsu, Kazuya; Yamamoto, Koji; Kuroda, Yasuto; Inoue, Kazunari; Ata, Shingo; Oka, Ikuo
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limits its ability to deploy large amounts of capacity in IP routers. In this paper, we propose new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Design Report for Low Power Acoustic Detector
2013-08-01
The report documents a very high speed integrated circuit (VHSIC) hardware description language (VHDL) implementation of both the HED and DCD detectors, covering the hardware design, the target detection algorithm design in both MATLAB and VHDL, and typical performance results.
Environmental qualification testing of payload G-534, the Pool Boiling Experiment
NASA Technical Reports Server (NTRS)
Sexton, J. Andrew
1992-01-01
Payload G-534, the prototype Pool Boiling Experiment (PBE), is scheduled to fly on the STS-47 mission in September 1992. This paper describes the purpose of the experiment and the environmental qualification testing program that was used to prove the integrity of the hardware. Component and box level vibration and thermal cycling tests were performed to give an early level of confidence in the hardware designs. At the system level, vibration, thermal extreme soaks, and thermal vacuum cycling tests were performed to qualify the complete design for the expected shuttle environment. The system level vibration testing included three axis sine sweeps and random inputs. The system level hot and cold soak tests demonstrated the hardware's capability to operate over a wide range of temperatures and gave wider latitude in determining which shuttle thermal attitudes were compatible with the experiment. The system level thermal vacuum cycling tests demonstrated the hardware's capability to operate in a convection free environment. A unique environmental chamber was designed and fabricated by the PBE team and allowed most of the environmental testing to be performed within the hardware build laboratory. The completion of the test program gave the project team high confidence in the hardware's ability to function as designed during flight.
OS friendly microprocessor architecture: Hardware level computer security
NASA Astrophysics Data System (ADS)
Jungwirth, Patrick; La Fratta, Patrick
2016-05-01
We present an introduction to the patented OS Friendly Microprocessor Architecture (OSFA) and hardware level computer security. Conventional microprocessors have not tried to balance hardware performance and OS performance at the same time. Conventional microprocessors have depended on the Operating System for computer security and information assurance. The goal of the OS Friendly Architecture is to provide a high performance and secure microprocessor and OS system. We are interested in cyber security, information technology (IT), and SCADA control professionals reviewing the hardware level security features. The OS Friendly Architecture is a switched set of cache memory banks in a pipeline configuration. For light-weight threads, the memory pipeline configuration provides near instantaneous context switching times. The pipelining and parallelism provided by the cache memory pipeline provides for background cache read and write operations while the microprocessor's execution pipeline is running instructions. The cache bank selection controllers provide arbitration to prevent the memory pipeline and microprocessor's execution pipeline from accessing the same cache bank at the same time. This separation allows the cache memory pages to transfer to and from level 1 (L1) caching while the microprocessor pipeline is executing instructions. Computer security operations are implemented in hardware. By extending Unix file permissions bits to each cache memory bank and memory address, the OSFA provides hardware level computer security.
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.
Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut
2015-10-05
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
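A back-of-the-envelope illustration of the lifetime-cost argument, using entirely assumed prices, throughputs, power draws, and electricity rates (the paper's benchmark data should be consulted for real purchasing decisions).

```python
# Hypothetical nodes: (hardware cost in EUR, GROMACS throughput in ns/day, power draw in W)
nodes = {
    "CPU only":              (2500,  30, 250),
    "CPU + consumer GPU":    (3000,  90, 400),
    "CPU + 2x consumer GPU": (3600, 140, 550),
}

YEARS = 4                    # assumed hardware lifetime until replacement
EUR_PER_KWH = 0.25           # assumed electricity price including cooling overhead

for name, (price, ns_per_day, watts) in nodes.items():
    energy_cost = watts / 1000 * 24 * 365 * YEARS * EUR_PER_KWH
    total_cost = price + energy_cost
    total_ns = ns_per_day * 365 * YEARS
    print(f"{name:24s} lifetime cost {total_cost:8.0f} EUR, "
          f"{total_ns:8.0f} ns produced, {total_ns / total_cost:6.2f} ns/EUR")
```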
Visualization of fluid dynamics at NASA Ames
NASA Technical Reports Server (NTRS)
Watson, Val
1989-01-01
The hardware and software currently used for visualization of fluid dynamics at NASA Ames is described. The software includes programs to create scenes (for example particle traces representing the flow over an aircraft), programs to interactively view the scenes, and programs to control the creation of video tapes and 16mm movies. The hardware includes high performance graphics workstations, a high speed network, digital video equipment, and film recorders.
Optimized design of embedded DSP system hardware supporting complex algorithms
NASA Astrophysics Data System (ADS)
Li, Yanhua; Wang, Xiangjun; Zhou, Xinling
2003-09-01
The paper presents an optimized design method for a flexible and economical embedded DSP system that can implement complex processing algorithms such as biometric recognition and real-time image processing. It consists of a floating-point DSP, 512 Kbytes of data RAM, 1 Mbyte of FLASH program memory, a CPLD for achieving flexible logic control of the input channel, and an RS-485 transceiver for local network communication. Because the design employs a DSP with a high performance-to-price ratio, the TMS320C6712, and a large FLASH memory, this system permits loading and performing complex algorithms with little algorithm optimization and code reduction. The CPLD provides flexible logic control for the whole DSP board, especially the input channel, and allows a convenient interface between different sensors and the DSP system. The transceiver circuit can transfer data between the DSP and a host computer. The paper also introduces some key technologies that make the whole system work efficiently. Because of the characteristics referred to above, the hardware is an excellent platform for multi-channel data collection, image processing, and other signal processing with high performance and adaptability. The application section of this paper presents how this hardware is adapted for a biometric identification system with high identification precision. The result reveals that this hardware is easy to interface with a CMOS imager and is capable of carrying out complex biometric identification algorithms, which require real-time processing.
Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing
NASA Technical Reports Server (NTRS)
Dobbs, Carl, Sr.
2012-01-01
A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, of implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory and calculates a form of sumcheck on the fly, thereby relieving the processors of calculating the sumcheck in software.
Managing Risk for Thermal Vacuum Testing of the International Space Station Radiators
NASA Technical Reports Server (NTRS)
Carek, Jerry A.; Beach, Duane E.; Remp, Kerry L.
2000-01-01
The International Space Station (ISS) is designed with large deployable radiator panels that are used to reject waste heat from the habitation modules. Qualification testing of the Heat Rejection System (HRS) radiators was performed using qualification hardware only. As a result of those tests, over 30 design changes were made to the actual flight hardware. Consequently, a system level test of the flight hardware was needed to validate its performance in the final configuration. A full thermal vacuum test was performed on the flight hardware in order to demonstrate its ability to deploy on-orbit. Since there is an increased level of risk associated with testing flight hardware, because of cost and schedule limitations, special risk mitigation procedures were developed and implemented for the test program, This paper introduces the Continuous Risk Management process that was utilized for the ISS HRS test program. Testing was performed in the Space Power Facility at the NASA Glenn Research Center, Plum Brook Station located in Sandusky, Ohio. The radiator system was installed in the 100-foot diameter by 122-foot tall vacuum chamber on a special deployment track. Radiator deployments were performed at several thermal conditions similar to those expected on-orbit using both the primary deployment mechanism and the back-up deployment mechanism. The tests were highly successful and were completed without incident.
Evaluating the Performance of the NASA LaRC CMF Motion Base Safety Devices
NASA Technical Reports Server (NTRS)
Gupton, Lawrence E.; Bryant, Richard B., Jr.; Carrelli, David J.
2006-01-01
This paper describes the initial measured performance results of the previously documented NASA Langley Research Center (LaRC) Cockpit Motion Facility (CMF) motion base hardware safety devices. These safety systems are required to prevent excessive accelerations that could injure personnel and damage simulator cockpits or the motion base structure. Excessive accelerations may be caused by erroneous commands or hardware failures driving an actuator to the end of its travel at high velocity, stepping a servo valve, or instantly reversing servo direction. Such commands may result from single order failures of electrical or hydraulic components within the control system itself, or from aggressive or improper cueing commands from the host simulation computer. The safety systems must mitigate these high acceleration events while minimizing the negative performance impacts. The system accomplishes this by controlling the rate of change of valve signals to limit excessive commanded accelerations. It also aids hydraulic cushion performance by limiting valve command authority as the actuator approaches its end of travel. The design takes advantage of inherent motion base hydraulic characteristics to implement all safety features using hardware only solutions.
Determination of performance characteristics of scientific applications on IBM Blue Gene/Q
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evangelinos, C.; Walkup, R. E.; Sachdeva, V.
The IBM Blue Gene®/Q platform presents scientists and engineers with a rich set of hardware features, such as 16 cores per chip sharing a Level 2 cache, a wide SIMD (single-instruction, multiple-data) unit, a five-dimensional torus network, and hardware support for collective operations. Especially important is the feature that each core has four "hardware threads," which makes it possible to hide latencies and obtain a high fraction of the peak issue rate from each core. All of these hardware resources present unique performance-tuning opportunities on Blue Gene/Q. We provide an overview of several important applications and solvers and study them on Blue Gene/Q using performance counters and Message Passing Interface profiles. We also discuss how Blue Gene/Q tools help us understand the interaction of the application with the hardware and software layers and provide guidance for optimization. Furthermore, on the basis of our analysis, we discuss code improvement strategies targeting Blue Gene/Q. Information about how these algorithms map to the Blue Gene® architecture is expected to have an impact on future system design as we move to the exascale era.
Environmental qualification testing of the prototype pool boiling experiment
NASA Technical Reports Server (NTRS)
Sexton, J. Andrew
1992-01-01
The prototype Pool Boiling Experiment (PBE) flew on the STS-47 mission in September 1992. This report describes the purpose of the experiment and the environmental qualification testing program that was used to prove the integrity of the prototype hardware. Component and box level vibration and thermal cycling tests were performed to give an early level of confidence in the hardware designs. At the system level, vibration, thermal extreme soaks, and thermal vacuum cycling tests were performed to qualify the complete design for the expected shuttle environment. The system level vibration testing included three axis sine sweeps and random inputs. The system level hot and cold soak tests demonstrated the hardware's capability to operate over a wide range of temperatures and gave the project team a wider latitude in determining which shuttle thermal attitudes were compatible with the experiment. The system level thermal vacuum cycling tests demonstrated the hardware's capability to operate in a convection free environment. A unique environmental chamber was designed and fabricated by the PBE team and allowed most of the environmental testing to be performed within the project's laboratory. The completion of the test program gave the project team high confidence in the hardware's ability to function as designed during flight.
NASA Technical Reports Server (NTRS)
Kleckner, R. J.; Rosenlieb, J. W.; Dyba, G.
1980-01-01
The results of a series of full-scale hardware tests comparing predictions of the SPHERBEAN computer program with measured data are presented. The SPHERBEAN program predicts the thermomechanical performance characteristics of high-speed, lubricated, double-row spherical roller bearings. The degree of correlation between the performance predicted by SPHERBEAN and the measured data is demonstrated. Experimental and calculated performance data are compared over a range of speeds up to 19,400 rpm (0.8 MDN) under pure radial, pure axial, and combined loads.
Test Program for Stirling Radioisotope Generator Hardware at NASA Glenn Research Center
NASA Technical Reports Server (NTRS)
Lewandowski, Edward J.; Bolotin, Gary S.; Oriti, Salvatore M.
2015-01-01
Stirling-based energy conversion technology has demonstrated the potential of high efficiency and low mass power systems for future space missions. This capability is beneficial, if not essential, to making certain deep space missions possible. Significant progress was made developing the Advanced Stirling Radioisotope Generator (ASRG), a 140-W radioisotope power system. A variety of flight-like hardware, including Stirling convertors, controllers, and housings, was designed and built under the ASRG flight development project. To support future Stirling-based power system development NASA has proposals that, if funded, will allow this hardware to go on test at the NASA Glenn Research Center. While future flight hardware may not be identical to the hardware developed under the ASRG flight development project, many components will likely be similar, and system architectures may have heritage to ASRG. Thus, the importance of testing the ASRG hardware to the development of future Stirling-based power systems cannot be understated. This proposed testing will include performance testing, extended operation to establish an extensive reliability database, and characterization testing to quantify subsystem and system performance and better understand system interfaces. This paper details this proposed test program for Stirling radioisotope generator hardware at NASA Glenn. It explains the rationale behind the proposed tests and how these tests will meet the stated objectives.
Test Program for Stirling Radioisotope Generator Hardware at NASA Glenn Research Center
NASA Technical Reports Server (NTRS)
Lewandowski, Edward J.; Bolotin, Gary S.; Oriti, Salvatore M.
2014-01-01
Stirling-based energy conversion technology has demonstrated the potential of high efficiency and low mass power systems for future space missions. This capability is beneficial, if not essential, to making certain deep space missions possible. Significant progress was made developing the Advanced Stirling Radioisotope Generator (ASRG), a 140-watt radioisotope power system. A variety of flight-like hardware, including Stirling convertors, controllers, and housings, was designed and built under the ASRG flight development project. To support future Stirling-based power system development NASA has proposals that, if funded, will allow this hardware to go on test at the NASA Glenn Research Center (GRC). While future flight hardware may not be identical to the hardware developed under the ASRG flight development project, many components will likely be similar, and system architectures may have heritage to ASRG. Thus the importance of testing the ASRG hardware to the development of future Stirling-based power systems cannot be understated. This proposed testing will include performance testing, extended operation to establish an extensive reliability database, and characterization testing to quantify subsystem and system performance and better understand system interfaces. This paper details this proposed test program for Stirling radioisotope generator hardware at NASA GRC. It explains the rationale behind the proposed tests and how these tests will meet the stated objectives.
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.
Ly, Daniel Le; Chow, Paul
2010-11-01
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
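To make the accelerated computation concrete, here is a small NumPy sketch of one contrastive-divergence (CD-1) weight update for a binary RBM, the node-parallel operation the FPGA engines implement in hardware; the sizes and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 256, 256, 0.1

W = rng.standard_normal((n_visible, n_hidden)) * 0.01
b_v = np.zeros(n_visible)          # visible biases
b_h = np.zeros(n_hidden)           # hidden biases

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_v, b_h):
    """One CD-1 step for a batch of binary visible vectors v0 of shape (batch, n_visible)."""
    # Positive phase: sample hidden units given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct visibles, then recompute hidden probabilities.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Gradient approximation: difference of data and reconstruction correlations.
    batch = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / batch
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_v, b_h

batch = (rng.random((64, n_visible)) < 0.5).astype(float)   # toy binary data
W, b_v, b_h = cd1_update(batch, W, b_v, b_h)
```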
NASA Astrophysics Data System (ADS)
Xue, Xinwei; Cheryauka, Arvi; Tubbs, David
2006-03-01
CT imaging in interventional and minimally invasive surgery requires high-performance computing solutions that meet operating room demands, healthcare business requirements, and the constraints of a mobile C-arm system. The computational requirements of clinical procedures using CT-like data are increasing rapidly, mainly due to the need for rapid access to medical imagery during critical surgical procedures. The highly parallel nature of the Radon transform and CT algorithms enables embedded computing solutions utilizing a parallel processing architecture to realize a significant gain in computational intensity with comparable hardware and program coding/testing expenses. In this paper, using a sample 2D and 3D CT problem, we explore the programming challenges and the potential benefits of embedded computing using commodity hardware components. The accuracy and performance results obtained on three computational platforms (a single CPU, a single GPU, and a solution based on FPGA technology) have been analyzed. We have shown that hardware-accelerated CT image reconstruction can be achieved with similar levels of noise and clarity of feature when compared to program execution on a CPU, while gaining a performance increase of one or more orders of magnitude. 3D cone-beam or helical CT reconstruction and a variety of volumetric image processing applications will benefit from similar accelerations.
Simulation verification techniques study
NASA Technical Reports Server (NTRS)
Schoonmaker, P. B.; Wenglinski, T. H.
1975-01-01
Results are summarized of the simulation verification techniques study which consisted of two tasks: to develop techniques for simulator hardware checkout and to develop techniques for simulation performance verification (validation). The hardware verification task involved definition of simulation hardware (hardware units and integrated simulator configurations), survey of current hardware self-test techniques, and definition of hardware and software techniques for checkout of simulator subsystems. The performance verification task included definition of simulation performance parameters (and critical performance parameters), definition of methods for establishing standards of performance (sources of reference data or validation), and definition of methods for validating performance. Both major tasks included definition of verification software and assessment of verification data base impact. An annotated bibliography of all documents generated during this study is provided.
FPGA implementation of sparse matrix algorithm for information retrieval
NASA Astrophysics Data System (ADS)
Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio
2005-06-01
Information text data retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper a solution to this problem is proposed via hardware-supported information retrieval algorithms. Reconfigurable computing may accommodate frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of parallelism can be tuned for the data. In this work we implemented the standard BLAS (basic linear algebra subprogram) sparse matrix algorithm named Compressed Sparse Row (CSR), which is shown to be more efficient in terms of storage-space requirements and query-processing time than the other sparse matrix algorithms for information retrieval applications. Although the inverted index algorithm has been treated as the de facto standard for information retrieval for years, an alternative approach that stores the index of the text collection in a sparse matrix structure is gaining more attention. This approach performs query processing using sparse matrix-vector multiplication and, due to parallelization, achieves a substantial efficiency gain over the sequential inverted index. The parallel implementations of the information retrieval kernel presented in this work target the Virtex II Field Programmable Gate Array (FPGA) board from Xilinx. A recent development in scientific applications is the use of FPGAs to achieve high-performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within the processing unit.
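A brief SciPy sketch of this approach: the term-document index is stored as a CSR matrix and query processing becomes a sparse matrix-vector product that scores every document at once. The tiny vocabulary and weights are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy index: rows = documents, columns = terms, values = term weights (e.g. tf-idf).
vocab = {"fpga": 0, "sparse": 1, "matrix": 2, "retrieval": 3}
docs = [
    {"fpga": 0.9, "matrix": 0.4},
    {"sparse": 0.7, "matrix": 0.6, "retrieval": 0.5},
    {"retrieval": 0.8},
]
rows, cols, vals = [], [], []
for d, terms in enumerate(docs):
    for term, weight in terms.items():
        rows.append(d)
        cols.append(vocab[term])
        vals.append(weight)
index = csr_matrix((vals, (rows, cols)), shape=(len(docs), len(vocab)))

# Query processing = sparse matrix-vector multiplication.
query = np.zeros(len(vocab))
for term in ("sparse", "matrix"):
    query[vocab[term]] = 1.0
scores = index @ query                      # one relevance score per document
print(np.argsort(scores)[::-1], scores)     # ranked document ids and their scores
```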
Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machines are able to provide, in a few hours, large input streams of data, which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple-pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphics Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, on the number of patterns to search, and on the number of matches, and poses significant challenges for current high performance software and hardware implementations. An adequate mapping of the algorithm onto the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. In this paper, we discuss the implementation of the Aho-Corasick algorithm for GPU-accelerated high performance systems. We present an optimized implementation of Aho-Corasick for GPUs and discuss its tradeoffs on the Tesla T10 and the new Tesla T20 (codename Fermi) GPUs. We then integrate the optimized GPU code, respectively, into an MPI-based and a pthreads-based load balancer to enable execution of the algorithm on clusters and large shared-memory multiprocessors (SMPs) accelerated with multiple GPUs.
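For reference, a compact pure-Python Aho-Corasick automaton (goto trie, failure links built by BFS, and a streaming search pass); this is the sequential baseline whose state transitions the GPU implementation evaluates in parallel across input streams.

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick: goto trie + failure links + output sets."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                         # 1. build the goto trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    queue = deque(goto[0].values())              # 2. BFS to compute failure links
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
    return goto, fail, out

def search(text, goto, fail, out):
    """Yield (end_position, pattern) for every match in text."""
    s = 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            yield i, pat

goto, fail, out = build_automaton(["ACGT", "CGTA", "GG"])
print(list(search("TTACGTAGGA", goto, fail, out)))   # [(5, 'ACGT'), (6, 'CGTA'), (8, 'GG')]
```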
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code
Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; ...
2015-06-03
Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate, and output buffer shapes. Here, we abstract from a forest of concrete dependency trees, which contain absolute memory addresses, to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses, and finally solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving a 1.12x speedup without affecting the user experience.
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells
2007-06-01
The report proposes novel solutions for massively parallel distributed computing and processing (pipelined, systolic) platforms utilizing molecular hardware within an enabling organization and architecture, moving beyond von Neumann architectures and CMOS fabrication. Microsystems and Nanotechnologies investigated a novel 3D3 (hardware-software-nanotechnology) design technology for super-high-performance computing.
A hardware-oriented algorithm for floating-point function generation
NASA Technical Reports Server (NTRS)
O'Grady, E. Pearse; Young, Baek-Kyu
1991-01-01
An algorithm is presented for performing accurate, high-speed, floating-point function generation for univariate functions defined at arbitrary breakpoints. Rapid identification of the breakpoint interval, which includes the input argument, is shown to be the key operation in the algorithm. A hardware implementation which makes extensive use of read/write memories is used to illustrate the algorithm.
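A hedged NumPy sketch of the software equivalent: locate the breakpoint interval containing the argument (the key operation the hardware accelerates with read/write memories), then evaluate the corresponding piecewise-linear segment. The breakpoints and tabulated values are arbitrary examples.

```python
import numpy as np

# Function defined at arbitrary, monotonically increasing breakpoints.
xb = np.array([-2.0, -0.5, 0.0, 0.3, 1.0, 2.5, 4.0])     # breakpoints
yb = np.tanh(xb)                                          # tabulated values (example)
slopes = np.diff(yb) / np.diff(xb)                        # precomputed per interval

def fgen(x):
    """Piecewise-linear function generation for arguments inside [xb[0], xb[-1]]."""
    x = np.asarray(x, dtype=float)
    # Key step: identify the breakpoint interval containing each argument.
    idx = np.clip(np.searchsorted(xb, x, side='right') - 1, 0, len(xb) - 2)
    return yb[idx] + slopes[idx] * (x - xb[idx])

print(fgen([-1.0, 0.15, 3.2]))        # piecewise-linear approximations of tanh
```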
1976-04-15
Tests of the BAK-12 aircraft arresting system were conducted in Single-System, Dual-System, Single-Mode, and Dual-Mode configurations to determine the feasibility of incorporating modular hardware. Engagements at 11-1/2 feet off-center, with the BAK-12 configured in the Single and Dual Modes, examined the effect of engaging the aircraft arresting-hook cable off-center, and 90,000-pound deadload arrestments were conducted on-center in the Dual Mode to determine system performance with high-energy arrestments.
NASA Technical Reports Server (NTRS)
Jackola, Arthur S.; Hartjen, Gary L.
1992-01-01
The plans for a new test facility, including new environmental test systems which are presently under construction, and the major environmental Test Support Equipment (TSE) used therein are addressed. This all-new Rocketdyne facility will perform space simulation environmental tests on Power Management and Distribution (PMAD) hardware for Space Station Freedom (SSF) at the Engineering Model, Qualification Model, and Flight Model levels of fidelity. Testing will include Random Vibration in three axes, Thermal Vacuum, Thermal Cycling, and Thermal Burn-in, as well as numerous electrical functional tests. The facility is designed to support a relatively high throughput of hardware under test, while maintaining the high standards required for a man-rated space program.
Airborne and Ground-Based Measurements Using a High-Performance Raman Lidar. Part 2; Ground Based
NASA Technical Reports Server (NTRS)
Whiteman, David N.; Cadirola, Martin; Venable, Demetrius; Connell, Rasheen; Rush, Kurt; Leblanc, Thierry; McDermid, Stuart
2009-01-01
The same RASL hardware as described in Part 1 was installed in a ground-based mobile trailer and used in a water vapor lidar intercomparison campaign, hosted at Table Mountain, CA, under the auspices of the Network for the Detection of Atmospheric Composition Change (NDACC). The converted RASL hardware demonstrated high sensitivity to lower stratospheric water vapor, indicating that profiling water vapor at those altitudes with sufficient accuracy to monitor climate change is possible. The measurements from Table Mountain were also used to explain, and correct for, the sub-optimal airborne aerosol extinction performance encountered during the flight campaign.
A Low-Complexity and High-Performance 2D Look-Up Table for LDPC Hardware Implementation
NASA Astrophysics Data System (ADS)
Chen, Jung-Chieh; Yang, Po-Hui; Lain, Jenn-Kaie; Chung, Tzu-Wen
In this paper, we propose a low-complexity, high-efficiency two-dimensional look-up table (2D LUT) for carrying out the sum-product algorithm in the decoding of low-density parity-check (LDPC) codes. Instead of employing adders for the core operation when updating check node messages, in the proposed scheme, the main term and correction factor of the core operation are successfully merged into a compact 2D LUT. Simulation results indicate that the proposed 2D LUT not only attains close-to-optimal bit error rate performance but also enjoys a low complexity advantage that is suitable for hardware implementation.
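As a rough illustration of the idea (not the authors' exact table or quantization), the check-node core operation of the sum-product algorithm can be pre-tabulated on a 2D grid of LLR magnitudes so that the main term and the correction factor are fetched together; the step size and clipping limit below are arbitrary assumptions.

    # "Box-plus" check-node core operation: exact form and a 2D LUT approximation.
    import math

    def boxplus_exact(a, b):
        return (math.copysign(1, a) * math.copysign(1, b) * min(abs(a), abs(b))
                + math.log1p(math.exp(-abs(a + b)))       # correction factor
                - math.log1p(math.exp(-abs(a - b))))

    STEP, LIMIT = 0.25, 8.0                               # assumed LLR quantization
    N = int(LIMIT / STEP) + 1
    LUT = [[boxplus_exact(i * STEP, j * STEP) for j in range(N)] for i in range(N)]

    def boxplus_lut(a, b):
        s = math.copysign(1, a) * math.copysign(1, b)
        i = min(N - 1, round(abs(a) / STEP))
        j = min(N - 1, round(abs(b) / STEP))
        return s * LUT[i][j]                              # magnitude from the table, sign reattached

    print(boxplus_exact(2.3, -1.1), boxplus_lut(2.3, -1.1))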
Orbiter wheel and tire certification
NASA Technical Reports Server (NTRS)
Campbell, C. C., Jr.
1985-01-01
The orbiter wheel and tire development has required a unique series of certification tests to demonstrate the ability of the hardware to meet severe performance requirements. Early tests of the main landing gear wheel using conventional slow-roll testing resulted in hardware failures. These failures created the need to conduct high-velocity tests with crosswind effects to provide assurance that the hardware was safe for a limited number of flights. Currently, this approach and the conventional slow-roll and static tests are used to certify the wheel/tire assembly for operational use.
The Chimera II Real-Time Operating System for advanced sensor-based control applications
NASA Technical Reports Server (NTRS)
Stewart, David B.; Schmitz, Donald E.; Khosla, Pradeep K.
1992-01-01
Attention is given to the Chimera II Real-Time Operating System, which has been developed for advanced sensor-based control applications. Chimera II provides a high-performance real-time kernel and a variety of IPC features. The hardware platform required to run Chimera II consists of commercially available hardware, and allows custom hardware to be easily integrated. The design allows it to be used with almost any type of VMEbus-based processors and devices. It allows radically differing hardware to be programmed using a common system, thus providing a first and necessary step towards the standardization of reconfigurable systems that results in a reduction of development time and cost.
A high throughput architecture for a low complexity soft-output demapping algorithm
NASA Astrophysics Data System (ADS)
Ali, I.; Wasenmüller, U.; Wehn, N.
2015-11-01
Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and therefore they are a part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on an FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture based on the best of the investigated low complexity algorithms, delivering a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
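For orientation, the reference computation that such demappers approximate is sketched below: a max-log LLR demapper for Gray-mapped 16-QAM that enumerates the constellation. The bit labelling and noise model are assumptions for illustration; the paper's reduced-complexity architecture is not reproduced.

    # Max-log soft-output demapping for Gray-mapped 16-QAM (reference formulation).
    import itertools, math

    LEVELS = [-3, -1, 1, 3]                                  # per-axis amplitudes
    GRAY = {-3: (0, 0), -1: (0, 1), 1: (1, 1), 3: (1, 0)}    # assumed Gray labelling
    SCALE = 1 / math.sqrt(10)                                # unit average symbol energy

    # 4 bits/symbol: (b0, b1) from the I axis and (b2, b3) from the Q axis.
    CONSTELLATION = [(SCALE * (i + 1j * q), GRAY[i] + GRAY[q])
                     for i, q in itertools.product(LEVELS, LEVELS)]

    def demap_maxlog(y, noise_var):
        llrs = []
        for k in range(4):
            d0 = min(abs(y - s) ** 2 for s, bits in CONSTELLATION if bits[k] == 0)
            d1 = min(abs(y - s) ** 2 for s, bits in CONSTELLATION if bits[k] == 1)
            llrs.append((d1 - d0) / noise_var)               # positive favours bit 0
        return llrs

    print([round(l, 2) for l in demap_maxlog(0.35 - 0.9j, noise_var=0.1)])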
Extravehicular activity training and hardware design consideration
NASA Technical Reports Server (NTRS)
Thuot, P. J.; Harbaugh, G. J.
1995-01-01
Preparing astronauts to perform the many complex extravehicular activity (EVA) tasks required to assemble and maintain Space Station will be accomplished through training simulations in a variety of facilities. The adequacy of this training is dependent on a thorough understanding of the task to be performed, the environment in which the task will be performed, high-fidelity training hardware and an awareness of the limitations of each particular training facility. Designing hardware that can be successfully operated, or assembled, by EVA astronauts in an efficient manner, requires an acute understanding of human factors and the capabilities and limitations of the space-suited astronaut. Additionally, the significant effect the microgravity environment has on the crew members' capabilities has to be carefully considered not only for each particular task, but also for all the overhead related to the task and the general overhead associated with EVA. This paper will describe various training methods and facilities that will be used to train EVA astronauts for Space Station assembly and maintenance. User-friendly EVA hardware design considerations and recent EVA flight experience will also be presented.
Extravehicular activity training and hardware design consideration.
Thuot, P J; Harbaugh, G J
1995-07-01
Preparing astronauts to perform the many complex extravehicular activity (EVA) tasks required to assemble and maintain Space Station will be accomplished through training simulations in a variety of facilities. The adequacy of this training is dependent on a thorough understanding of the task to be performed, the environment in which the task will be performed, high-fidelity training hardware and an awareness of the limitations of each particular training facility. Designing hardware that can be successfully operated, or assembled, by EVA astronauts in an efficient manner, requires an acute understanding of human factors and the capabilities and limitations of the space-suited astronaut. Additionally, the significant effect the microgravity environment has on the crew members' capabilities has to be carefully considered not only for each particular task, but also for all the overhead related to the task and the general overhead associated with EVA. This paper will describe various training methods and facilities that will be used to train EVA astronauts for Space Station assembly and maintenance. User-friendly EVA hardware design considerations and recent EVA flight experience will also be presented.
NASA Technical Reports Server (NTRS)
Burcham, Frank W., Jr.; Gilyard, Glenn B.; Myers, Lawrence P.
1990-01-01
Integration of propulsion and flight control systems and their optimization offers significant performance improvements. Research programs were conducted which have developed new propulsion and flight control integration concepts, implemented designs on high-performance airplanes, demonstrated these designs in flight, and measured the performance improvements. These programs, first on the YF-12 airplane, and later on the F-15, demonstrated increased thrust, reduced fuel consumption, increased engine life, and improved airplane performance; with improvements in the 5 to 10 percent range achieved with integration and with no changes to hardware. The design, software and hardware developments, and testing requirements were shown to be practical.
Using FastX on the Peregrine System | High-Performance Computing | NREL
...with full 3D hardware acceleration. The traditional method of displaying graphics applications to a remote X server (indirect rendering) supports 3D hardware acceleration, but this approach causes all of the OpenGL commands and 3D data to be sent over the network to be rendered on the client machine.
Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan
2007-01-01
Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.
Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre
2017-06-01
We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for performing massively parallel real-time pattern recognition. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was developed based on a compact digital neural core, which consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores together. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer a high-speed and resource-efficient means for performing neuromorphic, massively parallel pattern recognition and classification tasks.
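The encode/decode principle that the NEF builds on can be shown in a few lines; the toy below uses rectified-linear tuning curves and least-squares decoders and is only a conceptual illustration, not the FPGA design or its time-multiplexed neural cores.

    # NEF-style population encoding of a scalar x and linear decoding of it back.
    import numpy as np

    rng = np.random.default_rng(0)
    n_neurons, xs = 64, np.linspace(-1, 1, 200)

    gains = rng.uniform(1.0, 3.0, n_neurons)              # per-neuron gain alpha_i
    biases = rng.uniform(-2.0, 2.0, n_neurons)            # per-neuron bias b_i
    encoders = rng.choice([-1.0, 1.0], n_neurons)         # preferred direction e_i

    def activities(x):
        """Tuning curves a_i(x) = max(0, alpha_i * e_i * x + b_i)."""
        return np.maximum(0.0, gains * encoders * np.asarray(x)[:, None] + biases)

    A = activities(xs)                                     # (samples, neurons)
    decoders, *_ = np.linalg.lstsq(A, xs, rcond=None)      # least-squares decoders d_i
    x_hat = A @ decoders                                   # decoded estimate of x
    print("RMS decoding error:", np.sqrt(np.mean((x_hat - xs) ** 2)))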
Analysis of performance improvements for host and GPU interface of the APENet+ 3D Torus network
NASA Astrophysics Data System (ADS)
Ammendola A, R.; Biagioni, A.; Frezza, O.; Lo Cicero, F.; Lonardo, A.; Paolucci, P. S.; Rossetti, D.; Simula, F.; Tosoratto, L.; Vicini, P.
2014-06-01
APEnet+ is an INFN (Italian Institute for Nuclear Physics) project aiming to develop a custom 3-dimensional torus interconnect network optimized for hybrid CPU-GPU clusters dedicated to high performance scientific computing. The APEnet+ interconnect fabric is built on an FPGA-based PCI Express board with 6 bi-directional off-board links providing 34 Gbps of raw bandwidth per direction, and leverages the peer-to-peer capabilities of Fermi- and Kepler-class NVIDIA GPUs to obtain real zero-copy, GPU-to-GPU low latency transfers. The minimization of APEnet+ transfer latency is achieved through the adoption of an RDMA protocol implemented in the FPGA with specialized hardware blocks tightly coupled with an embedded microprocessor. This architecture provides a high performance, low latency offload engine for both the transmit and receive sides of data transactions: preliminary results are encouraging, showing a 50% bandwidth increase for large packet size transfers. In this paper we describe the APEnet+ architecture, detailing the hardware implementation, and discuss the impact of such RDMA-specialized hardware on host interface latency and bandwidth.
Pratt and Whitney Overview and Advanced Health Management Program
NASA Technical Reports Server (NTRS)
Inabinett, Calvin
2008-01-01
Hardware Development Activity: design and test custom multi-layer circuit boards for use in the Fault Emulation Unit; logic design performed using VHDL; lay out the power system for lab hardware; work lab issues with software developers and software testers; interface with Engine Systems personnel on the performance of engine hardware components; perform off-nominal testing with new engine hardware.
New tools using the hardware performance monitor to help users tune programs on the Cray X-MP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engert, D.E.; Rudsinski, L.; Doak, J.
1991-09-25
The performance of a Cray system is highly dependent on the tuning techniques used by individuals on their codes. Many of our users were not taking advantage of the tuning tools that allow them to monitor their own programs by using the Hardware Performance Monitor (HPM). We therefore modified UNICOS to collect HPM data for all processes and to report Mflop ratings based on users, programs, and time used. Our tuning efforts are now being focused on the users and programs that have the best potential for performance improvements. These modifications and some of the more striking performance improvements are described.
Analytical Performance Modeling and Validation of Intel’s Xeon Phi Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chunduri, Sudheer; Balaprakash, Prasanna; Morozov, Vitali
Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive because even a reasonably accurate model can be useful for performance tuning before the hardware is made available. In this paper, we develop a hardware model for Intel's second-generation Xeon Phi architecture code-named Knights Landing (KNL) for the SKOPE framework. We validate the KNL hardware model by projecting the performance of mini-benchmarks and application kernels. The results show that our KNL model can project the performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.
In Touch With Industry: ICAF Industry Studies, 1997
1997-01-01
Society of Civil Engineers, Washington, DC. 1994. "Materials for Tomorrow's Infrastructure: A Ten-Year Plan for Deploying High-Performance ...identified high-performance electronics as a key to modern warfare and conflict prevention. Clearly, the nation's defense strategy relies heavily on...priced, high performance systems. As a consequence, hardware makers have undergone multiple restructures, consolidations, mergers, and global
A Parallel Rendering Algorithm for MIMD Architectures
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.; Orloff, Tobias
1991-01-01
Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well.
Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods
Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.
2016-09-01
Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high throughput applications. Though GPUs consume large amounts of power, their use for high throughput applications facilitates state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, share strong correlation to power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Often building on the counter-based models, research efforts for GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulations for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.
Understanding GPU Power. A Survey of Profiling, Modeling, and Simulation Methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bridges, Robert A.; Imam, Neena; Mintz, Tiffany M.
Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high throughput applications. Though GPUs consume large amounts of power, their use for high throughput applications facilitates state-of-the-art energy efficiency and performance. Consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods with increased detail on noteworthy efforts. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Hardware counters, which are low-level tallies of hardware events, share strong correlation to power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Often building on the counter-based models, research efforts for GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulations for GPUs are included along with their performance or functional simulator counterparts when appropriate. Lastly, possible directions for future research are discussed.
Operations of cleanrooms during a forest fire including protocols and monitoring results
NASA Astrophysics Data System (ADS)
Matheson, Bruce A.; Egges, Joanne; Pirkey, Michael S.; Lobmeyer, Lynette D.
2012-10-01
Contamination-sensitive space flight hardware is typically built in cleanroom facilities in order to protect the hardware from particle contamination. Forest wildfires near the facilities greatly increase the number of particles and amount of vapors in the ambient outside air. Reasonable questions arise as to whether typical cleanroom facilities can adequately protect the hardware from these adverse environmental conditions. On Monday September 6, 2010 (Labor Day Holiday), a large wildfire ignited near the Boulder, Colorado Campus of Ball Aerospace. The fire was approximately 6 miles from the Boulder City limits. Smoke levels from the fire stayed very high in Boulder for the majority of the week after the fire began. Cleanroom operations were halted temporarily on contamination sensitive hardware, until particulate and non-volatile residue (NVR) sampling could be performed. Immediate monitoring showed little, if any effect on the cleanroom facilities, so programs were allowed to resume work while monitoring continued for several days and beyond in some cases. Little, if any, effect was ever noticed in the monitoring performed.
Inexact hardware for modelling weather & climate
NASA Astrophysics Data System (ADS)
Düben, Peter D.; McNamara, Hugh; Palmer, Tim
2014-05-01
The use of stochastic processing hardware and low precision arithmetic in atmospheric models is investigated. Stochastic processors allow hardware-induced faults in calculations, sacrificing exact calculations in exchange for improvements in performance and potentially accuracy and a reduction in power consumption. A similar trade-off is achieved using low precision arithmetic, with improvements in computation and communication speed and savings in storage and memory requirements. As high-performance computing becomes more massively parallel and power intensive, these two approaches may be important stepping stones in the pursuit of global cloud resolving atmospheric modelling. The impact of both hardware-induced faults and low precision arithmetic is tested in the dynamical core of a global atmosphere model. Our simulations show that both approaches to inexact calculations do not substantially affect the quality of the model simulations, provided they are restricted to act only on smaller scales. This suggests that inexact calculations at the small scale could reduce computation and power costs without adversely affecting the quality of the simulations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Hemmert, K. Scott; Underwood, Keith Douglas
Achieving the next three orders of magnitude performance increase to move from petascale to exascale computing will require significant advancements in several fundamental areas. Recent studies have outlined many of the hardware and software challenges that will need to be addressed. In this paper, we examine these challenges with respect to high-performance networking. We describe the repercussions of anticipated changes to computing and networking hardware and discuss the impact that alternative parallel programming models will have on the network software stack. We also present some ideas on possible approaches that address some of these challenges.
Hardware-Assisted Large-Scale Neuroevolution for Multiagent Learning
2014-12-30
This DURIP equipment award was used to purchase, install, and bring on-line two Berkeley Emulation Engines (BEEs) and two mini-BEE machines to establish an FPGA-based high-performance multiagent training platform and its associated software. This acquisition of BEE4-W...Platform; Probabilistic Domain Transformation; Hardware-Assisted; FPGA; BEE; Hive Brain; Multiagent.
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut
2015-01-01
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484
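The cost argument above reduces to simple arithmetic; the sketch below computes trajectory produced per unit of total cost (hardware plus lifetime energy and cooling). All prices, power draws, and performance figures are made-up placeholders, not values from the study.

    # Back-of-the-envelope "trajectory per euro" over a node's lifetime.
    def ns_per_euro(hardware_eur, node_watts, ns_per_day, years=4.0,
                    eur_per_kwh=0.20, cooling_overhead=0.5):
        hours = years * 365 * 24
        energy_eur = node_watts / 1000 * hours * eur_per_kwh * (1 + cooling_overhead)
        total_ns = ns_per_day * years * 365
        return total_ns / (hardware_eur + energy_eur), energy_eur

    for name, hw_eur, watts, perf in [("CPU-only node", 2000, 250, 30),
                                      ("CPU + consumer GPU", 2500, 400, 90)]:
        eff, energy = ns_per_euro(hw_eur, watts, perf)
        print(f"{name}: {eff:.1f} ns/EUR over lifetime (energy + cooling ~{energy:.0f} EUR)")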
An improved real time superresolution FPGA system
NASA Astrophysics Data System (ADS)
Lakshmi Narasimha, Pramod; Mudigoudar, Basavaraj; Yue, Zhanfeng; Topiwala, Pankaj
2009-05-01
In numerous computer vision applications, enhancing the quality and resolution of captured video can be critical. Acquired video is often grainy and low quality due to motion, transmission bottlenecks, etc. Postprocessing can enhance it. Superresolution greatly decreases camera jitter to deliver a smooth, stabilized, high quality video. In this paper, we extend previous work on a real-time superresolution application implemented in ASIC/FPGA hardware. A gradient based technique is used to register the frames at the sub-pixel level. Once we get the high resolution grid, we use an improved regularization technique in which the image is iteratively modified by applying back-projection to get a sharp and undistorted image. The algorithm was first tested in software and migrated to hardware, to achieve 320x240 -> 1280x960, about 30 fps, a stunning superresolution by 16X in total pixels. Various input parameters, such as size of input image, enlarging factor and the number of nearest neighbors, can be tuned conveniently by the user. We use a maximum word size of 32 bits to implement the algorithm in Matlab Simulink as well as in FPGA hardware, which gives us a fine balance between the number of bits and performance. The proposed system is robust and highly efficient. We have shown the performance improvement of the hardware superresolution over the software version (C code).
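A compact software sketch of the iterative back-projection idea is given below, simplified to known integer shifts and plain 2x decimation as the camera model; the paper's FPGA pipeline, sub-pixel gradient registration, and regularization step are not reproduced here.

    # Iterative back-projection: refine a high-res estimate from shifted low-res frames.
    import numpy as np

    def observe(hr, shift):                               # simulated low-res frame
        return np.roll(hr, shift, axis=(0, 1))[::2, ::2]

    def back_project(frames, shifts, shape, n_iter=30, step=0.5):
        hr = np.zeros(shape)
        for _ in range(n_iter):
            for lr, s in zip(frames, shifts):
                err = lr - observe(hr, s)                 # residual on the low-res grid
                up = np.zeros(shape)
                up[::2, ::2] = err                        # back-project the residual
                hr += step * np.roll(up, (-s[0], -s[1]), axis=(0, 1))
        return hr

    truth = np.add.outer(np.hanning(16), np.hanning(16))  # synthetic high-res scene
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]             # known sub-frame shifts
    frames = [observe(truth, s) for s in shifts]
    hr = back_project(frames, shifts, truth.shape)
    print("max reconstruction error:", np.abs(hr - truth).max())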
Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators
NASA Astrophysics Data System (ADS)
Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.
2015-12-01
Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients to improve the resolution of tomographic images to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improve the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphics cards while OpenCL was adopted by additional hardware accelerators, such as AMD graphics cards, ARM-based processors, and Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are now able to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
Fast and Adaptive Lossless Onboard Hyperspectral Data Compression System
NASA Technical Reports Server (NTRS)
Aranki, Nazeeh I.; Keymeulen, Didier; Klimesh, Matthew A.
2012-01-01
Modern hyperspectral imaging systems are able to acquire far more data than can be downlinked from a spacecraft. Onboard data compression helps to alleviate this problem, but requires a system capable of power efficiency and high throughput. Software solutions have limited throughput performance and are power-hungry. Dedicated hardware solutions can provide both high throughput and power efficiency, while taking the load off of the main processor. Thus a hardware compression system was developed. The implementation uses a field-programmable gate array (FPGA). The implementation is based on the fast lossless (FL) compression algorithm reported in Fast Lossless Compression of Multispectral-Image Data (NPO-42517), NASA Tech Briefs, Vol. 30, No. 8 (August 2006), page 26, which achieves excellent compression performance and has low complexity. This algorithm performs predictive compression using an adaptive filtering method, and uses adaptive Golomb coding. The implementation also packetizes the coded data. The FL algorithm is well suited for implementation in hardware. In the FPGA implementation, one sample is compressed every clock cycle, which makes for a fast and practical real-time solution for space applications. Benefits of this implementation are: 1) The underlying algorithm achieves a combination of low complexity and compression effectiveness that exceeds that of techniques currently in use. 2) The algorithm requires no training data or other specific information about the nature of the spectral bands for a fixed instrument dynamic range. 3) Hardware acceleration provides a throughput improvement of 10 to 100 times vs. the software implementation. A prototype of the compressor is available in software, but it runs at a speed that does not meet spacecraft requirements. The hardware implementation targets the Xilinx Virtex IV FPGAs, and makes the use of this compressor practical for Earth satellites as well as beyond-Earth missions with hyperspectral instruments.
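The two ingredients named above, prediction followed by entropy coding of the residuals, can be illustrated with a deliberately simplified stand-in (previous-sample predictor and a fixed-parameter Golomb-Rice code); this is not the adaptive-filter FL algorithm itself.

    # Predict-then-code sketch: previous-sample predictor + Golomb-Rice residual coding.
    def rice_encode(value, k):
        """Encode a non-negative integer as a unary quotient plus k remainder bits."""
        q, r = value >> k, value & ((1 << k) - 1)
        return "1" * q + "0" + format(r, f"0{k}b")

    def zigzag(residual):
        """Map signed residuals to non-negative codes: 0,-1,1,-2,2,... -> 0,1,2,3,4,..."""
        return (residual << 1) if residual >= 0 else ((-residual << 1) - 1)

    def compress(samples, k=2):
        bits, prev = [format(samples[0] & 0xFF, "08b")], samples[0]   # first sample sent raw
        for s in samples[1:]:
            bits.append(rice_encode(zigzag(s - prev), k))             # code the prediction error
            prev = s
        return "".join(bits)

    samples = [100, 101, 103, 102, 102, 104, 107, 106]
    code = compress(samples)
    print(len(code), "coded bits vs", 8 * len(samples), "raw bits:", code)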
CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment
Manavski, Svetlin A; Valle, Giorgio
2008-01-01
Background: Searching for similarities in protein and DNA databases has become a routine procedure in Molecular Biology. The Smith-Waterman algorithm has been available for more than 25 years. It is based on a dynamic programming approach that explores all the possible alignments between two sequences; as a result it returns the optimal local alignment. Unfortunately, the computational cost is very high, requiring a number of operations proportional to the product of the length of two sequences. Furthermore, the exponential growth of protein and DNA databases makes the Smith-Waterman algorithm unrealistic for searching similarities in large sets of sequences. For these reasons heuristic approaches such as those implemented in FASTA and BLAST tend to be preferred, allowing faster execution times at the cost of reduced sensitivity. The main motivation of our work is to exploit the huge computational power of commonly available graphic cards, to develop high performance solutions for sequence alignment. Results: In this paper we present what we believe is the fastest solution of the exact Smith-Waterman algorithm running on commodity hardware. It is implemented in the recently released CUDA programming environment by NVidia. CUDA allows direct access to the hardware primitives of the last-generation Graphics Processing Units (GPU) G80. Speeds of more than 3.5 GCUPS (Giga Cell Updates Per Second) are achieved on a workstation running two GeForce 8800 GTX. Exhaustive tests have been done to compare our implementation to SSEARCH and BLAST, running on a 3 GHz Intel Pentium IV processor. Our solution was also compared to a recently published GPU implementation and to a Single Instruction Multiple Data (SIMD) solution. These tests show that our implementation performs from 2 to 30 times faster than any other previous attempt available on commodity hardware. Conclusions: The results show that graphic cards are now sufficiently advanced to be used as efficient hardware accelerators for sequence alignment. Their performance is better than any alternative available on commodity hardware platforms. The solution presented in this paper allows large scale alignments to be performed at low cost, using the exact Smith-Waterman algorithm instead of the largely adopted heuristic approaches. PMID:18387198
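The reference computation that these GPU kernels accelerate is the Smith-Waterman dynamic-programming recurrence; a plain sketch with a linear gap penalty and an arbitrary small scoring scheme follows (scores and sequences are illustrative only).

    # Smith-Waterman local alignment score by dynamic programming.
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
                H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
                best = max(best, H[i][j])
        return best                       # optimal local alignment score

    print(smith_waterman("TGTTACGG", "GGTTGACTA"))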
Reconfigurable HIL Testing of Earth Satellites
NASA Technical Reports Server (NTRS)
2008-01-01
In recent years, hardware-in-the-loop (HIL) testing has carved a strong niche in several industries, such as automotive, aerospace, telecom, and consumer electronics. As desktop computers have realized gains in speed, memory size, and data storage capacity, hardware/software platforms have evolved into high performance, deterministic HIL platforms, capable of hosting the most demanding applications for testing components and subsystems. Using simulation software to emulate the digital and analog I/O signals of system components, engineers of all disciplines can now test new systems in realistic environments to evaluate their function and performance prior to field deployment. Within the Aerospace industry, space-borne satellite systems are arguably some of the most demanding in terms of their requirement for custom engineering and testing. Typically, spacecraft are built one or a few at a time to fulfill a space science or defense mission. In contrast to other industries that can amortize the cost of HIL systems over thousands, even millions of units, spacecraft HIL systems have been built as one-of-a-kind solutions, expensive in terms of schedule, cost, and risk, to assure satellite and spacecraft systems reliability. The focus of this paper is to present a new approach to HIL testing for spacecraft systems that takes advantage of a highly flexible hardware/software architecture based on National Instruments PXI reconfigurable hardware and virtual instruments developed using LabVIEW. This new approach to HIL is based on a multistage/multimode spacecraft bus emulation development model called Reconfigurable Hardware In-the-Loop or RHIL.
Real-time high speed generator system emulation with hardware-in-the-loop application
NASA Astrophysics Data System (ADS)
Stroupe, Nicholas
The emerging emphasis and benefits of distributed generation on smaller-scale networks have prompted much attention and research in this field. Much of the research that has grown in distributed generation has also stimulated the development of simulation software and techniques. Testing and verification of these distributed power networks is a complex task and real hardware testing is often desired. This is where simulation methods such as hardware-in-the-loop become important, in which an actual hardware unit can be interfaced with a software-simulated environment to verify proper functionality. In this thesis, a simulation technique is taken one step further by utilizing a hardware-in-the-loop technique to emulate the output voltage of a generator system interfaced to a scaled hardware distributed power system for testing. The purpose of this thesis is to demonstrate a new method of testing a virtually simulated generation system supplying a scaled distributed power system in hardware. This task is performed by using the Non-Linear Loads Test Bed developed by the Energy Conversion and Integration Thrust at the Center for Advanced Power Systems. This test bed consists of a series of real hardware converters consistent with the Navy's proposed All-Electric-Ship power system, used to perform various tests on controls and stability under the expected non-linear load environment of the Navy weaponry. This test bed can also explore other distributed power system research topics and serves as a flexible hardware unit for a variety of tests. In this thesis, the test bed will be utilized to perform and validate this newly developed method of generator system emulation. Here, the dynamics of a high speed permanent magnet generator directly coupled with a micro turbine are virtually simulated on an FPGA in real-time. The calculated output stator voltage will then serve as a reference for a controllable three phase inverter at the input of the test bed that will emulate and reproduce these voltages on real hardware. The output of the inverter is then connected with the rest of the test bed, which can consist of a variety of distributed system topologies for many testing scenarios. The idea is that the distributed power system under test in hardware can also integrate real generator system dynamics without physically involving an actual generator system. The benefits of successful generator system emulation are vast and lead to much more detailed system studies without the drawbacks of needing physical generator units. Some of these advantages are safety, reduced costs, and the ability to scale while still preserving the appropriate system dynamics. This thesis will introduce the ideas behind generator emulation and explain the process and necessary steps to obtain this objective. It will also demonstrate real results and verification of numerical values in real-time. The final goal of this thesis is to introduce this new idea and show that it is in fact obtainable and can prove to be a highly useful tool in the simulation and verification of distributed power systems.
Alternative synthetic aperture radar (SAR) modalities using a 1D dynamic metasurface antenna
NASA Astrophysics Data System (ADS)
Boyarsky, Michael; Sleasman, Timothy; Pulido-Mancera, Laura; Imani, Mohammadreza F.; Reynolds, Matthew S.; Smith, David R.
2017-05-01
Synthetic aperture radar (SAR) systems conventionally rely on mechanically-actuated reflector dishes or large phased arrays for generating steerable directive beams. While these systems have yielded high-resolution images, the hardware suffers from considerable weight, high cost, substantial power consumption, and moving parts. Since these disadvantages are particularly relevant in airborne and spaceborne systems, a flat, lightweight, and low-cost solution is a sought-after goal. Dynamic metasurface antennas have emerged as a recent technology for generating waveforms with desired characteristics. Metasurface antennas consist of an electrically-large waveguide loaded with numerous subwavelength radiators which selectively leak energy from a guided wave into free space to form various radiation patterns. By tuning each radiating element, we can modulate the aperture's overall radiation pattern to generate steered directive beams, without moving parts or phase shifters. Furthermore, by using established manufacturing methods, these apertures can be made to be lightweight, low-cost, and planar, while maintaining high performance. In addition to their hardware benefits, dynamic metasurfaces can leverage their dexterity and high switching speeds to enable alternative SAR modalities for improved performance. In this work, we briefly discuss how dynamic metasurfaces can conduct existing SAR modalities with performance similar to conventional systems from a significantly simpler hardware platform. We will also describe two additional modalities which may achieve improved performance as compared to traditional modalities. These modalities, enhanced resolution stripmap and diverse pattern stripmap, offer the ability to circumvent the trade-off between resolution and region-of-interest size that exists within stripmap and spotlight. Imaging results with a simulated dynamic metasurface verify the benefits of these modalities, and a discussion of implementation considerations and noise effects is also included. Ultimately, the hardware gains, coupled with the additional modalities well-suited to dynamic metasurface antennas, have poised them to propel the SAR field forward and open the door to exciting opportunities.
Hardware design and implementation of fast DOA estimation method based on multicore DSP
NASA Astrophysics Data System (ADS)
Guo, Rui; Zhao, Yingxiao; Zhang, Yue; Lin, Qianqiang; Chen, Zengping
2016-10-01
In this paper, we present a high-speed real-time signal processing hardware platform based on a multicore digital signal processor (DSP). The real-time signal processing platform shows several excellent characteristics including high performance computing, low power consumption, large-capacity data storage and high speed data transmission, which make it able to meet the constraint of real-time direction of arrival (DOA) estimation. To reduce the high computational complexity of the DOA estimation algorithm, a novel real-valued MUSIC estimator is used. The algorithm is decomposed into several independent steps and the time consumption of each step is counted. Based on the statistics of the time consumption, we present a new parallel processing strategy to distribute the task of DOA estimation to different cores of the real-time signal processing hardware platform. Experimental results demonstrate that the high processing capability of the signal processing platform meets the constraint of real-time DOA estimation.
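For context, the textbook (complex-valued) MUSIC pseudo-spectrum computation is sketched below for a uniform linear array; the scenario, spacing, and noise level are arbitrary assumptions, and the paper itself uses a lower-complexity real-valued MUSIC variant.

    # MUSIC pseudo-spectrum for a uniform linear array (textbook formulation).
    import numpy as np

    def steering(theta_deg, n_sensors, d=0.5):            # element spacing in wavelengths
        return np.exp(-2j * np.pi * d * np.arange(n_sensors) * np.sin(np.deg2rad(theta_deg)))

    rng = np.random.default_rng(1)
    n_sensors, n_snapshots, true_doas = 8, 200, [-20.0, 35.0]

    A = np.column_stack([steering(t, n_sensors) for t in true_doas])
    S = (rng.standard_normal((len(true_doas), n_snapshots))
         + 1j * rng.standard_normal((len(true_doas), n_snapshots)))
    noise = 0.1 * (rng.standard_normal((n_sensors, n_snapshots))
                   + 1j * rng.standard_normal((n_sensors, n_snapshots)))
    X = A @ S + noise                                      # simulated array snapshots

    R = X @ X.conj().T / n_snapshots                       # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)                         # eigenvalues in ascending order
    En = eigvecs[:, : n_sensors - len(true_doas)]          # noise subspace

    grid = np.arange(-90.0, 90.0, 0.5)
    spectrum = np.array([1.0 / np.linalg.norm(En.conj().T @ steering(t, n_sensors)) ** 2
                         for t in grid])
    peaks = np.where((spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:]))[0] + 1
    top_two = peaks[np.argsort(spectrum[peaks])[-2:]]
    print("estimated DOAs (deg):", sorted(grid[top_two]))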
Lossless data compression for improving the performance of a GPU-based beamformer.
Lok, U-Wai; Fan, Gang-Wei; Li, Pai-Chi
2015-04-01
The powerful parallel computation ability of a graphics processing unit (GPU) makes it feasible to perform dynamic receive beamforming. However, a real time GPU-based beamformer requires a high data rate to transfer radio-frequency (RF) data from hardware to software memory, as well as from central processing unit (CPU) to GPU memory. There are data compression methods (e.g. Joint Photographic Experts Group (JPEG)) available for the hardware front end to reduce data size, alleviating the data transfer requirement of the hardware interface. Nevertheless, the required decoding time may even be larger than the transmission time of its original data, in turn degrading the overall performance of the GPU-based beamformer. This article proposes and implements a lossless compression-decompression algorithm, which enables compression and decompression of data in parallel. By this means, the data transfer requirement of the hardware interface and the transmission time of CPU to GPU data transfers are reduced, without sacrificing image quality. In simulation results, the compression ratio reached around 1.7. The encoder design of our lossless compression approach requires low hardware resources and reasonable latency in a field programmable gate array. In addition, the transmission time of transferring data from CPU to GPU with the parallel decoding process improved by threefold, as compared with transferring original uncompressed data. These results show that our proposed lossless compression plus parallel decoder approach not only mitigates the transmission bandwidth requirement to transfer data from the hardware front end to the software system but also reduces the transmission time for CPU to GPU data transfer. © The Author(s) 2014.
NASA Astrophysics Data System (ADS)
Zhang, Yuli; Han, Jun; Weng, Xinqian; He, Zhongzhu; Zeng, Xiaoyang
This paper presents an Application Specific Instruction-set Processor (ASIP) for the SHA-3 BLAKE algorithm family by instruction set extensions (ISE) from a RISC (reduced instruction set computer) processor. With a design space exploration for this ASIP to increase the performance and reduce the area cost, we accomplish an efficient hardware and software implementation of the BLAKE algorithm. The special instructions and their well-matched hardware function unit improve the calculation of the key section of the algorithm, namely the G-functions. Also, relaxing the time constraint of the special function unit can decrease its hardware cost, while keeping the high data throughput of the processor. Evaluation results reveal the ASIP achieves 335 Mbps and 176 Mbps for BLAKE-256 and BLAKE-512. The extra area cost is only 8.06k equivalent gates. The proposed ASIP outperforms several software approaches on various platforms in cycles per byte. In fact, both the high throughput and the low hardware cost achieved by this programmable processor are comparable to those of ASIC implementations.
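For reference, the round structure of the BLAKE-256 G-function that such special instructions target is written out below in plain Python; the two injected words x and y stand in for the message/constant terms selected by the sigma permutation, which is omitted here along with the surrounding round and finalization logic.

    # BLAKE-256 G-function core (32-bit words, rotation amounts 16/12/8/7).
    MASK = 0xFFFFFFFF

    def rotr32(v, n):
        return ((v >> n) | (v << (32 - n))) & MASK

    def g(a, b, c, d, x, y):
        a = (a + b + x) & MASK
        d = rotr32(d ^ a, 16)
        c = (c + d) & MASK
        b = rotr32(b ^ c, 12)
        a = (a + b + y) & MASK
        d = rotr32(d ^ a, 8)
        c = (c + d) & MASK
        b = rotr32(b ^ c, 7)
        return a, b, c, d

    print([hex(w) for w in g(0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A, 0, 0)])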
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared on performance using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
Freiberger, Manuel; Egger, Herbert; Liebmann, Manfred; Scharfetter, Hermann
2011-11-01
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state-of-the-art numerical algorithms and a careful, hardware-optimized implementation makes it possible to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the non-linear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by a factor of about 15 through the use of the graphics hardware without compromising the accuracy in the reconstructed images.
The use of UNIX in a real-time environment
NASA Technical Reports Server (NTRS)
Luken, R. D.; Simons, P. C.
1986-01-01
This paper describes a project to evaluate the feasibility of using commercial off-the-shelf hardware and the UNIX operating system to implement a real-time control and monitor system. A functional subset of the Checkout, Control and Monitor System was chosen as the test bed for the project. The project consists of three separate architecture implementations: a local area bus network, a star network, and a central host. The motivation for this project stemmed from the need to find a way to implement real-time systems without the cost burden of developing and maintaining custom hardware and unique software. Custom development has always been accepted as the only option because of the need to optimize the implementation for performance. However, with the cost/performance of today's hardware, the inefficiencies of high-level languages and portable operating systems can be effectively overcome.
LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis
2016-05-23
LLMapReduce works with several schedulers such as SLURM, Grid Engine and LSF. Keywords—LLMapReduce; map-reduce; performance; scheduler; Grid Engine ...SLURM; LSF I. INTRODUCTION Large scale computing is currently dominated by four ecosystems: supercomputing, database, enterprise , and big data [1...interconnects [6]), High performance math libraries (e.g., BLAS [7, 8], LAPACK [9], ScaLAPACK [10]) designed to exploit special processing hardware, High
Techniques for the rapid display and manipulation of 3-D biomedical data.
Goldwasser, S M; Reynolds, R A; Talton, D A; Walsh, E S
1988-01-01
The use of fully interactive 3-D workstations with true real-time performance will become increasingly common as technology matures and economical commercial systems become available. This paper provides a comprehensive introduction to high speed approaches to the display and manipulation of 3-D medical objects obtained from tomographic data acquisition systems such as CT, MR, and PET. A variety of techniques are outlined including the use of software on conventional minicomputers, hardware assist devices such as array processors and programmable frame buffers, and special purpose computer architecture for dedicated high performance systems. While both algorithms and architectures are addressed, the major theme centers around the utilization of hardware-based approaches including parallel processors for the implementation of true real-time systems.
NASA Astrophysics Data System (ADS)
Yakovlev, V. V.; Shakirov, S. R.; Gilyov, V. M.; Shpak, S. I.
2017-10-01
In this paper, we propose an approach to constructing automation systems for aerodynamic experiments based on modern, domestically developed hardware and software. The structure of a universal control and data collection system for performing experiments in wind tunnels with continuous, intermittent, or short-duration operation is proposed. The hardware and software development tools proposed by ICT SB RAS and ITAM SB RAS, as well as subsystems based on them, can be widely applied to scientific and experimental installations, as well as to the automation of technological processes in production.
Benchmarking and Hardware-In-The-Loop Operation of a ...
Engine performance evaluation in support of LD MTE. EPA used elements of its ALPHA model to apply hardware-in-the-loop (HIL) controls to the SKYACTIV engine test setup to better understand how the engine would operate in a chassis test when combined with future leading-edge technologies: an advanced high-efficiency transmission, reduced mass, and reduced roadload. The goal is to predict future vehicle performance with an Atkinson engine. As part of its technology assessment for the upcoming midterm evaluation of the 2017-2025 LD vehicle GHG emissions regulation, EPA has been benchmarking engines and transmissions to generate inputs for use in its ALPHA model.
Embedded Streaming Deep Neural Networks Accelerator With Applications.
Dundar, Aysegul; Jin, Jonghoon; Martini, Berin; Culurciello, Eugenio
2017-07-01
Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput of the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for DCNNs' hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of DCNNs into operation codes to execute applications in a hardware accelerator. The proposed method utilizes maximum computational resources available based on a novel-scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully explores weight-level and node-level parallelizations of DCNNs and achieves a peak performance of 247 G-ops while consuming less than 4 W of power. We test our system with applications on object classification and object detection in real-world scenarios. Our results indicate high-performance efficiency, outperforming all other presented platforms while running these applications.
Using DMA for copying performance counter data to memory
Gara, Alan; Salapura, Valentina; Wisniewski, Robert W.
2012-09-25
A device for copying performance counter data includes a hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.
Using DMA for copying performance counter data to memory
Gara, Alan; Salapura, Valentina; Wisniewski, Robert W
2013-12-31
A device for copying performance counter data includes a hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform the copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance data.
An investigation of acoustic noise requirements for the Space Station centrifuge facility
NASA Technical Reports Server (NTRS)
Castellano, Timothy
1994-01-01
Acoustic noise emissions from the Space Station Freedom (SSF) centrifuge facility hardware represent a potential technical and programmatic risk to the project. The SSF program requires that no payload exceed a Noise Criterion 40 (NC-40) noise contour in any octave band between 63 Hz and 8 kHz as measured 2 feet from the equipment item. Past experience with life science experiment hardware indicates that this requirement will be difficult to meet. The crew has found noise levels on Spacelab flights to be unacceptably high. Many past Ames Spacelab life science payloads have required waivers because of excessive noise. The objectives of this study were (1) to develop an understanding of acoustic measurement theory, instruments, and technique, and (2) to characterize the noise emission of analogous Facility components and previously flown flight hardware. Test results from existing hardware were reviewed and analyzed. Measurements of the spectral and intensity characteristics of fans and other rotating machinery were performed. The literature was reviewed and contacts were made with NASA and industry organizations concerned with or performing research on noise control.
Ethoscopes: An open platform for high-throughput ethomics.
Geissmann, Quentin; Garcia Rodriguez, Luis; Beckwith, Esteban J; French, Alice S; Jamasb, Arian R; Gilestro, Giorgio F
2017-10-01
Here, we present the use of ethoscopes, which are machines for high-throughput analysis of behavior in Drosophila and other animals. Ethoscopes provide a software and hardware solution that is reproducible and easily scalable. They perform, in real-time, tracking and profiling of behavior by using a supervised machine learning algorithm, are able to deliver behaviorally triggered stimuli to flies in a feedback-loop mode, and are highly customizable and open source. Ethoscopes can be built easily by using 3D printing technology and rely on Raspberry Pi microcomputers and Arduino boards to provide affordable and flexible hardware. All software and construction specifications are available at http://lab.gilest.ro/ethoscope.
NASA Ames Research Center R and D Services Directorate Biomedical Systems Development
NASA Technical Reports Server (NTRS)
Pollitt, J.; Flynn, K.
1999-01-01
The Ames Research Center R&D Services Directorate teams with NASA, other government agencies, and/or industry investigators for the development, design, fabrication, manufacturing, and qualification testing of space-flight and ground-based experiment hardware for biomedical and general aerospace applications. In recent years, biomedical research hardware and software have been developed to support space-flight and ground-based experiment needs, including the E 132 Biotelemetry system for the Research Animal Holding Facility (RAHF), the E 100 Neurolab neuro-vestibular investigation systems, the Autogenic Feedback Systems, and the Standard Interface Glove Box (SIGB) experiment workstation module. Centrifuges, motion simulators, habitat design, environmental control systems, and other unique experiment modules and fixtures have also been developed. A discussion of engineered systems and capabilities will be provided to promote understanding of possibilities for future system designs in biomedical applications. In addition, an overview of existing engineered products will be shown, and examples of hardware and literature that demonstrate the organization's capabilities will be displayed. The Ames Research Center R&D Services Directorate is available to support the development of new hardware and software systems or the adaptation of existing systems to meet the needs of academic, commercial/industrial, and government research requirements. The Ames R&D Services Directorate can provide specialized support for: system concept definition and feasibility; mathematical modeling and simulation of system performance; prototype hardware development; hardware and software design; data acquisition systems; graphical user interface development; motion control design; hardware fabrication and high-fidelity machining; composite materials development and application design; electronic/electrical system design and fabrication; and system performance verification testing and qualification.
An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ibrahim, Khaled Z.; Hargrove, Paul H.; Iancu, Costin
The Cray Gemini interconnect hardware provides multiple transfer mechanisms and out-of-order message delivery to improve communication throughput. In this paper we quantify the performance of one-sided and two-sided communication paradigms with respect to: 1) the optimal available hardware transfer mechanism, 2) message ordering constraints, 3) per node and per core message concurrency. In addition to using Cray native communication APIs, we use UPC and MPI micro-benchmarks to capture one- and two-sided semantics respectively. Our results indicate that relaxing the message delivery order can improve performance up to 4.6x when compared with strict ordering. When hardware allows it, high-level one-sided programming models can already take advantage of message reordering. Enforcing the ordering semantics of two-sided communication comes with a performance penalty. Furthermore, we argue that exposing out-of-order delivery at the application level is required for the next-generation programming models. Any ordering constraints in the language specifications reduce communication performance for small messages and increase the number of active cores required for peak throughput.
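As a concrete point of reference for the one-sided versus two-sided comparison described above, the sketch below contrasts an MPI Send/Recv exchange with a one-sided Put into an MPI window, timed with MPI.Wtime. It is a minimal illustration assuming mpi4py and NumPy are available and two ranks are launched; it is not the UPC/MPI micro-benchmark suite or the Cray-specific APIs used in the paper.

```python
# Run with: mpiexec -n 2 python bench.py   (assumes mpi4py and NumPy)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 15                              # 32k doubles per message (placeholder size)
buf = np.zeros(n, dtype="d")
win = MPI.Win.Create(buf, comm=comm)     # window exposing buf for one-sided access

# Two-sided: explicit matched Send/Recv pair
comm.Barrier()
t0 = MPI.Wtime()
if rank == 0:
    comm.Send([buf, MPI.DOUBLE], dest=1, tag=7)
elif rank == 1:
    comm.Recv([buf, MPI.DOUBLE], source=0, tag=7)
comm.Barrier()
if rank == 0:
    print("two-sided: %.3e s" % (MPI.Wtime() - t0))

# One-sided: rank 0 puts directly into rank 1's window
comm.Barrier()
t0 = MPI.Wtime()
win.Fence()
if rank == 0:
    win.Put([buf, MPI.DOUBLE], target_rank=1)
win.Fence()
if rank == 0:
    print("one-sided: %.3e s" % (MPI.Wtime() - t0))

win.Free()
```

On real hardware the interesting regime is a sweep over message sizes and per-core concurrency, as the study performs; this sketch only shows the two programming idioms side by side.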
A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Potok, Thomas E; Schuman, Catherine D; Young, Steven R
Current Deep Learning models use highly optimized convolutional neural networks (CNN) trained on large graphical processing units (GPU)-based computers with a fairly simple layered network topology, i.e., highly connected layers, without intra-layer connections. Complex topologies have been proposed, but are intractable to train on current systems. Building the topologies of the deep learning network requires hand tuning, and implementing the network in hardware is expensive in both cost and power. In this paper, we evaluate deep learning models using three different computing architectures to address these problems: quantum computing to train complex topologies, high performance computing (HPC) to automatically determine network topology, and neuromorphic computing for a low-power hardware implementation. Due to input size limitations of current quantum computers we use the MNIST dataset for our evaluation. The results show the possibility of using the three architectures in tandem to explore complex deep learning networks that are untrainable using a von Neumann architecture. We show that a quantum computer can find high quality values of intra-layer connections and weights, while yielding a tractable time result as the complexity of the network increases; a high performance computer can find optimal layer-based topologies; and a neuromorphic computer can represent the complex topology and weights derived from the other architectures in low power memristive hardware. This represents a new capability that is not feasible with current von Neumann architecture. It potentially enables the ability to solve very complicated problems unsolvable with current computing technologies.
Tools for 3D scientific visualization in computational aerodynamics
NASA Technical Reports Server (NTRS)
Bancroft, Gordon; Plessel, Todd; Merritt, Fergus; Watson, Val
1989-01-01
The purpose is to describe the tools and techniques in use at the NASA Ames Research Center for performing visualization of computational aerodynamics, for example, visualization of flow fields from computer simulations of fluid dynamics about vehicles such as the Space Shuttle. The hardware used for visualization is a high-performance graphics workstation connected to a supercomputer with a high-speed channel. At present, the workstation is a Silicon Graphics IRIS 3130, the supercomputer is a CRAY2, and the high-speed channel is a hyperchannel. The three techniques used for visualization are post-processing, tracking, and steering. Post-processing analysis is done after the simulation. Tracking analysis is done during a simulation but is not interactive, whereas steering analysis involves modifying the simulation interactively during the simulation. Using post-processing methods, a flow simulation is executed on a supercomputer and, after the simulation is complete, the results of the simulation are processed for viewing. The software in use and under development at NASA Ames Research Center for performing these types of tasks in computational aerodynamics is described. Workstation performance issues, benchmarking, and high-performance networks for this purpose are also discussed, along with descriptions of other hardware for digital video and film recording.
Connecting Performance Analysis and Visualization to Advance Extreme Scale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bremer, Peer-Timo; Mohr, Bernd; Schulz, Martin
2015-07-29
The characterization, modeling, analysis, and tuning of software performance has been a central topic in High Performance Computing (HPC) since its early beginnings. The overall goal is to make HPC software run faster on particular hardware, either through better scheduling, on-node resource utilization, or more efficient distributed communication.
TADSim: Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics
Mniszewski, Susan M.; Junghans, Christoph; Voter, Arthur F.; ...
2015-04-16
Next-generation high-performance computing will require more scalable and flexible performance prediction tools to evaluate software-hardware co-design choices relevant to scientific applications and hardware architectures. Here, we present a new class of tools called application simulators—parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation. Parameterized choices for the algorithmic method and hardware options provide a rich space for design exploration and allow us to quickly find well-performing software-hardware combinations. We demonstrate our approach with a TADSim simulator that models the temperature-accelerated dynamics (TAD) method, an algorithmically complex and parameter-rich member of the accelerated molecular dynamics (AMD) family of molecular dynamics methods. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We accomplish this by identifying the time-intensive elements, quantifying algorithm steps in terms of those elements, abstracting them out, and replacing them by the passage of time. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We extend TADSim to model algorithm extensions, such as speculative spawning of the compute-bound stages, and predict performance improvements without having to implement such a method. Validation against the actual TAD code shows close agreement for the evolution of an example physical system, a silver surface. Finally, focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights and suggested extensions.
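The core idea, replacing compute-bound stages with the passage of simulated time, can be illustrated with a toy proxy loop. The sketch below is not TADSim; the stage names, durations, and speculative-spawning model are hypothetical placeholders chosen only to show how such a parameterized application simulator can scan design choices cheaply.

```python
import random

random.seed(0)

# Hypothetical stage durations in simulated seconds; not values taken from the real TAD code.
def md_block_time():
    return random.uniform(4.0, 6.0)

def saddle_search_time():
    return random.uniform(8.0, 16.0)

def simulate(num_transitions=100, speculative_workers=1):
    """Toy application-simulator loop: each compute-bound stage is replaced by
    the passage of simulated time. Speculative spawning runs several searches
    concurrently, and the earliest finisher gates progress."""
    clock = 0.0
    for _ in range(num_transitions):
        clock += md_block_time()
        searches = [saddle_search_time() for _ in range(speculative_workers)]
        clock += min(searches)
    return clock

if __name__ == "__main__":
    for workers in (1, 2, 4):
        print(workers, "speculative workers ->",
              round(simulate(speculative_workers=workers), 1), "simulated s")
```

Because every stage cost and algorithm knob is an explicit parameter, sweeping them takes seconds, which is the kind of design-space exploration the abstract describes, only vastly simplified here.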
The use of emulator-based simulators for on-board software maintenance
NASA Astrophysics Data System (ADS)
Irvine, M. M.; Dartnell, A.
2002-07-01
Traditionally, onboard software maintenance activities within the space sector are performed using hardware-based facilities. These facilities are developed around the use of hardware emulation or breadboards containing target processors. Some sort of environment is provided around the hardware to support the maintenance activities. However, these environments are not easy to use to set up the required test scenarios, particularly when the onboard software executes in a dynamic I/O environment, e.g., attitude control software or data handling software. In addition, the hardware and/or environment may not support the test set-up required during investigations into software anomalies (e.g., raising a spurious interrupt or failing memory), and the overall "visibility" of the executing software may be limited. The Software Maintenance Simulator (SOMSIM) is a tool that can support the traditional maintenance facilities. The following list contains some of the main benefits that SOMSIM can provide: a low-cost, flexible extension to an existing product - an operational simulator containing a software processor emulator; a system-level, high-fidelity test-bed in which the software "executes"; a high degree of control/configuration over the entire "system", including contingency conditions perhaps not possible with real hardware; and high visibility and control over execution of the emulated software. This paper describes the SOMSIM concept in more detail, and also describes the SOMSIM study being carried out for ESA/ESOC by VEGA IT GmbH.
A SOPC-BASED Evaluation of AES for 2.4 GHz Wireless Network
NASA Astrophysics Data System (ADS)
Ken, Cai; Xiaoying, Liang
In modern systems, data security is needed more than ever before, and many cryptographic algorithms are utilized for security services. Wireless Sensor Networks (WSN) are an example of such technologies. In this paper, an innovative SOPC-based approach for evaluating security services in WSN is proposed that addresses the issues of scalability, flexible performance, and silicon efficiency for the hardware acceleration of an encryption system. The design includes a Nios II processor together with custom-designed modules for the Advanced Encryption Standard (AES), which has become the default choice for various security services in numerous applications. The objective of this mechanism is to present an efficient hardware realization of AES using a hardware description language (Verilog HDL) and to expand its usability for various applications. Compared to a traditional custom processor design, the mechanism provides a very broad range of cost/performance points.
Comparing an FPGA to a Cell for an Image Processing Application
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.
2010-12-01
Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to exploit the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms with high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
Diskless supercomputers: Scalable, reliable I/O for the Tera-Op technology base
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Ousterhout, John K.; Patterson, David A.
1993-01-01
Computing is seeing an unprecedented improvement in performance; over the last five years there has been an order-of-magnitude improvement in the speeds of workstation CPUs. At least another order of magnitude seems likely in the next five years, to machines with 500 MIPS or more. The goal of the ARPA Teraop program is to realize even larger, more powerful machines, executing as many as a trillion operations per second. Unfortunately, we have seen no comparable breakthroughs in I/O performance; the speeds of I/O devices and the hardware and software architectures for managing them have not changed substantially in many years. We have completed a program of research to demonstrate hardware and software I/O architectures capable of supporting the kinds of internetworked 'visualization' workstations and supercomputers that will appear in the mid 1990s. The project had three overall goals: high performance, high reliability, and a scalable, multipurpose system.
Hardware simulation of fuel cell/gas turbine hybrids
NASA Astrophysics Data System (ADS)
Smith, Thomas Paul
Hybrid solid oxide fuel cell/gas turbine (SOFC/GT) systems offer high efficiency power generation, but face numerous integration and operability challenges. This dissertation addresses the application of hardware-in-the-loop simulation (HILS) to explore the performance of a solid oxide fuel cell stack and gas turbine when combined into a hybrid system. Specifically, this project entailed developing and demonstrating a methodology for coupling a numerical SOFC subsystem model with a gas turbine that has been modified with supplemental process flow and control paths to mimic a hybrid system. This HILS approach was implemented with the U.S. Department of Energy Hybrid Performance Project (HyPer) located at the National Energy Technology Laboratory. By utilizing HILS the facility provides a cost effective and capable platform for characterizing the response of hybrid systems to dynamic variations in operating conditions. HILS of a hybrid system was accomplished by first interfacing a numerical model with operating gas turbine hardware. The real-time SOFC stack model responds to operating turbine flow conditions in order to predict the level of thermal effluent from the SOFC stack. This simulated level of heating then dynamically sets the turbine's "firing" rate to reflect the stack output heat rate. Second, a high-speed computer system with data acquisition capabilities was integrated with the existing controls and sensors of the turbine facility. In the future, this will allow for the utilization of high-fidelity fuel cell models that infer cell performance parameters while still computing the simulation in real-time. Once the integration of the numeric and the hardware simulation components was completed, HILS experiments were conducted to evaluate hybrid system performance. The testing identified non-intuitive transient responses arising from the large thermal capacitance of the stack that are inherent to hybrid systems. Furthermore, the tests demonstrated the capabilities of HILS as a research tool for investigating the dynamic behavior of SOFC/GT hybrid power generation systems.
NASA Astrophysics Data System (ADS)
Poinsot, Audrey; Yang, Fan; Brost, Vincent
2011-02-01
Including multiple sources of information in personal identity recognition and verification offers the opportunity to greatly improve performance. We propose a contactless biometric system that combines two modalities: palmprint and face. Hardware implementations are proposed on the Texas Instruments Digital Signal Processor and Xilinx Field-Programmable Gate Array (FPGA) platforms. The algorithmic chain consists of preprocessing (which includes palm extraction from hand images), Gabor feature extraction, comparison by Hamming distance, and score fusion. Fusion possibilities are discussed and tested first using a bimodal database of 130 subjects that we designed (the uB database), and then two common public biometric databases (AR for face and PolyU for palmprint). High performance has been obtained for both recognition and verification: a recognition rate of 97.49% with the AR-PolyU database and an equal error rate of 1.10% on the uB database were obtained using only two training samples per subject. Hardware results demonstrate that preprocessing can easily be performed during the acquisition phase, and multimodal biometric recognition can be treated almost instantly (0.4 ms on FPGA). We show the feasibility of a robust and efficient multimodal hardware biometric system that offers several advantages, such as user-friendliness and flexibility.
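A minimal sketch of the matching and fusion steps described above follows: binary feature codes are compared by normalized Hamming distance, and the two modality scores are combined by a weighted sum. The synthetic codes and the 0.5 fusion weight are illustrative assumptions, not the paper's Gabor-based pipeline or its tuned parameters.

```python
import numpy as np

def hamming_distance(code_a, code_b):
    """Fraction of differing bits between two binary feature codes."""
    return np.count_nonzero(code_a != code_b) / code_a.size

def fused_score(face_probe, face_gallery, palm_probe, palm_gallery, w_face=0.5):
    """Weighted-sum fusion of two matching distances (lower = better match).
    The 0.5 weight is a placeholder, not a value from the paper."""
    d_face = hamming_distance(face_probe, face_gallery)
    d_palm = hamming_distance(palm_probe, palm_gallery)
    return w_face * d_face + (1.0 - w_face) * d_palm

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery_face = rng.integers(0, 2, 2048, dtype=np.uint8)
    gallery_palm = rng.integers(0, 2, 2048, dtype=np.uint8)
    # A genuine probe: the gallery code with a few bits flipped by noise.
    probe_face = gallery_face.copy(); probe_face[:100] ^= 1
    probe_palm = gallery_palm.copy(); probe_palm[:80] ^= 1
    print("fused distance:",
          fused_score(probe_face, gallery_face, probe_palm, gallery_palm))
```

Hamming comparison and a weighted sum are cheap, fixed-size operations, which is one reason such a chain maps well onto DSP or FPGA hardware.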
Toward Evolvable Hardware Chips: Experiments with a Programmable Transistor Array
NASA Technical Reports Server (NTRS)
Stoica, Adrian
1998-01-01
Evolvable Hardware is reconfigurable hardware that self-configures under the control of an evolutionary algorithm. The search for a hardware configuration can be performed using software models or, faster and more accurately, directly in reconfigurable hardware. Several experiments have demonstrated the possibility of automatically synthesizing both digital and analog circuits. The paper introduces an approach to automated synthesis of CMOS circuits, based on evolution on a Programmable Transistor Array (PTA). The approach is illustrated with a software experiment showing evolutionary synthesis of a circuit with a desired DC characteristic. A hardware implementation of a test PTA chip is then described, and the same evolutionary experiment is performed on the chip, demonstrating circuit synthesis/self-configuration directly in hardware.
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Petrick, David J.; Day, John H. (Technical Monitor)
2001-01-01
Spacecraft telemetry rates have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image processing application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms and re-configurable computing hardware technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processing (DSP). It has been shown in [1] and [2] that this configuration can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft. However, since this technology is still maturing, intensive pre-hardware steps are necessary to achieve the benefits of hardware implementation. This paper describes these steps for the GOES-8 application, a software project developed using Interactive Data Language (IDL) (Trademark of Research Systems, Inc.) on a Workstation/UNIX platform. The solution involves converting the application to a PC/Windows/RC platform, selected mainly by the availability of low cost, adaptable high-speed RC hardware. In order for the hybrid system to run, the IDL software was modified to account for platform differences. It was interesting to examine the gains and losses in performance on the new platform, as well as unexpected observations before implementing hardware. After substantial pre-hardware optimization steps, the necessity of hardware implementation for bottleneck code in the PC environment became evident and solvable beginning with the methodology described in [1], [2], and implementing a novel methodology for this specific application [6]. The PC-RC interface bandwidth problem for the class of applications with moderate input-output data rates but large intermediate multi-thread data streams has been addressed and mitigated. This opens a new class of satellite image processing applications for bottleneck problems solution using RC technologies. The issue of a science algorithm level of abstraction necessary for RC hardware implementation is also described. Selected Matlab functions already implemented in hardware were investigated for their direct applicability to the GOES-8 application with the intent to create a library of Matlab and IDL RC functions for ongoing work. A complete class of spacecraft image processing applications using embedded re-configurable computing technology to meet real-time requirements, including performance results and comparison with the existing system, is described in this paper.
Measuring human performance on NASA's microgravity aircraft
NASA Technical Reports Server (NTRS)
Morris, Randy B.; Whitmore, Mihriban
1993-01-01
Measuring human performance in a microgravity environment will aid in identifying the design requirements, human capabilities, safety, and productivity of future astronauts. The preliminary understanding of the microgravity effects on human performance can be achieved through evaluations conducted onboard NASA's KC-135 aircraft. These evaluations can be performed in relation to hardware performance, human-hardware interface, and hardware integration. Measuring human performance in the KC-135 simulated environment will contribute to the efforts of optimizing the human-machine interfaces for future and existing space vehicles. However, there are limitations, such as limited number of qualified subjects, unexpected hardware problems, and miscellaneous plane movements which must be taken into consideration. Examples for these evaluations, the results, and their implications are discussed in the paper.
77 FR 57005 - Airworthiness Directives; Bell Helicopter Textron Canada Helicopters
Federal Register 2010, 2011, 2012, 2013, 2014
2012-09-17
... tailboom-attachment hardware (attachment hardware), and perform initial and recurring determinations of the... bolts specified in the BHTC Model 407 Maintenance Manual and applied during manufacturing was incorrect... require replacing attachment hardware and performing initial and recurring determinations of the torque on...
NASA Technical Reports Server (NTRS)
Bordano, Aldo; Uhde-Lacovara, JO; Devall, Ray; Partin, Charles; Sugano, Jeff; Doane, Kent; Compton, Jim
1993-01-01
The Navigation, Control and Aeronautics Division (NCAD) at NASA-JSC is exploring ways of producing Guidance, Navigation and Control (GN&C) flight software faster, better, and cheaper. To achieve these goals, NCAD established two hardware/software facilities that take an avionics design project from initial inception through high-fidelity real-time hardware-in-the-loop testing. Commercially available software products are used to develop the GN&C algorithms in block-diagram form and then automatically generate source code from these diagrams. A high-fidelity real-time hardware-in-the-loop laboratory provides users with the capability to analyze mass memory usage within the targeted flight computer, verify hardware interfaces, conduct system-level verification, performance, and acceptance testing, and perform mission verification using reconfigurable and mission-unique data. To evaluate these concepts and tools, NCAD embarked on a project to build a real-time 6-DOF simulation of the Soyuz Assured Crew Return Vehicle flight software. To date, a productivity increase of 185 percent has been seen over traditional NASA methods for developing flight software.
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
Multiview 3D sensing and analysis for high quality point cloud reconstruction
NASA Astrophysics Data System (ADS)
Satnik, Andrej; Izquierdo, Ebroul; Orjesek, Richard
2018-04-01
Multiview 3D reconstruction techniques enable digital reconstruction of 3D objects from the real world by fusing different viewpoints of the same object into a single 3D representation. This process is by no means trivial and the acquisition of high quality point cloud representations of dynamic 3D objects is still an open problem. In this paper, an approach for high fidelity 3D point cloud generation using low cost 3D sensing hardware is presented. The proposed approach runs in an efficient low-cost hardware setting based on several Kinect v2 scanners connected to a single PC. It performs autocalibration and runs in real-time exploiting an efficient composition of several filtering methods including Radius Outlier Removal (ROR), Weighted Median filter (WM) and Weighted Inter-Frame Average filtering (WIFA). The performance of the proposed method has been demonstrated through efficient acquisition of dense 3D point clouds of moving objects.
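To illustrate one of the filtering stages mentioned above, the sketch below implements a brute-force Radius Outlier Removal pass in NumPy: points with too few neighbours inside a given radius are dropped. The radius and neighbour-count values are placeholders, and a real pipeline would use a spatial index (e.g., a k-d tree) rather than the O(N^2) distance matrix used here for clarity; this is not the authors' implementation.

```python
import numpy as np

def radius_outlier_removal(points, radius=0.05, min_neighbors=5):
    """Keep points that have at least `min_neighbors` other points within
    `radius` (metres). Brute-force O(N^2) for clarity."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    neighbor_counts = np.count_nonzero(dist < radius, axis=1) - 1  # exclude self
    return points[neighbor_counts >= min_neighbors]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    surface = rng.normal(scale=0.01, size=(500, 3))   # dense cluster of surface points
    noise = rng.uniform(-1.0, 1.0, size=(20, 3))      # sparse spurious returns
    cloud = np.vstack([surface, noise])
    print(cloud.shape, "->", radius_outlier_removal(cloud).shape)
```

The weighted median and inter-frame averaging stages named in the abstract would follow the same pattern: simple per-point neighbourhood or per-frame operations applied before the clouds from the individual Kinect v2 scanners are fused.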
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Elks, Carl
1995-01-01
An Army Fault Tolerant Architecture (AFTA) has been developed to meet real-time fault tolerant processing requirements of future Army applications. AFTA is the enabling technology that will allow the Army to configure existing processors and other hardware to provide high throughput and ultrahigh reliability necessary for TF/TA/NOE flight control and other advanced Army applications. A comprehensive conceptual study of AFTA has been completed that addresses a wide range of issues including requirements, architecture, hardware, software, testability, producibility, analytical models, validation and verification, common mode faults, VHDL, and a fault tolerant data bus. A Brassboard AFTA for demonstration and validation has been fabricated, and two operating systems and a flight-critical Army application have been ported to it. Detailed performance measurements have been made of fault tolerance and operating system overheads while AFTA was executing the flight application in the presence of faults.
Innovations in Small-Animal PET/MR Imaging Instrumentation.
Tsoumpas, Charalampos; Visvikis, Dimitris; Loudos, George
2016-04-01
Multimodal imaging has led to a more detailed exploration of different physiologic processes with integrated PET/MR imaging being the most recent entry. Although the clinical need is still questioned, it is well recognized that it represents one of the most active and promising fields of medical imaging research in terms of software and hardware. The hardware developments have moved from small detector components to high-performance PET inserts and new concepts in full systems. Conversely, the software focuses on the efficient performance of necessary corrections without the use of CT data. The most recent developments in both directions are reviewed. Copyright © 2016 Elsevier Inc. All rights reserved.
Apollo experience report: Battery subsystem
NASA Technical Reports Server (NTRS)
Trout, J. B.
1972-01-01
Experience with the Apollo command service module and lunar module batteries is discussed. Significant hardware development concepts and hardware test results are summarized, and the operational performance of batteries on the Apollo 7 to 13 missions is discussed in terms of performance data, mission constraints, and basic hardware design and capability. Also, the flight performance of the Apollo battery charger is discussed. Inflight data are presented.
Hardware support for collecting performance counters directly to memory
Gara, Alan; Salapura, Valentina; Wisniewski, Robert W.
2012-09-25
Hardware support for collecting performance counters directly to memory, in one aspect, may include a plurality of performance counters operable to collect one or more counts of one or more selected activities. A first storage element may be operable to store an address of a memory location. A second storage element may be operable to store a value indicating whether the hardware should begin copying. A state machine may be operable to detect the value in the second storage element and trigger hardware copying of data in selected one or more of the plurality of performance counters to the memory location whose address is stored in the first storage element.
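A purely conceptual software model of the mechanism described in this and the related DMA records is sketched below: a bank of counters, a register holding the destination address, a begin-copy flag, and a state machine that copies the counters into "memory" when the flag is set. This is an illustrative analogy in Python, not the patented hardware design.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CounterCopyEngine:
    """Toy model: performance counters, an address register (first storage
    element), a begin-copy flag (second storage element), and a state machine
    that copies counter values to memory when triggered."""
    counters: List[int] = field(default_factory=lambda: [0] * 8)
    dest_addr: int = 0            # first storage element: target memory address
    copy_trigger: bool = False    # second storage element: begin-copy flag
    memory: Dict[int, int] = field(default_factory=dict)

    def tick(self):
        # State machine: on each cycle, check the trigger and copy if set.
        if self.copy_trigger:
            for i, value in enumerate(self.counters):
                self.memory[self.dest_addr + i] = value
            self.copy_trigger = False   # one-shot copy

if __name__ == "__main__":
    eng = CounterCopyEngine()
    eng.counters = list(range(8))   # pretend these accumulated event counts
    eng.dest_addr = 0x1000
    eng.copy_trigger = True         # software sets the flag, then continues working
    eng.tick()
    print(eng.memory)
```

The point of the hardware version is exactly what the toy model hides: the copy proceeds without consuming processor cycles, so software only writes the address and the flag and then goes on with other work.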
Thermal management of advanced fuel cell power systems
NASA Technical Reports Server (NTRS)
Vanderborgh, N. E.; Hedstrom, J.; Huff, J.
1990-01-01
It is shown that fuel cell devices are particularly attractive for the high-efficiency, high-reliability space hardware necessary to support upcoming space missions. These low-temperature hydrogen-oxygen systems necessarily operate with two-phase water. In either PEMFCs (proton exchange membrane fuel cells) or AFCs (alkaline fuel cells), engineering design must be critically focused on both stack temperature control and on the relative humidity control necessary to sustain appropriate conductivity within the ionic conductor. Water must also be removed promptly from the hardware. Present designs for AFC space hardware accomplish thermal management through two coupled cooling loops, both driven by a heat transfer fluid, and involve a recirculation fan to remove water and heat from the stack. There appears to be a certain advantage in using product water for these purposes within PEM hardware, because in that case a single fluid can serve both to control stack temperature, operating simultaneously as a heat transfer medium and through evaporation, and to provide the gas-phase moisture levels necessary to set the ionic conductor at appropriate performance levels. Moreover, the humidification cooling process automatically follows current loads. This design may remove the necessity for recirculation gas fans, thus demonstrating the long-term reliability essential for future space power hardware.
Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix
2017-01-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
Dragas, Jelena; Jackel, David; Hierlemann, Andreas; Franke, Felix
2015-03-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction.
Improved pulse laser ranging algorithm based on high speed sampling
NASA Astrophysics Data System (ADS)
Gao, Xuan-yi; Qian, Rui-hai; Zhang, Yan-mei; Li, Huan; Guo, Hai-chao; He, Shi-jie; Guo, Xiao-kang
2016-10-01
Narrow-pulse laser ranging achieves long-range target detection using laser pulses with low-divergence beams. Pulse laser ranging is widely used in the military, industrial, civil, engineering, and transportation fields. In this paper, an improved narrow-pulse laser ranging algorithm based on high-speed sampling is studied. Firstly, theoretical simulation models, including the laser emission and pulse laser ranging algorithm, were built and analyzed, and an improved pulse ranging algorithm was developed. This new algorithm combines the matched filter algorithm and the constant fraction discrimination (CFD) algorithm. After the algorithm simulation, a laser ranging hardware system was set up to implement the improved algorithm. The laser ranging hardware system includes a laser diode, a laser detector, and a high-sample-rate data logging circuit. Subsequently, using the Verilog HDL language, the improved algorithm, a fusion of the matched filter algorithm and the CFD algorithm, was implemented in an FPGA chip. Finally, a laser ranging experiment was carried out to test the ranging performance of the improved algorithm against the matched filter algorithm and the CFD algorithm using the laser ranging hardware system. The test results demonstrate that the laser ranging hardware system achieves high-speed processing and high-speed sampling data transmission, and that the improved algorithm achieves 0.3 m ranging precision. The measured performance meets the expected effect and is consistent with the theoretical simulation.
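To make the matched-filter-plus-CFD combination concrete, the sketch below correlates a noisy sampled echo with the known pulse template and then timestamps the filtered waveform at a constant fraction of its peak. The sampling rate, pulse shape, noise level, and CFD fraction are hypothetical values for illustration only, not the parameters of the paper's FPGA implementation.

```python
import numpy as np

FS = 1.0e9   # hypothetical 1 GS/s sampling rate
C = 3.0e8    # speed of light, m/s

def gaussian_pulse(t0, width, n, fs=FS):
    t = np.arange(n) / fs
    return np.exp(-0.5 * ((t - t0) / width) ** 2)

def matched_filter(signal, template):
    # Correlate the noisy return with the known emitted pulse shape.
    return np.correlate(signal, template, mode="same")

def cfd_index(y, fraction=0.4):
    """Constant fraction discrimination: first sample where the filtered
    waveform crosses `fraction` of its peak, refined by linear interpolation."""
    thr = fraction * y.max()
    i = int(np.argmax(y >= thr))
    if i == 0:
        return 0.0
    return i - 1 + (thr - y[i - 1]) / (y[i] - y[i - 1])

if __name__ == "__main__":
    n = 4096
    # Template centred in its own window so the correlation peak aligns with the echo.
    template = gaussian_pulse(256 / FS, 4e-9, 512)
    rng = np.random.default_rng(2)
    echo = gaussian_pulse(2.0e-6, 4e-9, n) + rng.normal(scale=0.1, size=n)
    y = matched_filter(echo, template)
    t_echo = cfd_index(y) / FS
    print("estimated range: %.2f m" % (0.5 * C * t_echo))
```

The matched filter raises the signal-to-noise ratio before timing, while the constant-fraction crossing reduces the timing walk that a fixed threshold would suffer as echo amplitude varies; both steps are simple enough to pipeline in FPGA logic.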
Test Facilities in Support of High Power Electric Propulsion Systems
NASA Technical Reports Server (NTRS)
VanDyke, Melissa; Houts, Mike; Godfroy, Thomas; Dickens, Ricky; Martin, James J.; Salvail, Patrick; Carter, Robert
2002-01-01
Successful development of space fission systems requires an extensive program of affordable and realistic testing. In addition to tests related to the design and development of the fission system, realistic testing of the actual flight unit must also be performed. If the system is designed to operate within established radiation damage and fuel burnup limits while simultaneously allowing close simulation of heat from fission using resistance heaters, high confidence in fission system performance and lifetime can be attained through non-nuclear testing. This philosophy has been demonstrated through hardware testing of system concepts (designed by DOE National Laboratories) in relevant environments in the High Power Propulsion Thermal Simulator (HPPTS). The HPPTS is designed to enable very realistic non-nuclear testing of space fission systems. Ongoing research at the HPPTS is geared towards facilitating research, development, system integration, and system utilization via cooperative efforts with DOE labs, industry, universities, and other NASA centers. Through hardware-based design and testing, the HPPTS investigates High Power Electric Propulsion (HPEP) component, subsystem, and integrated-system design and performance.
Ethoscopes: An open platform for high-throughput ethomics
Geissmann, Quentin; Garcia Rodriguez, Luis; Beckwith, Esteban J.; French, Alice S.; Jamasb, Arian R.
2017-01-01
Here, we present the use of ethoscopes, which are machines for high-throughput analysis of behavior in Drosophila and other animals. Ethoscopes provide a software and hardware solution that is reproducible and easily scalable. They perform, in real-time, tracking and profiling of behavior by using a supervised machine learning algorithm, are able to deliver behaviorally triggered stimuli to flies in a feedback-loop mode, and are highly customizable and open source. Ethoscopes can be built easily by using 3D printing technology and rely on Raspberry Pi microcomputers and Arduino boards to provide affordable and flexible hardware. All software and construction specifications are available at http://lab.gilest.ro/ethoscope. PMID:29049280
An Overview of Hardware for Protein Crystallization in a Magnetic Field.
Yan, Er-Kai; Zhang, Chen-Yan; He, Jin; Yin, Da-Chuan
2016-11-16
Protein crystallization under a magnetic field is an interesting research topic because a magnetic field may provide a special environment to acquire improved quality protein crystals. Because high-quality protein crystals are very useful in high-resolution structure determination using diffraction techniques (X-ray, neutron, and electron diffraction), research using magnetic fields in protein crystallization has attracted substantial interest; some studies have been performed in the past two decades. In this research field, the hardware is especially essential for successful studies because the environment is special and the design and utilization of the research apparatus in such an environment requires special considerations related to the magnetic field. This paper reviews the hardware for protein crystallization (including the magnet systems and the apparatus designed for use in a magnetic field) and progress in this area. Future prospects in this field will also be discussed.
Cellular-enabled water quality measurements
NASA Astrophysics Data System (ADS)
Zhao, Y.; Kerkez, B.
2013-12-01
While the past decade has seen significant improvements in our ability to measure nutrients and other water quality parameters, the use of these sensors has yet to gain traction due to their cost-prohibitive nature and the deployment expertise required of researchers. Furthermore, an extra burden is incurred when real-time data access becomes an experimental requirement. We present an open-source hardware design to facilitate real-time, low-cost, and robust measurements of water quality across large urbanized areas. Our hardware platform interfaces an embedded, highly configurable, high-precision, ultra-low-power measurement system with a low-power cellular module. Each sensor station is configured with an IP address, permitting reliable streaming of sensor data to off-site locations as measurements are made. We discuss the role of high-quality hardware components during extreme-event scenarios and present preliminary performance metrics that validate the ability of the platform to provide streaming access to sensor measurements.
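The streaming idea, each station pushing samples to an off-site server as they are taken, can be sketched in a few lines. The endpoint URL, field names, measurement value, and reporting interval below are all hypothetical placeholders; the actual platform's firmware and cellular stack are not described here in enough detail to reproduce.

```python
import json
import time
import urllib.request

ENDPOINT = "http://example.org/ingest"   # hypothetical off-site collection server

def read_sensor():
    """Placeholder for the platform's high-precision analog front end;
    the value returned here is fabricated for illustration."""
    return {"timestamp": time.time(), "nitrate_mg_per_l": 1.7}

def push(sample):
    body = json.dumps(sample).encode()
    req = urllib.request.Request(ENDPOINT, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status

if __name__ == "__main__":
    while True:
        try:
            push(read_sensor())
        except OSError:
            pass          # cellular link dropped; retry on the next cycle
        time.sleep(300)   # hypothetical 5-minute measurement interval
```

On a real station the same loop would also buffer samples locally so that an outage of the cellular link does not lose data, which matters most in exactly the extreme-event scenarios the abstract highlights.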
An Overview of Hardware for Protein Crystallization in a Magnetic Field
Yan, Er-Kai; Zhang, Chen-Yan; He, Jin; Yin, Da-Chuan
2016-01-01
Protein crystallization under a magnetic field is an interesting research topic because a magnetic field may provide a special environment to acquire improved quality protein crystals. Because high-quality protein crystals are very useful in high-resolution structure determination using diffraction techniques (X-ray, neutron, and electron diffraction), research using magnetic fields in protein crystallization has attracted substantial interest; some studies have been performed in the past two decades. In this research field, the hardware is especially essential for successful studies because the environment is special and the design and utilization of the research apparatus in such an environment requires special considerations related to the magnetic field. This paper reviews the hardware for protein crystallization (including the magnet systems and the apparatus designed for use in a magnetic field) and progress in this area. Future prospects in this field will also be discussed. PMID:27854318
Benchmarking Model Variants in Development of a Hardware-in-the-Loop Simulation System
NASA Technical Reports Server (NTRS)
Aretskin-Hariton, Eliot D.; Zinnecker, Alicia M.; Kratz, Jonathan L.; Culley, Dennis E.; Thomas, George L.
2016-01-01
Distributed engine control architecture presents a significant increase in complexity over traditional implementations when viewed from the perspective of system simulation and hardware design and test. Even if the overall function of the control scheme remains the same, the hardware implementation can have a significant effect on the overall system performance due to differences in the creation and flow of data between control elements. A Hardware-in-the-Loop (HIL) simulation system is under development at NASA Glenn Research Center that enables the exploration of these hardware dependent issues. The system is based on, but not limited to, the Commercial Modular Aero-Propulsion System Simulation 40k (C-MAPSS40k). This paper describes the step-by-step conversion from the self-contained baseline model to the hardware in the loop model, and the validation of each step. As the control model hardware fidelity was improved during HIL system development, benchmarking simulations were performed to verify that engine system performance characteristics remained the same. The results demonstrate the goal of the effort; the new HIL configurations have similar functionality and performance compared to the baseline C-MAPSS40k system.
ATS-6 engineering performance report. Volume 6: Scientific experiments
NASA Technical Reports Server (NTRS)
Wales, R. O. (Editor)
1981-01-01
Evaluations include a very high resolution radiometer, a radio beacon experiment, environmental measurement experiments (EME), EME support hardware, EME anomalies and failures, EME results, and US/USSR magnetometer experiments.
Stromatias, Evangelos; Neil, Daniel; Pfeiffer, Michael; Galluppi, Francesco; Furber, Steve B; Liu, Shih-Chii
2015-01-01
Increasingly large deep learning architectures, such as Deep Belief Networks (DBNs) are the focus of current machine learning research and achieve state-of-the-art results in different domains. However, both training and execution of large-scale Deep Networks require vast computing resources, leading to high power requirements and communication overheads. The on-going work on design and construction of spike-based hardware platforms offers an alternative for running deep neural networks with significantly lower power consumption, but has to overcome hardware limitations in terms of noise and limited weight precision, as well as noise inherent in the sensor signal. This article investigates how such hardware constraints impact the performance of spiking neural network implementations of DBNs. In particular, the influence of limited bit precision during execution and training, and the impact of silicon mismatch in the synaptic weight parameters of custom hybrid VLSI implementations is studied. Furthermore, the network performance of spiking DBNs is characterized with regard to noise in the spiking input signal. Our results demonstrate that spiking DBNs can tolerate very low levels of hardware bit precision down to almost two bits, and show that their performance can be improved by at least 30% through an adapted training mechanism that takes the bit precision of the target platform into account. Spiking DBNs thus present an important use-case for large-scale hybrid analog-digital or digital neuromorphic platforms such as SpiNNaker, which can execute large but precision-constrained deep networks in real time.
Stromatias, Evangelos; Neil, Daniel; Pfeiffer, Michael; Galluppi, Francesco; Furber, Steve B.; Liu, Shih-Chii
2015-01-01
Increasingly large deep learning architectures, such as Deep Belief Networks (DBNs) are the focus of current machine learning research and achieve state-of-the-art results in different domains. However, both training and execution of large-scale Deep Networks require vast computing resources, leading to high power requirements and communication overheads. The on-going work on design and construction of spike-based hardware platforms offers an alternative for running deep neural networks with significantly lower power consumption, but has to overcome hardware limitations in terms of noise and limited weight precision, as well as noise inherent in the sensor signal. This article investigates how such hardware constraints impact the performance of spiking neural network implementations of DBNs. In particular, the influence of limited bit precision during execution and training, and the impact of silicon mismatch in the synaptic weight parameters of custom hybrid VLSI implementations is studied. Furthermore, the network performance of spiking DBNs is characterized with regard to noise in the spiking input signal. Our results demonstrate that spiking DBNs can tolerate very low levels of hardware bit precision down to almost two bits, and show that their performance can be improved by at least 30% through an adapted training mechanism that takes the bit precision of the target platform into account. Spiking DBNs thus present an important use-case for large-scale hybrid analog-digital or digital neuromorphic platforms such as SpiNNaker, which can execute large but precision-constrained deep networks in real time. PMID:26217169
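One way to get a feel for the bit-precision question studied above is to quantize trained weights to a signed fixed-point grid and re-evaluate accuracy, as in the sketch below. The surrogate single-layer classifier and synthetic data are stand-ins for a full spiking DBN and are assumptions of this illustration, not the authors' SpiNNaker or VLSI experiments.

```python
import numpy as np

def quantize(weights, bits):
    """Uniformly quantize weights to a signed fixed-point grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(weights)) / levels
    return np.round(weights / scale) * scale

def accuracy(weights, x, labels):
    """Single-layer surrogate classifier; stands in for a full DBN evaluation."""
    logits = x @ weights
    return np.mean(np.argmax(logits, axis=1) == labels)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(1000, 64))
    true_w = rng.normal(size=(64, 10))
    labels = np.argmax(x @ true_w, axis=1)                    # synthetic, separable labels
    w = true_w + rng.normal(scale=0.05, size=true_w.shape)    # noisy "trained" weights
    for bits in (8, 4, 2):
        print(bits, "bits ->", accuracy(quantize(w, bits), x, labels))
```

The study's stronger result is that accounting for the target precision during training (rather than only quantizing afterwards, as here) recovers much of the loss, which is what makes very low-precision neuromorphic platforms viable.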
Liquid Nitrogen Removal of Critical Aerospace Materials
NASA Technical Reports Server (NTRS)
Noah, Donald E.; Merrick, Jason; Hayes, Paul W.
2005-01-01
Identification of innovative solutions to unique materials problems is an every-day quest for members of the aerospace community. Finding a technique that will minimize costs, maximize throughput, and generate quality results is always the target. United Space Alliance Materials Engineers recently conducted such a search in their drive to return the Space Shuttle fleet to operational status. The removal of high performance thermal coatings from solid rocket motors represents a formidable task during post flight disassembly on reusable expended hardware. The removal of these coatings from unfired motors increases the complexity and safety requirements while reducing the available facilities and approved processes. A temporary solution to this problem was identified, tested and approved during the Solid Rocket Booster (SRB) return to flight activities. Utilization of ultra high-pressure liquid nitrogen (LN2) to strip the protective coating from assembled space shuttle hardware marked the first such use of the technology in the aerospace industry. This process provides a configurable stream of liquid nitrogen (LN2) at pressures of up to 55,000 psig. The performance of a one-time certification for the removal of thermal ablatives from SRB hardware involved extensive testing to ensure adequate material removal without causing undesirable damage to the residual materials or aluminum substrates. Testing to establish appropriate process parameters such as flow, temperature and pressures of the liquid nitrogen stream provided an initial benchmark for process testing. Equipped with these initial parameters engineers were then able to establish more detailed test criteria that set the process limits. Quantifying the potential for aluminum hardware damage represented the greatest hurdle for satisfying engineers as to the safety of this process. Extensive testing for aluminum erosion, surface profiling, and substrate weight loss was performed. This successful project clearly demonstrated that the liquid nitrogen jet possesses unique strengths that align remarkably well with the unusual challenges that space hardware and missile manufacturers face on a regular basis. Performance of this task within the confines of a critical manufacturing facility marks a milestone in advanced processing.
A Software Defined Radio Based Airplane Communication Navigation Simulation System
NASA Astrophysics Data System (ADS)
He, L.; Zhong, H. T.; Song, D.
2018-01-01
Radio communication and navigation systems play an important role in ensuring the safety of civil airplanes in flight. Function and performance should be tested before these systems are installed on board. Conventionally, a separate transmitter and receiver are needed for each system, so the equipment occupies a lot of space and is costly. In this paper, software defined radio technology is applied to design a common-hardware communication and navigation ground simulation system, which can host multiple airplane systems with different operating frequencies, such as HF, VHF, VOR, ILS, ADF, etc. We use a broadband analog front-end hardware platform, the Universal Software Radio Peripheral (USRP), to transmit and receive signals in different frequency bands. Software written in LabVIEW on a computer, which interfaces with the USRP through Ethernet, is responsible for communication and navigation signal processing and system control. An integrated testing system was established to perform functional tests and performance verification of the simulated signals, which demonstrates the feasibility of our design. The system is a low-cost, common hardware platform for multiple airplane systems and provides a helpful reference for integrated avionics design.
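As a trivial example of the kind of baseband waveform such a software-defined simulator synthesizes before handing samples to the radio hardware, the sketch below generates a complex-baseband tone with a low-frequency amplitude modulation. The sample rate, modulation frequency, and depth are placeholders and deliberately do not follow the VOR/ILS modulation standards; upconversion to the assigned channel would be done by the USRP front end.

```python
import numpy as np

FS = 250_000          # hypothetical baseband sample rate handed to the radio
DURATION = 0.1        # seconds of samples per block

def am_test_signal(mod_freq=30.0, depth=0.3, fs=FS, duration=DURATION):
    """Complex-baseband carrier amplitude-modulated by `mod_freq` Hz.
    Values are illustrative, not an aviation modulation standard."""
    t = np.arange(int(fs * duration)) / fs
    envelope = 1.0 + depth * np.cos(2 * np.pi * mod_freq * t)
    return envelope.astype(np.complex64)

if __name__ == "__main__":
    iq = am_test_signal()
    print(len(iq), "IQ samples, peak amplitude", float(np.abs(iq).max()))
```

Because the waveform is just an array of IQ samples, hosting a different avionics system on the same hardware means changing the generating function and the tuned frequency, not the transmitter.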
ProjectQ: Compiling quantum programs for various backends
NASA Astrophysics Data System (ADS)
Haener, Thomas; Steiger, Damian S.; Troyer, Matthias
In order to control quantum computers beyond the current generation, a high level quantum programming language and optimizing compilers will be essential. Therefore, we have developed ProjectQ - an open source software framework to facilitate implementing and running quantum algorithms both in software and on actual quantum hardware. Here, we introduce the backends available in ProjectQ. This includes a high-performance simulator and emulator to test and debug quantum algorithms, tools for resource estimation, and interfaces to several small-scale quantum devices. We demonstrate the workings of the framework and show how easily it can be further extended to control upcoming quantum hardware.
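A minimal ProjectQ program run on its default simulator backend looks like the following; it mirrors the framework's basic usage and assumes only that the projectq package is installed.

```python
from projectq import MainEngine
from projectq.ops import H, Measure

# Build an engine with the default simulator backend, then run a one-qubit coin flip.
eng = MainEngine()            # defaults to the built-in high-performance simulator
qubit = eng.allocate_qubit()
H | qubit                     # put the qubit into superposition
Measure | qubit               # collapse to 0 or 1
eng.flush()                   # send the circuit to the backend
print("measured:", int(qubit))
```

Swapping in a different backend, such as a resource estimator or an interface to a small-scale quantum device, is done by passing it to MainEngine, which is the extensibility point the abstract emphasizes.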
birgHPC: creating instant computing clusters for bioinformatics and molecular dynamics.
Chew, Teong Han; Joyce-Tan, Kwee Hong; Akma, Farizuwana; Shamsir, Mohd Shahir
2011-05-01
birgHPC, a bootable Linux Live CD, has been developed to create high-performance clusters for bioinformatics and molecular dynamics studies using any Local Area Network (LAN)-networked computers. birgHPC features automated hardware and slot detection and provides a simple job submission interface. The latest versions of GROMACS, NAMD, mpiBLAST and ClustalW-MPI can be run in parallel by simply booting the birgHPC CD or flash drive from the head node, which immediately positions the rest of the PCs on the network as computing nodes. Thus, a temporary, affordable, scalable and high-performance computing environment can be built by non-computing-based researchers using low-cost commodity hardware. The birgHPC Live CD and relevant user guide are available for free at http://birg1.fbb.utm.my/birghpc.
AEA Cell-Bypass-Switch Activation: An Update
NASA Technical Reports Server (NTRS)
Keys, Denney; Rao, Gopalakrishna M.; Wannemacher, Harry
2002-01-01
The objectives of this project included the following: (1) verify the performance of the AEA cell bypass protection device (CBPD) under a simulated EOS-Aqua/Aura flight hardware configuration; (2) assess the safety of the hardware under an inadvertent firing of the CBPD switch, as well as the closing of the CBPD; and (3) confirm that the mode of operation of the CBPD switch is the formation of a continuous low impedance path (a homogeneous low melting point alloy). The nominal performance of the AEA CBPD under flight operating conditions (vacuum, though not zero-G, and a high-impedance cell) has been demonstrated. There is no evidence of cell rupture or excessive heat production during or after CBPD switch activation under simulated high cell impedance (open-circuit cell failure mode). The formation of a continuous low impedance path (a homogeneous low melting point alloy) has been confirmed.
The Evolution of Exercise Hardware on ISS: Past, Present, and Future
NASA Technical Reports Server (NTRS)
Buxton, R. E.; Kalogera, K. L.; Hanson, A. M.
2017-01-01
During 16 years in low-Earth orbit, the suite of exercise hardware aboard the International Space Station (ISS) has matured significantly. Today, the countermeasure system supports an array of physical-training protocols and serves as an extensive research platform. Future hardware designs are required to have smaller operational envelopes and must also mitigate known physiologic issues observed in long-duration spaceflight. Taking lessons learned from the long history of space exercise will be important to successful development and implementation of future, compact exercise hardware. The evolution of exercise hardware as deployed on the ISS has implications for future exercise hardware and operations. Key lessons learned from the early days of ISS have helped to: 1. Enhance hardware performance (increased speed and loads). 2. Mature software interfaces. 3. Compare inflight exercise workloads to pre-, in-, and post-flight musculoskeletal and aerobic conditions. 4. Improve exercise comfort. 5. Develop complementary hardware for research and operations. Current ISS exercise hardware includes both custom and commercial-off-the-shelf (COTS) hardware. Benefits and challenges to this approach have prepared engineering teams to take a hybrid approach when designing and implementing future exercise hardware. Significant effort has gone into consideration of hardware instrumentation and wearable devices that provide important data to monitor crew health and performance.
Skylab materials processing facility experiment developer's report
NASA Technical Reports Server (NTRS)
Parks, P. G.
1975-01-01
The development of the Skylab M512 Materials Processing Facility is traced from the design of a portable, self-contained electron beam welding system for terrestrial applications to the highly complex experiment system ultimately developed for three Skylab missions. The M512 experiment facility was designed to support six in-space experiments intended to explore the advantages of manufacturing materials in the near-zero-gravity environment of Earth orbit. Detailed descriptions of the M512 facility and related experiment hardware are provided, with discussions of hardware verification and man-machine interfaces included. An analysis of the operation of the facility and experiments during the three Skylab missions is presented, including discussions of the hardware performance, anomalies, and data returned to earth.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Weir, John M.; Wells, B. Earl
2003-01-01
Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modern FPGAs. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
Multicore Challenges and Benefits for High Performance Scientific Computing
Nielsen, Ida M. B.; Janssen, Curtis L.
2008-01-01
Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
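A toy version of the hybrid message-passing/multi-threading matrix multiply mentioned above can be sketched with mpi4py for the message passing and NumPy (whose BLAS is typically multi-threaded) for the on-node work; the matrix size and data distribution below are illustrative assumptions, not the authors' implementation.

```python
# Toy hybrid distributed-memory matrix multiply: MPI between nodes/ranks,
# threaded BLAS (via NumPy) within each multicore chip.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n = 512                                   # assumed size, divisible by the rank count
rows = n // size

if rank == 0:
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
else:
    A, B = None, np.empty((n, n))

# Distribute row blocks of A, broadcast all of B.
A_local = np.empty((rows, n))
comm.Scatter(A, A_local, root=0)
comm.Bcast(B, root=0)

# On-node compute: a threaded BLAS exploits the cores within each multicore chip.
C_local = A_local @ B

# Gather the row blocks of the result back on rank 0 and spot-check it.
C = np.empty((n, n)) if rank == 0 else None
comm.Gather(C_local, C, root=0)
if rank == 0:
    print("||C - A@B|| =", np.linalg.norm(C - A @ B))
```

Run with, for example, mpiexec -n 4 python hybrid_mm.py; each MPI rank then uses its node's cores through the threaded BLAS call, which is the essence of the hybrid model.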
MSFC Skylab structures and mechanical systems mission evaluation
NASA Technical Reports Server (NTRS)
1974-01-01
A performance analysis for structural and mechanical major hardware systems and components is presented. Development background, testing, modifications, and requirement adjustments are included. Functional narratives are provided for comparison purposes, as are predicted design performance criteria. Each item is evaluated on an individual basis: that is, (1) history (requirements, design, manufacture, and test); (2) in-orbit performance (description and analysis); and (3) conclusions and recommendations regarding future space hardware application. Overall, the structural and mechanical performance of the Skylab hardware was outstanding.
FUX-Sim: Implementation of a fast universal simulation/reconstruction framework for X-ray systems.
Abella, Monica; Serrano, Estefania; Garcia-Blas, Javier; García, Ines; de Molina, Claudia; Carretero, Jesus; Desco, Manuel
2017-01-01
The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity for bringing 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that involve a high computational burden. The development of new flexible X-ray systems could benefit from computer simulations, which may enable performance to be checked before expensive real systems are implemented. The development of simulation/reconstruction algorithms in this context poses three main difficulties. First, the algorithms deal with large data volumes and are computationally expensive, thus leading to the need for hardware and software optimizations. Second, these optimizations are limited by the high flexibility required to explore new scanning geometries, including fully configurable positioning of source and detector elements. And third, the evolution of the various hardware setups increases the effort required for maintaining and adapting the implementations to current and future programming models. Previous works lack support for completely flexible geometries and/or compatibility with multiple programming models and platforms. In this paper, we present FUX-Sim, a novel X-ray simulation/reconstruction framework that was designed to be flexible and fast. Optimized implementation for different families of GPUs (CUDA and OpenCL) and multi-core CPUs was achieved thanks to a modularized approach based on a layered architecture and parallel implementation of the algorithms for both architectures. A detailed performance evaluation demonstrates that for different system configurations and hardware platforms, FUX-Sim maximizes performance with the CUDA programming model (5 times faster than other state-of-the-art implementations). Furthermore, the CPU and OpenCL programming models allow FUX-Sim to be executed over a wide range of hardware platforms.
Bagrosky, Brian M; Hayes, Kari L; Koo, Phillip J; Fenton, Laura Z
2013-08-01
Evaluation of the child with spinal fusion hardware and concern for infection is challenging because of hardware artifact with standard imaging (CT and MRI) and difficult physical examination. Studies using (18)F-FDG PET/CT combine the benefit of functional imaging with anatomical localization. Our objective was to discuss a case series of children and young adults with spinal fusion hardware and clinical concern for hardware infection. These patients underwent FDG PET/CT imaging to determine the site of infection. We performed a retrospective review of whole-body FDG PET/CT scans at a tertiary children's hospital from December 2009 to January 2012 in children and young adults with spinal hardware and suspected hardware infection. The PET/CT scan findings were correlated with pertinent clinical information including laboratory values of inflammatory markers, postoperative notes and pathology results to evaluate the diagnostic accuracy of FDG PET/CT. An exempt status for this retrospective review was approved by the Institutional Review Board. Twenty-five FDG PET/CT scans were performed in 20 patients. Spinal fusion hardware infection was confirmed surgically and pathologically in six patients. The most common FDG PET/CT finding in patients with hardware infection was increased FDG uptake in the soft tissue and bone immediately adjacent to the posterior spinal fusion rods at multiple contiguous vertebral levels. Noninfectious hardware complications were diagnosed in ten patients and proved surgically in four. Alternative sources of infection were diagnosed by FDG PET/CT in seven patients (five with pneumonia, one with pyonephrosis and one with superficial wound infections). FDG PET/CT is helpful in evaluation of children and young adults with concern for spinal hardware infection. Noninfectious hardware complications and alternative sources of infection, including pneumonia and pyonephrosis, can be diagnosed. FDG PET/CT should be the first-line cross-sectional imaging study in patients with suspected spinal hardware infection. Because pneumonia was diagnosed as often as spinal hardware infection, initial chest radiography should also be performed.
Missileborne Artificial Vision System (MAVIS)
NASA Technical Reports Server (NTRS)
Andes, David K.; Witham, James C.; Miles, Michael D.
1994-01-01
Several years ago when INTEL and China Lake designed the ETANN chip, analog VLSI appeared to be the only way to do high density neural computing. In the last five years, however, digital parallel processing chips capable of performing neural computation functions have evolved to the point of rough equality with analog chips in system level computational density. The Naval Air Warfare Center, China Lake, has developed a real time, hardware and software system designed to implement and evaluate biologically inspired retinal and cortical models. The hardware is based on the Adaptive Solutions Inc. massively parallel CNAPS system COHO boards. Each COHO board is a standard size 6U VME card featuring 256 fixed point, RISC processors running at 20 MHz in a SIMD configuration. Each COHO board has a companion board built to support a real time VSB interface to an imaging seeker, a NTSC camera, and to other COHO boards. The system is designed to have multiple SIMD machines each performing different corticomorphic functions. The system level software has been developed which allows a high level description of corticomorphic structures to be translated into the native microcode of the CNAPS chips. Corticomorphic structures are those neural structures with a form similar to that of the retina, the lateral geniculate nucleus, or the visual cortex. This real time hardware system is designed to be shrunk into a volume compatible with air launched tactical missiles. Initial versions of the software and hardware have been completed and are in the early stages of integration with a missile seeker.
NASA Astrophysics Data System (ADS)
Lele, Sanjiva K.
2002-08-01
Funds were received in April 2001 under the Department of Defense DURIP program for construction of a 48 processor high performance computing cluster. This report details the hardware which was purchased and how it has been used to enable and enhance research activities directly supported by, and of interest to, the Air Force Office of Scientific Research and the Department of Defense. The report is divided into two major sections. The first section after this summary describes the computer cluster, its setup, and some cluster performance benchmark results. The second section explains ongoing research efforts which have benefited from the cluster hardware, and presents highlights of those efforts since installation of the cluster.
NASA Technical Reports Server (NTRS)
Jones, Harry
2003-01-01
The ALS project plan goals are reducing cost, improving performance, and achieving flight readiness. ALS selects projects to advance the mission readiness of low cost, high performance technologies. The role of metrics is to help select good projects and report progress. The Equivalent Mass (EM) of a system is the sum of the estimated mass of the hardware, of its required materials and spares, and of the pressurized volume, power supply, and cooling system needed to support the hardware in space. EM is the total payload launch mass needed to provide and support a system. EM is directly proportional to the launch cost.
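A hedged sketch of the Equivalent Mass bookkeeping described above is given below; the equivalency factors for volume, power, and cooling, and the example hardware numbers, are placeholders for illustration rather than ALS project values.

```python
# Illustrative Equivalent Mass (EM) tally: hardware mass plus spares plus the
# launch-mass equivalents of the infrastructure needed to support the hardware.
# The equivalency factors below are assumed placeholders, not project constants.
def equivalent_mass(hardware_kg, spares_kg, volume_m3, power_kw, cooling_kw,
                    volume_eq=66.7, power_eq=237.0, cooling_eq=60.0):
    """Total payload launch mass needed to provide and support the system (kg)."""
    infrastructure = (volume_m3 * volume_eq      # pressurized volume penalty
                      + power_kw * power_eq      # power supply penalty
                      + cooling_kw * cooling_eq) # heat rejection penalty
    return hardware_kg + spares_kg + infrastructure

# Example: a 120 kg assembly with 30 kg of spares, 0.5 m^3, 0.8 kW power and cooling.
print(equivalent_mass(120, 30, 0.5, 0.8, 0.8), "kg equivalent mass")
```

Because EM is directly proportional to launch cost, a tally like this lets competing technology projects be compared on a single mass-like figure of merit.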
NASA Astrophysics Data System (ADS)
Johnson, W. N.; Herrick, W. V.; Grundmann, W. J.
1984-10-01
For the first time, VLSI technology is used to compress the full functionality and comparable performance of the VAX 11/780 super-minicomputer into a 1.2 M transistor microprocessor chip set. There was no subsetting of the 304-instruction set and the 17 data types, nor reduction in hardware support for the 4 Gbyte virtual memory management architecture. The chipset supports an integral 8 kbyte memory cache, a 13.3 Mbyte/s system bus, and sophisticated multiprocessing. High performance is achieved through microcode optimizations afforded by the large control store, tightly coupled address and data caches, the use of internal and external 32 bit datapaths, the extensive application of both microlevel and macrolevel pipelining, and the use of specialized hardware assists.
Hardware Removal in Craniomaxillofacial Trauma
Cahill, Thomas J.; Gandhi, Rikesh; Allori, Alexander C.; Marcus, Jeffrey R.; Powers, David; Erdmann, Detlev; Hollenbeck, Scott T.; Levinson, Howard
2015-01-01
Background Craniomaxillofacial (CMF) fractures are typically treated with open reduction and internal fixation. Open reduction and internal fixation can be complicated by hardware exposure or infection. The literature often does not differentiate between these 2 entities; so for this study, we have considered all hardware exposures as hardware infections. Approximately 5% of adults with CMF trauma are thought to develop hardware infections. Management consists of either removing the hardware versus leaving it in situ. The optimal approach has not been investigated. Thus, a systematic review of the literature was undertaken and a resultant evidence-based approach to the treatment and management of CMF hardware infections was devised. Materials and Methods A comprehensive search of journal articles was performed in parallel using MEDLINE, Web of Science, and ScienceDirect electronic databases. Keywords and phrases used were maxillofacial injuries; facial bones; wounds and injuries; fracture fixation, internal; wound infection; and infection. Our search yielded 529 articles. To focus on CMF fractures with hardware infections, the full text of English-language articles was reviewed to identify articles focusing on the evaluation and management of infected hardware in CMF trauma. Each article’s reference list was manually reviewed and citation analysis performed to identify articles missed by the search strategy. There were 259 articles that met the full inclusion criteria and form the basis of this systematic review. The articles were rated based on the level of evidence. There were 81 grade II articles included in the meta-analysis. Result Our meta-analysis revealed that 7503 patients were treated with hardware for CMF fractures in the 81 grade II articles. Hardware infection occurred in 510 (6.8%) of these patients. Of those infections, hardware removal occurred in 264 (51.8%) patients; hardware was left in place in 166 (32.6%) patients; and in 80 (15.6%) cases, there was no report as to hardware management. Finally, our review revealed that there were no reported differences in outcomes between groups. Conclusions Management of CMF hardware infections should be performed in a sequential and consistent manner to optimize outcome. An evidence-based algorithm for management of CMF hardware infections based on this critical review of the literature is presented and discussed. PMID:25393499
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernstein, P.A.
1988-02-01
The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user. However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.
Demonstration of automated proximity and docking technologies
NASA Astrophysics Data System (ADS)
Anderson, Robert L.; Tsugawa, Roy K.; Bryan, Thomas C.
An autodock was demonstrated using straightforward techniques and real sensor hardware. A simulation testbed was established and validated. The sensor design was refined with improved optical performance and image processing noise mitigation techniques, and the sensor is ready for production from off-the-shelf components. The autonomous spacecraft architecture is defined. The areas of sensors, docking hardware, propulsion, and avionics are included in the design. The Guidance Navigation and Control architecture and requirements are developed. Modular structures suitable for automated control are used. The spacecraft system manager functions including configuration, resource, and redundancy management are defined. The requirements for autonomous spacecraft executive are defined. High level decisionmaking, mission planning, and mission contingency recovery are a part of this. The next step is to do flight demonstrations. After the presentation the following question was asked. How do you define validation? There are two components to validation definition: software simulation with formal and vigorous validation, and hardware and facility performance validated with respect to software already validated against analytical profile.
Multipurpose Controller with EPICS integration and data logging: BPM application for ESS Bilbao
NASA Astrophysics Data System (ADS)
Arredondo, I.; del Campo, M.; Echevarria, P.; Jugo, J.; Etxebarria, V.
2013-10-01
This work presents a multipurpose configurable control system which can be integrated in an EPICS control network, with this functionality configured through an XML configuration file. The core of the system is the so-called Hardware Controller, which is in charge of the control hardware management, the set up and communication with the EPICS network, and the data storage. The reconfigurable nature of the controller is based on a single XML file, allowing any final user to easily modify and adjust the control system to any specific requirement. The selected Java development environment ensures multiplatform operation and large versatility, even regarding the control hardware to be controlled. Specifically, this paper, focused on fast control based on a high performance FPGA, also describes an application approach for the ESS Bilbao Beam Position Monitoring system. The implementation of the XML configuration file and the satisfactory performance outcome achieved are presented, as well as a general description of the Multipurpose Controller itself.
Diagnostic-management system and test pulse acquisition for WEST plasma measurement system
NASA Astrophysics Data System (ADS)
Wojenski, A.; Kasprowicz, G.; Pozniak, K. T.; Byszuk, A.; Juszczyk, B.; Zabolotny, W.; Zienkiewicz, P.; Chernyshova, M.; Czarski, T.; Mazon, D.; Malard, P.
2014-11-01
This paper describes the current status of electronics, firmware and software development for a new plasma measurement system for use in the WEST facility. The system performs two-dimensional plasma visualization (in time) together with spectrum measurement. The analog front-end is connected to a Gas Electron Multiplier detector (GEM detector). The system architecture has high data throughput due to the use of the PCI-Express interface, Gigabit Transceivers and the sampling frequency of the ADC integrated circuits. The hardware is based on several years of experience in building the X-ray spectrometer system for the Joint European Torus (JET) facility. Data streaming is done using Artix7 FPGA devices. The system in its basic configuration can work with up to 256 channels, while the maximum number of measurement channels is 2048. Advanced firmware for the FPGA is required in order to perform high speed data streaming and analog signal sampling. Diagnostic system management has been developed in order to configure the measurement system, perform necessary calibration and prepare the hardware for data acquisition.
ArControl: An Arduino-Based Comprehensive Behavioral Platform with Real-Time Performance.
Chen, Xinfeng; Li, Haohong
2017-01-01
Studying animal behavior in the lab requires reliably delivering stimuli and monitoring responses. We constructed a comprehensive behavioral platform (ArControl: Arduino Control Platform) that was an affordable, easy-to-use, high-performance solution combining software and hardware components. The hardware component consisted of an Arduino UNO board and a simple drive circuit. As for software, ArControl provided a stand-alone and intuitive GUI (graphical user interface) application that did not require users to master scripts. The experiment data were automatically recorded with the built-in DAQ (data acquisition) function. ArControl also allowed the behavioral schedule to be entirely stored in and operated on the Arduino chip. This made ArControl a genuine, real-time system with high temporal resolution (<1 ms). We tested ArControl based on strict performance measurements and two mouse behavioral experiments. The results showed that ArControl was an adaptive and reliable system suitable for behavioral research.
ArControl: An Arduino-Based Comprehensive Behavioral Platform with Real-Time Performance
Chen, Xinfeng; Li, Haohong
2017-01-01
Studying animal behavior in the lab requires reliably delivering stimuli and monitoring responses. We constructed a comprehensive behavioral platform (ArControl: Arduino Control Platform) that was an affordable, easy-to-use, high-performance solution combining software and hardware components. The hardware component consisted of an Arduino UNO board and a simple drive circuit. As for software, ArControl provided a stand-alone and intuitive GUI (graphical user interface) application that did not require users to master scripts. The experiment data were automatically recorded with the built-in DAQ (data acquisition) function. ArControl also allowed the behavioral schedule to be entirely stored in and operated on the Arduino chip. This made ArControl a genuine, real-time system with high temporal resolution (<1 ms). We tested ArControl based on strict performance measurements and two mouse behavioral experiments. The results showed that ArControl was an adaptive and reliable system suitable for behavioral research. PMID:29321735
NASA Astrophysics Data System (ADS)
Kelly, Jamie S.; Bowman, Hiroshi C.; Rao, Vittal S.; Pottinger, Hardy J.
1997-06-01
Implementation issues represent an unfamiliar challenge to most control engineers, and many techniques for controller design ignore these issues outright. Consequently, the design of controllers for smart structural systems usually proceeds without regard for their eventual implementation, thus resulting either in serious performance degradation or in hardware requirements that squander power, complicate integration, and drive up cost. The level of integration assumed by the Smart Patch further exacerbates these difficulties, and any design inefficiency may render the realization of a single-package sensor-controller-actuator system infeasible. The goal of this research is to automate the controller implementation process and to relieve the design engineer of implementation concerns like quantization, computational efficiency, and device selection. We specifically target Field Programmable Gate Arrays (FPGAs) as our hardware platform because these devices are highly flexible, power efficient, and reprogrammable. The current study develops an automated implementation sequence that minimizes hardware requirements while maintaining controller performance. Beginning with a state space representation of the controller, the sequence automatically generates a configuration bitstream for a suitable FPGA implementation. MATLAB functions optimize and simulate the control algorithm before translating it into the VHSIC hardware description language. These functions improve power efficiency and simplify integration in the final implementation by performing a linear transformation that renders the controller computationally friendly. The transformation favors sparse matrices in order to reduce multiply operations and the hardware necessary to support them; simultaneously, the remaining matrix elements take on values that minimize limit cycles and parameter sensitivity. The proposed controller design methodology is implemented on a simple cantilever beam test structure using FPGA hardware. The experimental closed loop response is compared with that of an automated FPGA controller implementation. Finally, we explore the integration of FPGA based controllers into a multi-chip module, which we believe represents the next step towards the realization of the Smart Patch.
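The following sketch illustrates, in Python rather than the authors' MATLAB toolchain, the kind of linear state transformation that favors sparse matrices: for a controller with real, distinct poles, a modal transformation makes the A matrix diagonal, so the state update needs far fewer multiply operations in hardware. It is an assumption-laden illustration (real, distinct eigenvalues), not the automated implementation sequence itself.

```python
# Illustrative modal (diagonalizing) transformation of a state-space controller.
# Assumes the controller has real, distinct eigenvalues so the transform stays real.
import numpy as np

def to_modal_form(A, B, C):
    """Return (Am, Bm, Cm) with Am diagonal, under the real-distinct-pole assumption."""
    eigvals, T = np.linalg.eig(A)           # columns of T are eigenvectors
    T_inv = np.linalg.inv(T)
    Am = np.real(T_inv @ A @ T)             # diagonal up to round-off
    Bm = np.real(T_inv @ B)
    Cm = np.real(C @ T)
    return Am, Bm, Cm

# Toy second-order discrete-time controller with poles at 0.9 and 0.5.
A = np.array([[1.4, -0.45], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.2, 0.1]])
Am, Bm, Cm = to_modal_form(A, B, C)
print(np.round(Am, 6))                      # off-diagonal entries are ~0
```

In a fixed-point FPGA realization the remaining nonzero coefficients would still need to be quantized carefully, which is the limit-cycle and sensitivity concern the abstract raises.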
HDL Based FPGA Interface Library for Data Acquisition and Multipurpose Real Time Algorithms
NASA Astrophysics Data System (ADS)
Fernandes, Ana M.; Pereira, R. C.; Sousa, J.; Batista, A. J. N.; Combo, A.; Carvalho, B. B.; Correia, C. M. B. A.; Varandas, C. A. F.
2011-08-01
The inherent parallelism of the logic resources, the flexibility of its configuration, and the performance at high processing frequencies make the field programmable gate array (FPGA) the most suitable device to be used both for real time algorithm processing and for data transfer in instrumentation modules. Moreover, the reconfigurability of these FPGA based modules enables exploiting different applications on the same module. When using a reconfigurable module for various applications, the availability of a common interface library for easier implementation of the algorithms on the FPGA leads to more efficient development. The FPGA configuration is usually specified in a hardware description language (HDL) or another higher level descriptive language. The critical paths, such as the management of internal hardware clocks, that require deep knowledge of the module behavior shall be implemented in HDL to optimize the timing constraints. The common interface library should include these critical paths, freeing the application designer from hardware complexity and allowing any of the available high-level abstraction languages to be chosen for the algorithm implementation. With this purpose a modular Verilog code was developed for the Virtex 4 FPGA of the in-house Transient Recorder and Processor (TRP) hardware module, based on the Advanced Telecommunications Computing Architecture (ATCA), with eight channels sampling at up to 400 MSamples/s (MSPS). The TRP was designed to perform real time Pulse Height Analysis (PHA), Pulse Shape Discrimination (PSD) and Pile-Up Rejection (PUR) algorithms at a high count rate (a few Mevents/s). A brief description of this modular code is presented and examples of its use as an interface with end user algorithms, including a PHA with PUR, are described.
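For readers unfamiliar with the algorithm class, a software sketch of pulse height analysis (PHA) with a simple pile-up rejection (PUR) rule is shown below; it is illustrative only, not the Verilog firmware on the TRP module, and the threshold and separation window are assumed values.

```python
# Illustrative PHA with pile-up rejection on a sampled waveform (not the FPGA code).
import numpy as np

def pha_with_pur(samples, threshold=50.0, min_separation=40):
    """Return heights of threshold-crossing pulses, rejecting piled-up neighbours."""
    above = samples > threshold
    starts = np.flatnonzero(above[1:] & ~above[:-1]) + 1   # rising-edge indices
    heights = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(samples)
        prev_close = i > 0 and start - starts[i - 1] < min_separation
        next_close = i + 1 < len(starts) and starts[i + 1] - start < min_separation
        if prev_close or next_close:
            continue                    # pile-up: a neighbouring pulse is too near
        heights.append(samples[start:end].max())
    return np.array(heights)

# Two clean pulses and one piled-up pair on a flat baseline.
x = np.zeros(600)
for pos, amp in [(100, 200.0), (300, 150.0), (450, 180.0), (470, 120.0)]:
    x[pos:pos + 20] += amp * np.exp(-np.arange(20) / 5.0)
print(pha_with_pur(x))                  # reports ~200 and ~150; the piled pair is rejected
```

The real-time version keeps exactly this kind of decision per sample clock, which is why it maps naturally onto the FPGA fabric behind the common interface library.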
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high performance bioinformatics. Modern sequencing machinery is able to provide, in a few hours, large input streams of data which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variabilities, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges on current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present an MPI based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.
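For reference, a compact serial Python version of the Aho-Corasick automaton (trie construction, breadth-first failure links, and streaming search) is sketched below; it is a plain CPU reference for illustration, not the GPU or MPI implementation described in the paper.

```python
# Serial Aho-Corasick reference: exact multiple-pattern matching over one stream.
from collections import deque

def build_automaton(patterns):
    """Build goto/fail/output tables for the pattern set."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                          # phase 1: build the trie
        node = 0
        for ch in pat:
            if ch not in goto[node]:
                goto[node][ch] = len(goto)
                goto.append({}); fail.append(0); out.append(set())
            node = goto[node][ch]
        out[node].add(pat)
    queue = deque(goto[0].values())               # phase 2: BFS failure links
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] |= out[fail[child]]
            queue.append(child)
    return goto, fail, out

def search(text, goto, fail, out):
    """Yield (end_position, pattern) for every match in the input stream."""
    node = 0
    for pos, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for pat in out[node]:
            yield pos, pat

tables = build_automaton(["ACGT", "CGT", "GTTA"])
print(sorted(search("TTACGTTAG", *tables)))       # [(5, 'ACGT'), (5, 'CGT'), (7, 'GTTA')]
```

The per-character state-transition loop is what the GPU and MPI versions parallelize across input chunks and pattern subsets, which is also where the load-balancing issues mentioned above arise.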
Performance Prediction Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chennupati, Gopinath; Santhi, Nanadakishore; Eidenbenz, Stephen
The Performance Prediction Toolkit (PPT) is a scalable co-design tool that contains the hardware and middleware models, which accept proxy applications as input for runtime prediction. PPT relies on Simian, a parallel discrete event simulation engine in Python or Lua, that uses the process concept, where each computing unit (host, node, core) is a Simian entity. Processes perform their task through message exchanges to remain active, sleep, wake up, begin and end. The PPT hardware model of a compute core (such as a Haswell core) consists of a set of parameters, such as clock speed, memory hierarchy levels, their respective sizes, cache lines, access times for different cache levels, average cycle counts of ALU operations, etc. These parameters are ideally read off a spec sheet or are learned using regression models trained on hardware counter (PAPI) data. The compute core model offers an API to the software model, a function called time_compute(), which takes as input a tasklist. A tasklist is an unordered set of ALU and other CPU-type operations (in particular virtual memory loads and stores). The PPT application model mimics the loop structure of the application and replaces the computational kernels with a call to the hardware model's time_compute() function, giving tasklists as input that model the compute kernel. A PPT application model thus consists of tasklists representing kernels and the higher level loop structure that we like to think of as pseudo code. The key challenge for the hardware model's time_compute() function is to translate virtual memory accesses into actual cache hierarchy level hits and misses. PPT also contains another CPU core level hardware model, the Analytical Memory Model (AMM). The AMM solves this challenge soundly, whereas our previous alternatives explicitly include the L1, L2, L3 hit rates as inputs to the tasklists. Explicit hit rates inevitably only reflect the application modeler's best guess, perhaps informed by a few small test problems using hardware counters; also, hard-coded hit rates make the hardware model insensitive to changes in cache sizes. Alternatively, we use reuse distance distributions in the tasklists. In general, reuse profiles require the application modeler to run a very expensive trace analysis on the real code that realistically can be done at best for small examples.
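A toy illustration of the tasklist/time_compute() interface described above is sketched below; the clock speed, per-operation cycle counts, and cache penalties are made-up placeholders, not parameters of the actual PPT hardware models.

```python
# Toy sketch of a tasklist-driven time_compute(); all numbers are assumed placeholders.
CORE = {
    "clock_hz": 2.3e9,          # assumed clock speed
    "cycles_per_alu_op": 1.0,   # assumed average ALU cost
    "cycles_per_load": {"L1": 4, "L2": 12, "L3": 36, "DRAM": 200},
}

def time_compute(tasklist, core=CORE):
    """Predict the runtime (seconds) of one kernel described by a tasklist."""
    cycles = tasklist.get("alu_ops", 0) * core["cycles_per_alu_op"]
    for level, hits in tasklist.get("loads", {}).items():
        cycles += hits * core["cycles_per_load"][level]
    return cycles / core["clock_hz"]

# Application model in the spirit of PPT: keep the loop structure, replace the
# kernel by a tasklist handed to the hardware model.
total = 0.0
for _timestep in range(1000):
    kernel = {"alu_ops": 5.0e6,
              "loads": {"L1": 4.0e6, "L2": 5.0e5, "DRAM": 2.0e4}}
    total += time_compute(kernel)
print(f"predicted runtime: {total:.3f} s")
```

The point of the AMM is precisely to avoid hard-coding the per-level load counts above, deriving them instead from reuse distance distributions so the prediction tracks changes in cache sizes.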
Farabet, Clément; Paz, Rafael; Pérez-Carrasco, Jose; Zamarreño-Ramos, Carlos; Linares-Barranco, Alejandro; LeCun, Yann; Culurciello, Eugenio; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabe
2012-01-01
Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process frame by frame video information in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed- and time-multiplexed by fetching data in and out. Thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high-speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable, and expansible. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGAs have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented, along with discussions of their differences, pros and cons. PMID:22518097
Farabet, Clément; Paz, Rafael; Pérez-Carrasco, Jose; Zamarreño-Ramos, Carlos; Linares-Barranco, Alejandro; Lecun, Yann; Culurciello, Eugenio; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabe
2012-01-01
Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in different ways. Frame-Based ConvNets process frame by frame video information in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed- and time-multiplexed by fetching data in and out. Thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high-speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable, and expansible. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGAs have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented, along with discussions of their differences, pros and cons.
Reliability and Qualification of Hardware to Enhance the Mission Assurance of JPL/NASA Projects
NASA Technical Reports Server (NTRS)
Ramesham, Rajeshuni
2010-01-01
Packaging Qualification and Verification (PQV) and life testing of advanced electronic packaging, mechanical assemblies (motors/actuators), interconnect technologies (flip-chip), platinum temperature thermometer attachment processes, and various other types of hardware for the Mars Exploration Rover (MER)/Mars Science Laboratory (MSL) and JUNO flight projects was performed to enhance mission assurance. The qualification of hardware under extreme cold-to-hot temperatures was performed with reference to various project requirements. Flight-like packages, assemblies, test coupons, and subassemblies were selected for the study to survive three times the total number of expected temperature cycles resulting from all environmental and operational exposures occurring over the life of the flight hardware, including all relevant manufacturing, ground operations, and mission phases. Qualification/life testing was performed by subjecting flight-like qualification hardware to the environmental temperature extremes and assessing any structural failures, mechanical failures or degradation in electrical performance due to either overstress or thermal cycle fatigue. Experimental flight qualification test results will be described in this presentation.
An Application Development Platform for Neuromorphic Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dean, Mark; Chan, Jason; Daffron, Christopher
2016-01-01
Dynamic Adaptive Neural Network Arrays (DANNAs) are neuromorphic computing systems developed as a hardware-based approach to the implementation of neural networks. They feature highly adaptive and programmable structural elements, which model artificial neural networks with spiking behavior. We design them to solve problems using evolutionary optimization. In this paper, we highlight the current hardware and software implementations of DANNA, including their features, functionalities and performance. We then describe the development of an Application Development Platform (ADP) to support efficient application implementation and testing of DANNA based solutions. We conclude with future directions.
Hardware Based Technology Assessment in Support of Near-Term Space Fission Missions
NASA Technical Reports Server (NTRS)
Houts, Mike; VanDyke, Melissa; Godfroy, Tom; Martin, James; BraggSitton, Shannon; Carter, Robert; Dickens, Ricky; Salvail, Pat; Williams, Eric; Harper, Roger
2003-01-01
Fission technology can enable rapid, affordable access to any point in the solar system. If fission propulsion systems are to be developed to their full potential, however, near-term customers must be identified and initial fission systems successfully developed, launched, and utilized. Successful utilization will most likely occur if frequent, significant hardware-based milestones can be achieved throughout the program. Achieving these milestones will depend on the capability to perform highly realistic non-nuclear testing of nuclear systems. This paper discusses ongoing and potential research that could help achieve these milestones.
Electrochemical carbon dioxide concentrator advanced technology tasks
NASA Technical Reports Server (NTRS)
Schneider, J. J.; Schubert, F. H.; Hallick, T. M.; Woods, R. R.
1975-01-01
Technology advancement studies are reported on the basic electrochemical CO2 removal process to provide a basis for the design of the next generation cell, module and subsystem hardware. An Advanced Electrochemical Depolarized Concentrator Module (AEDCM) was developed that has the characteristics of low weight, low volume, high CO2 removal, good electrical performance and low process air pressure drop. Component weight and noise reduction for the hardware of a six-man capacity CO2 collection subsystem was developed for the air revitalization group of the Space Station Prototype (SSP).
Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
Ayres, Daniel L.; Darling, Aaron; Zwickl, Derrick J.; Beerli, Peter; Holder, Mark T.; Lewis, Paul O.; Huelsenbeck, John P.; Ronquist, Fredrik; Swofford, David L.; Cummings, Michael P.; Rambaut, Andrew; Suchard, Marc A.
2012-01-01
Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software. PMID:21963610
High-performance reactionless scan mechanism
NASA Technical Reports Server (NTRS)
Williams, Ellen I.; Summers, Richard T.; Ostaszewski, Miroslaw A.
1995-01-01
A high-performance reactionless scan mirror mechanism was developed for space applications to provide thermal images of the Earth. The design incorporates a unique mechanical means of providing reactionless operation that also minimizes weight, mechanical resonance operation to minimize power, combined use of a single optical encoder to sense coarse and fine angular position, and a new kinematic mount of the mirror. A flex pivot hardware failure and current project status are discussed.
Practical End-to-End Performance Testing Tool for High Speed 3G-Based Networks
NASA Astrophysics Data System (ADS)
Shinbo, Hiroyuki; Tagami, Atsushi; Ano, Shigehiro; Hasegawa, Toru; Suzuki, Kenji
High speed IP communication is a killer application for 3rd generation (3G) mobile systems. Thus 3G network operators should perform extensive tests to check whether the expected end-to-end performance is provided to customers under various environments. An important objective of such tests is to check whether network nodes fulfill requirements on packet-processing durations, because a long duration of such processing causes performance degradation. This requires testers (persons who perform the tests) to know precisely how long a packet is held by various network nodes. Without any tool's help, this task is time-consuming and error prone. Thus we propose a multi-point packet header analysis tool which extracts and records packet headers with synchronized timestamps at multiple observation points. Such recorded packet headers enable testers to calculate such holding durations. The notable feature of this tool is that it is implemented on off-the-shelf hardware platforms, i.e., laptop personal computers. The key challenges of the implementation are precise clock synchronization without any special hardware and a sophisticated header extraction algorithm without any drops.
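A small sketch of the post-processing this kind of tool enables is given below: given packet headers recorded with synchronized timestamps at two observation points, the holding duration of each packet at the node between them is just the timestamp difference. The field names and packet key used here are illustrative assumptions, not the tool's actual record format.

```python
# Illustrative holding-duration calculation from two synchronized capture points.
def holding_durations(ingress, egress):
    """ingress/egress: lists of (packet_key, timestamp_seconds) tuples."""
    seen = {key: ts for key, ts in ingress}
    durations = {}
    for key, ts_out in egress:
        if key in seen:                       # match the same packet at both points
            durations[key] = ts_out - seen[key]
    return durations

# A packet key could be, e.g., (IP ID, src, dst) or a TCP sequence number.
ingress = [("pkt-1", 0.000120), ("pkt-2", 0.000480)]
egress  = [("pkt-1", 0.002150), ("pkt-2", 0.002910)]
for key, d in holding_durations(ingress, egress).items():
    print(f"{key}: held {d * 1e3:.3f} ms")
```

The usefulness of such a calculation stands or falls with the clock synchronization between the capture points, which is exactly the challenge the tool addresses without special hardware.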
Current state and future direction of computer systems at NASA Langley Research Center
NASA Technical Reports Server (NTRS)
Rogers, James L. (Editor); Tucker, Jerry H. (Editor)
1992-01-01
Computer systems have advanced at a rate unmatched by any other area of technology. As performance has dramatically increased there has been an equally dramatic reduction in cost. This constant cost-performance improvement has precipitated the pervasiveness of computer systems into virtually all areas of technology. This improvement is due primarily to advances in microelectronics. Most people are now convinced that the new generation of supercomputers will be built using a large number (possibly thousands) of high performance microprocessors. Although the spectacular improvements in computer systems have come about because of these hardware advances, there has also been a steady improvement in software techniques. In an effort to understand how these hardware and software advances will affect research at NASA LaRC, the Computer Systems Technical Committee drafted this white paper to examine the current state and possible future directions of computer systems at the Center. This paper discusses selected important areas of computer systems including real-time systems, embedded systems, high performance computing, distributed computing networks, data acquisition systems, artificial intelligence, and visualization.
The Caltech Concurrent Computation Program - Project description
NASA Technical Reports Server (NTRS)
Fox, G.; Otto, S.; Lyzenga, G.; Rogstad, D.
1985-01-01
The Caltech Concurrent Computation Program, which studies basic issues in computational science, is described. The research builds on initial work where novel concurrent hardware, the necessary systems software to use it, and twenty significant scientific implementations running on the initial 32-, 64-, and 128-node hypercube machines have been constructed. A major goal of the program will be to extend this work into new disciplines and more complex algorithms, including general packages that decompose arbitrary problems in major application areas. New high-performance concurrent processors with up to 1024 nodes, over a gigabyte of memory and multigigaflop performance are being constructed. The implementations cover a wide range of problems in areas such as high energy physics and astrophysics, condensed matter, chemical reactions, plasma physics, applied mathematics, geophysics, simulation, CAD for VLSI, graphics and image processing. The products of the research program include the concurrent algorithms, hardware, systems software, and complete program implementations.
Highly efficient simulation environment for HDTV video decoder in VLSI design
NASA Astrophysics Data System (ADS)
Mao, Xun; Wang, Wei; Gong, Huimin; He, Yan L.; Lou, Jian; Yu, Lu; Yao, Qingdong; Pirsch, Peter
2002-01-01
With the increasing complexity of VLSI designs, especially SoCs (System on Chip) such as an MPEG-2 video decoder with HDTV scalability, simulation and verification of the full design, even at the behavioral level in HDL, often prove to be very slow and costly, and it is difficult to perform full verification until late in the design process. They therefore become a bottleneck in the HDTV video decoder design procedure and strongly influence its time-to-market. In this paper, the hardware/software interface architecture of an HDTV video decoder is studied, and a Hardware-Software Mixed Simulation (HSMS) platform is proposed to check and correct errors in the early design stage, based on the MPEG-2 video decoding algorithm. The application of HSMS to the target system can be achieved by employing several of the introduced approaches. These approaches speed up the simulation and verification task without decreasing performance.
NASA Technical Reports Server (NTRS)
Oeftering, Richard C.; Bradish, Martin A.
2011-01-01
The role of synthetic instruments (SIs) for Component-Level Electronic-Assembly Repair (CLEAR) is to provide an external lower-level diagnostic and functional test capability beyond the built-in-test capabilities of spacecraft electronics. Built-in diagnostics can report faults and symptoms, but isolating the root cause and performing corrective action requires specialized instruments. Often a fault can be revealed by emulating the operation of external hardware. This implies complex hardware that is too massive to be accommodated in spacecraft. The SI strategy is aimed at minimizing complexity and mass by employing highly reconfigurable instruments that perform diagnostics and emulate external functions. In effect, SI can synthesize an instrument on demand. The SI architecture section of this document summarizes the result of a recent program diagnostic and test needs assessment based on the International Space Station. The SI architecture addresses operational issues such as minimizing crew time and crew skill level, and the SI data transactions between the crew and supporting ground engineering searching for the root cause and formulating corrective actions. SI technology is described within a teleoperations framework. The remaining sections describe a lab demonstration intended to show that a single SI circuit could synthesize an instrument in hardware and subsequently clear the hardware and synthesize a completely different instrument on demand. An analysis of the capabilities and limitations of commercially available SI hardware and programming tools is included. Future work in SI technology is also described.
Høye, Gudrun; Fridman, Andrei
2013-05-06
Current high-resolution push-broom hyperspectral cameras introduce keystone errors to the captured data. Efforts to correct these errors in hardware severely limit the optical design, in particular with respect to light throughput and spatial resolution, while at the same time the residual keystone often remains large. The mixel camera solves this problem by combining a hardware component--an array of light mixing chambers--with a mathematical method that restores the hyperspectral data to its keystone-free form, based on the data that was recorded onto the sensor with large keystone. A Virtual Camera software, that was developed specifically for this purpose, was used to compare the performance of the mixel camera to traditional cameras that correct keystone in hardware. The mixel camera can collect at least four times more light than most current high-resolution hyperspectral cameras, and simulations have shown that the mixel camera will be photon-noise limited--even in bright light--with a significantly improved signal-to-noise ratio compared to traditional cameras. A prototype has been built and is being tested.
Testing and evaluation of tactical electro-optical sensors
NASA Astrophysics Data System (ADS)
Middlebrook, Christopher T.; Smith, John G.
2002-07-01
As integrated electro-optical sensor payloads (multi-sensors) comprised of infrared imagers, visible imagers, and lasers advance in performance, the tests and testing methods must also advance in order to fully evaluate them. Future operational requirements will require integrated sensor payloads to perform missions at further ranges and with increased targeting accuracy. In order to meet these requirements, sensors will require advanced imaging algorithms, advanced tracking capability, high-powered lasers, and high-resolution imagers. To meet the U.S. Navy's testing requirements for such multi-sensors, the test and evaluation group in the Night Vision and Chemical Biological Warfare Department at NAVSEA Crane is developing automated testing methods and improved tests to evaluate imaging algorithms, and is procuring advanced testing hardware to measure high resolution imagers and line of sight stabilization of targeting systems. This paper addresses: descriptions of the multi-sensor payloads tested, testing methods used and under development, and the different types of testing hardware and specific payload tests that are being developed and used at NAVSEA Crane.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool
NASA Astrophysics Data System (ADS)
Barr, David R. W.; Dudek, Piotr
2009-12-01
We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
High-pressure LOX/hydrocarbon preburners and gas generators
NASA Technical Reports Server (NTRS)
Huebner, A. W.
1981-01-01
The objective of the program was to conduct a small-scale hardware test program to establish the technology base required for LOX/hydrocarbon preburners and gas generators. The program consisted of six major tasks: Task I reviewed and assessed the performance prediction models and defined a subscale test program. Task II designed and fabricated this subscale hardware. Task III tested and analyzed the data from this hardware. Task IV analyzed the hot-fire results and formulated a preliminary design for 40K preburner assemblies. Task V took the preliminary design, detailed it, and fabricated three 40K-size preburner assemblies: one fuel-rich LOX/CH4, one fuel-rich LOX/RP-1, and one oxidizer-rich LOX/CH4. Task VI delivered these preburner assemblies to MSFC for subsequent evaluation.
NASA Technical Reports Server (NTRS)
Jain, A.; Man, G. K.
1993-01-01
This paper describes the Dynamics Algorithms for Real-Time Simulation (DARTS) real-time hardware-in-the-loop dynamics simulator for the National Aeronautics and Space Administration's Cassini spacecraft. The spacecraft model consists of a central flexible body with a number of articulated rigid-body appendages. The demanding performance requirements from the spacecraft control system require the use of a high fidelity simulator for control system design and testing. The DARTS algorithm provides a new algorithmic and hardware approach to the solution of this hardware-in-the-loop simulation problem. It is based upon the efficient spatial algebra dynamics for flexible multibody systems. A parallel and vectorized version of this algorithm is implemented on a low-cost, multiprocessor computer to meet the simulation timing requirements.
Electrochemical carbon dioxide concentrator subsystem development
NASA Technical Reports Server (NTRS)
Koszenski, E. P.; Heppner, D. B.; Bunnell, C. T.
1986-01-01
The most promising concept for a regenerative CO2 removal system for long-duration manned space flight is the Electrochemical CO2 Concentrator (EDC), which allows for the continuous, efficient removal of CO2 from the spacecraft cabin. This study addresses the advancement of the EDC system by generating subsystem and ancillary component reliability data through extensive endurance testing and by developing related hardware components such as electrochemical module lightweight end plates, electrochemical module improved isolation valves, an improved air/liquid heat exchanger, and a triple-redundant relative humidity sensor. Efforts included fabricating and testing the EDC with a Sabatier CO2 Reduction Reactor and generating the data necessary for integration of the EDC into a space station air revitalization system. The results verified the high level of performance, reliability, and durability of the EDC subsystem and ancillary hardware, verified the high efficiency of the Sabatier CO2 Reduction Reactor, and increased the overall EDC technology engineering data base. The study concluded that the EDC system is approaching the hardware maturity levels required for space station deployment.
A data-driven modeling approach to stochastic computation for low-energy biomedical devices.
Lee, Kyong Ho; Jang, Kuk Jin; Shoeb, Ali; Verma, Naveen
2011-01-01
Low-power devices that can detect clinically relevant correlations in physiologically-complex patient signals can enable systems capable of closed-loop response (e.g., controlled actuation of therapeutic stimulators, continuous recording of disease states, etc.). In ultra-low-power platforms, however, hardware error sources are becoming increasingly limiting. In this paper, we present how data-driven methods, which allow us to accurately model physiological signals, also allow us to effectively model and overcome prominent hardware error sources with nearly no additional overhead. Two applications, EEG-based seizure detection and ECG-based arrhythmia-beat classification, are synthesized to a logic-gate implementation, and two prominent error sources are introduced: (1) SRAM bit-cell errors and (2) logic-gate switching errors ('stuck-at' faults). Using patient data from the CHB-MIT and MIT-BIH databases, performance similar to error-free hardware is achieved even for very high fault rates (up to 0.5 for SRAMs and 7 × 10^-2 for logic) that cause computational bit error rates as high as 50%.
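A minimal sketch of the fault-injection side of such an evaluation, assuming a hypothetical array of 8-bit quantized classifier weights stored in SRAM; the fault rate, bit width, and weight array are placeholders, and the paper's actual EEG/ECG classifiers and logic-level fault model are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_bits(words, bit_width=8, p_flip=0.05):
    """Inject independent SRAM-style bit-cell errors into fixed-point words."""
    out = words.copy()
    for b in range(bit_width):
        mask = rng.random(out.shape) < p_flip   # cells that fail at this bit position
        out[mask] ^= (1 << b)
    return out

# Hypothetical stored model: 8-bit quantized weights of a linear classifier.
weights = rng.integers(0, 256, size=64, dtype=np.uint16)
faulty = flip_bits(weights, bit_width=8, p_flip=0.05)

flipped = np.unpackbits((weights ^ faulty).astype(np.uint8)).sum()
print(f"{flipped} of {weights.size * 8} stored bits flipped")
# A study like the one above would now re-run classification with the faulty
# weights and compare accuracy against the error-free baseline.
```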
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-03-28
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis, and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System together with Programmable Logic for high-performance digital signal processing through parallelism and pipelining techniques. The algorithm has been coded in the C language with pragma directives to optimize the architecture of the system. We have used the novel Software Defined System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation.
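For reference, the transform an NFFT approximates can be written directly as an O(NM) sum; the NumPy sketch below is useful only for checking a fast implementation and does not represent the authors' C/SDSoC coprocessor. The node and sign conventions are assumptions.

```python
import numpy as np

def ndft(f_hat, x):
    """Direct non-equispaced DFT: f(x_j) = sum_k f_hat[k] * exp(-2j*pi*k*x_j),
    with k = -N/2 .. N/2-1 and nodes x_j in [-0.5, 0.5) (conventions assumed).
    This O(N*M) evaluation is what an NFFT approximates in O(N log N + M) time."""
    N = len(f_hat)
    k = np.arange(-N // 2, N // 2)
    return np.exp(-2j * np.pi * np.outer(x, k)) @ f_hat

# Cross-check the vectorized evaluation against an explicit double sum.
rng = np.random.default_rng(1)
N, M = 16, 20
f_hat = rng.standard_normal(N) + 1j * rng.standard_normal(N)
x = rng.uniform(-0.5, 0.5, size=M)
ref = np.array([sum(f_hat[j] * np.exp(-2j * np.pi * (j - N // 2) * xi)
                    for j in range(N)) for xi in x])
print(np.allclose(ndft(f_hat, x), ref))   # True
```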
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-01-01
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis, and so on. However, its computational complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as the Processing System together with Programmable Logic for high-performance digital signal processing through parallelism and pipelining techniques. The algorithm has been coded in the C language with pragma directives to optimize the architecture of the system. We have used the novel Software Defined System-on-Chip (SDSoC) development tool, which simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results show that hardware acceleration significantly outperformed the software-based implementation. PMID:28350358
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable for enhancing the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate, and oxygen saturation (SpO2) as input parameters were developed to monitor the condition of patients with minimum human resource utilization. The support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression, is used for the realization of MPMs. To make MPMs cost effective, we experiment with a hardware implementation of the MPM using a support vector machine classifier. The training of the system is done in the MATLAB environment, and the detection of the alarm/no-alarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using the intersection kernel SVM (IKSVM). The intersection kernel support vector machine classifier MPM outperformed the best known MPM using a radial basis function kernel by an absolute improvement of 2.74% in accuracy, 1.86% in sensitivity, and 3.01% in specificity. The hardware model was developed based on the improved-performance system using the Verilog Hardware Description Language and was implemented on an Altera Cyclone-II development board.
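A minimal scikit-learn sketch of the intersection (histogram) kernel used with a precomputed-kernel SVM; the synthetic vitals matrix and alarm labels are placeholders, and this is not the authors' MATLAB training pipeline or the Verilog hardware model.

```python
import numpy as np
from sklearn.svm import SVC

def intersection_kernel(A, B):
    """Histogram-intersection kernel: K[i, j] = sum_d min(A[i, d], B[j, d])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Placeholder vitals matrix: rows = patients; columns stand in for heart rate,
# arterial blood pressure, respiration rate, and SpO2, scaled to be non-negative.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((100, 4)), rng.integers(0, 2, 100)  # 0 = no alarm, 1 = alarm
X_test = rng.random((10, 4))

clf = SVC(kernel="precomputed")
clf.fit(intersection_kernel(X_train, X_train), y_train)
pred = clf.predict(intersection_kernel(X_test, X_train))
print(pred)
```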
A Modular Framework for Modeling Hardware Elements in Distributed Engine Control Systems
NASA Technical Reports Server (NTRS)
Zinnecker, Alicia M.; Culley, Dennis E.; Aretskin-Hariton, Eliot D.
2014-01-01
Progress toward the implementation of distributed engine control in an aerospace application may be accelerated through the development of a hardware-in-the-loop (HIL) system for testing new control architectures and hardware outside of a physical test cell environment. One component required in an HIL simulation system is a high-fidelity model of the control platform: sensors, actuators, and the control law. The control system developed for the Commercial Modular Aero-Propulsion System Simulation 40k (C-MAPSS40k) provides a verifiable baseline for development of a model for simulating a distributed control architecture. This distributed controller model will contain enhanced hardware models, capturing the dynamics of the transducer and the effects of data processing, and a model of the controller network. A multilevel framework is presented that establishes three sets of interfaces in the control platform: communication with the engine (through sensors and actuators), communication between hardware and controller (over a network), and the physical connections within individual pieces of hardware. This introduces modularity at each level of the model, encouraging collaboration in the development and testing of various control schemes or hardware designs. At the hardware level, this modularity is leveraged through the creation of a Simulink(R) library containing blocks for constructing smart transducer models complying with the IEEE 1451 specification. These hardware models were incorporated in a distributed version of the baseline C-MAPSS40k controller and simulations were run to compare the performance of the two models. The overall tracking ability differed only due to quantization effects in the feedback measurements in the distributed controller. Additionally, it was also found that the added complexity of the smart transducer models did not prevent real-time operation of the distributed controller model, a requirement of an HIL system.
A Modular Framework for Modeling Hardware Elements in Distributed Engine Control Systems
NASA Technical Reports Server (NTRS)
Zinnecker, Alicia M.; Culley, Dennis E.; Aretskin-Hariton, Eliot D.
2015-01-01
Progress toward the implementation of distributed engine control in an aerospace application may be accelerated through the development of a hardware-in-the-loop (HIL) system for testing new control architectures and hardware outside of a physical test cell environment. One component required in an HIL simulation system is a high-fidelity model of the control platform: sensors, actuators, and the control law. The control system developed for the Commercial Modular Aero-Propulsion System Simulation 40k (C-MAPSS40k) provides a verifiable baseline for development of a model for simulating a distributed control architecture. This distributed controller model will contain enhanced hardware models, capturing the dynamics of the transducer and the effects of data processing, and a model of the controller network. A multilevel framework is presented that establishes three sets of interfaces in the control platform: communication with the engine (through sensors and actuators), communication between hardware and controller (over a network), and the physical connections within individual pieces of hardware. This introduces modularity at each level of the model, encouraging collaboration in the development and testing of various control schemes or hardware designs. At the hardware level, this modularity is leveraged through the creation of a Simulink(R) library containing blocks for constructing smart transducer models complying with the IEEE 1451 specification. These hardware models were incorporated in a distributed version of the baseline C-MAPSS40k controller and simulations were run to compare the performance of the two models. The overall tracking ability differed only due to quantization effects in the feedback measurements in the distributed controller. Additionally, it was also found that the added complexity of the smart transducer models did not prevent real-time operation of the distributed controller model, a requirement of an HIL system.
A Modular Framework for Modeling Hardware Elements in Distributed Engine Control Systems
NASA Technical Reports Server (NTRS)
Zinnecker, Alicia Mae; Culley, Dennis E.; Aretskin-Hariton, Eliot D.
2014-01-01
Progress toward the implementation of distributed engine control in an aerospace application may be accelerated through the development of a hardware-in-the-loop (HIL) system for testing new control architectures and hardware outside of a physical test cell environment. One component required in an HIL simulation system is a high-fidelity model of the control platform: sensors, actuators, and the control law. The control system developed for the Commercial Modular Aero-Propulsion System Simulation 40k (40,000 pound force thrust) (C-MAPSS40k) provides a verifiable baseline for development of a model for simulating a distributed control architecture. This distributed controller model will contain enhanced hardware models, capturing the dynamics of the transducer and the effects of data processing, and a model of the controller network. A multilevel framework is presented that establishes three sets of interfaces in the control platform: communication with the engine (through sensors and actuators), communication between hardware and controller (over a network), and the physical connections within individual pieces of hardware. This introduces modularity at each level of the model, encouraging collaboration in the development and testing of various control schemes or hardware designs. At the hardware level, this modularity is leveraged through the creation of a Simulink (R) library containing blocks for constructing smart transducer models complying with the IEEE 1451 specification. These hardware models were incorporated in a distributed version of the baseline C-MAPSS40k controller and simulations were run to compare the performance of the two models. The overall tracking ability differed only due to quantization effects in the feedback measurements in the distributed controller. Additionally, it was also found that the added complexity of the smart transducer models did not prevent real-time operation of the distributed controller model, a requirement of an HIL system.
A perspective on future directions in aerospace propulsion system simulation
NASA Technical Reports Server (NTRS)
Miller, Brent A.; Szuch, John R.; Gaugler, Raymond E.; Wood, Jerry R.
1989-01-01
The design and development of aircraft engines is a lengthy and costly process using today's methodology. This is due, in large measure, to the fact that present methods rely heavily on experimental testing to verify the operability, performance, and structural integrity of components and systems. The potential exists for achieving significant speedups in the propulsion development process through increased use of computational techniques for simulation, analysis, and optimization. This paper outlines the concept and technology requirements for a Numerical Propulsion Simulation System (NPSS) that would provide capabilities to do interactive, multidisciplinary simulations of complete propulsion systems. By combining high performance computing hardware and software with state-of-the-art propulsion system models, the NPSS will permit the rapid calculation, assessment, and optimization of subcomponent, component, and system performance, durability, reliability, and weight before committing to building hardware.
Hardware accelerator for molecular dynamics: MDGRAPE-2
NASA Astrophysics Data System (ADS)
Susukita, Ryutaro; Ebisuzaki, Toshikazu; Elmegreen, Bruce G.; Furusawa, Hideaki; Kato, Kenya; Kawai, Atsushi; Kobayashi, Yoshinao; Koishi, Takahiro; McNiven, Geoffrey D.; Narumi, Tetsu; Yasuoka, Kenji
2003-10-01
We developed MDGRAPE-2, a hardware accelerator that calculates forces at high speed in molecular dynamics (MD) simulations. MDGRAPE-2 is connected to a PC or a workstation as an extension board. The sustained performance of one MDGRAPE-2 board is 15 Gflops, roughly equivalent to the peak performance of the fastest supercomputer processing element. One board is able to calculate all forces between 10 000 particles in 0.28 s (i.e. 310 000 time steps per day). If 16 boards are connected to one computer and operated in parallel, this calculation speed becomes approximately 10 times faster. In addition to MD, MDGRAPE-2 can be applied to gravitational N-body simulations, the vortex method and smoothed particle hydrodynamics in computational fluid dynamics.
A structurally adaptive space crane concept for assembling space systems on orbit
NASA Technical Reports Server (NTRS)
Dorsey, John T.; Sutter, Thomas R.; Wu, K. C.
1992-01-01
A space crane concept is presented which is based on erectable truss hardware to achieve high stiffness and low mass booms and articulating-truss joints which can be assembled on orbit. The hardware is characterized by linear load-deflection response and is structurally predictable. The crane can be reconfigured into different geometries to meet future assembly requirements. Articulating-truss joint concepts with significantly different geometries are analyzed and found to have similar static and dynamic performance, which indicates that criteria other than structural and kinematic performance can be used to select a joint. Passive damping and an open-loop preshaped command input technique greatly enhance the structural damping in the space crane and may preclude the need for an active vibrations suppression system.
Enhancing instruction scheduling with a block-structured ISA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Melvin, S.; Patt, Y.
It is now generally recognized that not enough parallelism exists within the small basic blocks of most general purpose programs to satisfy high performance processors. Thus, a wide variety of techniques have been developed to exploit instruction level parallelism across basic block boundaries. In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring. This new paradigm is presented, its hardware and software requirements are discussed and the results from a simulation study are presented. We show that a block-structured ISA utilizes both dynamic and compile-time mechanisms for exploiting instruction level parallelism and has significant performance advantages over a conventional ISA.
Levin, David; Aladl, Usaf; Germano, Guido; Slomka, Piotr
2005-09-01
We exploit consumer graphics hardware to perform real-time processing and visualization of high-resolution, 4D cardiac data. We have implemented real-time, realistic volume rendering, interactive 4D motion segmentation of cardiac data, visualization of multi-modality cardiac data and 3D display of multiple series cardiac MRI. We show that an ATI Radeon 9700 Pro can render a 512x512x128 cardiac Computed Tomography (CT) study at 0.9 to 60 frames per second (fps) depending on rendering parameters and that 4D motion based segmentation can be performed in real-time. We conclude that real-time rendering and processing of cardiac data can be implemented on consumer graphics cards.
Cedar Project---Original goals and progress to date
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cybenko, G.; Kuck, D.; Padua, D.
1990-11-28
This work encompasses a broad attack on high speed parallel processing. Hardware, software, applications development, and performance evaluation and visualization as well as research topics are proposed. Our goal is to develop practical parallel processing for the 1990's.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scales to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g., reduction vs. forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique being integrated into future compilers or optimization frameworks for autotuning.
FPGA based data processing in the ALICE High Level Trigger in LHC Run 2
NASA Astrophysics Data System (ADS)
Engel, Heiko; Alt, Torsten; Kebschull, Udo
2017-10-01
The ALICE High Level Trigger (HLT) is a computing cluster dedicated to the online compression, reconstruction and calibration of experimental data. The HLT receives detector data via serial optical links into FPGA-based readout boards that process the data on a per-link level already inside the FPGA and provide it to the host machines connected with a data transport framework. FPGA-based data pre-processing is enabled for the biggest detector of ALICE, the Time Projection Chamber (TPC), with a hardware cluster finding algorithm. This algorithm was ported to the Common Read-Out Receiver Card (C-RORC) as used in the HLT for Run 2. It was improved to handle double the input bandwidth and adjusted to the upgraded TPC Readout Control Unit (RCU2). A flexible firmware implementation in the HLT handles both the old and the new TPC data format and link rates transparently. Extended protocol and data error detection, error handling, and the enhanced RCU2 data ordering scheme provide an improved physics performance of the cluster finder. The performance of the cluster finder was verified against large sets of reference data both in terms of throughput and algorithmic correctness. Comparisons with a software reference implementation confirm significant savings on CPU processing power using the hardware implementation. The C-RORC hardware with the cluster finder for RCU1 data has been in use in the HLT since the start of Run 2. The extended hardware cluster finder implementation for the RCU2 with doubled throughput has been active since the upgrade of the TPC readout electronics in early 2016.
Atlas : A library for numerical weather prediction and climate modelling
NASA Astrophysics Data System (ADS)
Deconinck, Willem; Bauer, Peter; Diamantakis, Michail; Hamrud, Mats; Kühnlein, Christian; Maciel, Pedro; Mengaldo, Gianmarco; Quintino, Tiago; Raoult, Baudouin; Smolarkiewicz, Piotr K.; Wedi, Nils P.
2017-11-01
The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. In this paper, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the scope of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models, to specific tools required for post-processing weather forecast products. The Atlas library thus constitutes a step towards affordable exascale high-performance simulations by providing the necessary abstractions that facilitate the application in heterogeneous HPC environments by promoting the co-design of NWP algorithms with the underlying hardware.
NASA Astrophysics Data System (ADS)
Mohan, C.
In this paper, I survey briefly some of the recent and emerging trends in hardware and software features which impact high performance transaction processing and data analytics applications. These features include multicore processor chips, ultra large main memories, flash storage, storage class memories, database appliances, field programmable gate arrays, transactional memory, key-value stores, and cloud computing. While some applications, e.g., Web 2.0 ones, were initially built without traditional transaction processing functionality in mind, system architects and designers are slowly beginning to address such previously ignored issues. The availability, analytics, and response time requirements of these applications were initially given more importance than ACID transaction semantics and resource consumption characteristics. A project at IBM Almaden is studying the implications of phase change memory on transaction processing, in the context of a key-value store. Bitemporal data management has also become an important requirement, especially for financial applications. Power consumption and heat dissipation properties are also major considerations in the emergence of modern software and hardware architectural features. Considerations relating to ease of configuration, installation, maintenance and monitoring, and improvement of total cost of ownership have resulted in database appliances becoming very popular. The MapReduce paradigm is now quite popular for large scale data analysis, in spite of the major inefficiencies associated with it.
Summary of materials and hardware performance on LDEF
NASA Technical Reports Server (NTRS)
Dursch, Harry; Pippin, Gary; Teichman, Lou
1993-01-01
A wide variety of materials and experiment support hardware were flown on the Long Duration Exposure Facility (LDEF). Postflight testing has determined the effects of the almost 6 years of low-Earth orbit (LEO) exposure on this hardware. An overview of the results is presented. Hardware discussed includes adhesives, fasteners, lubricants, data storage systems, solar cells, seals, and the LDEF structure. Lessons learned from the testing and analysis of LDEF hardware are also presented.
NASA Astrophysics Data System (ADS)
Smith, Malcolm; Kerley, Dan; Chapin, Edward L.; Dunn, Jennifer; Herriot, Glen; Véran, Jean-Pierre; Boyer, Corinne; Ellerbroek, Brent; Gilles, Luc; Wang, Lianqi
2016-07-01
Prototyping and benchmarking were performed for the Real-Time Controller (RTC) of the Narrow Field InfraRed Adaptive Optics System (NFIRAOS). To perform wavefront correction, NFIRAOS utilizes two deformable mirrors (DM) and one tip/tilt stage (TTS). The RTC receives wavefront information from six Laser Guide Star (LGS) Shack-Hartmann WaveFront Sensors (WFS), one high-order Natural Guide Star Pyramid WaveFront Sensor (PWFS), and multiple low-order instrument detectors. The RTC uses this information to determine the commands to send to the wavefront correctors. NFIRAOS is the first-light AO system for the Thirty Meter Telescope (TMT). The prototyping was performed using dual-socket high performance Linux servers with the real-time (PREEMPT_RT) patch and demonstrated the viability of a commercial off-the-shelf (COTS) hardware approach to large scale AO reconstruction. In particular, a large custom matrix vector multiplication (MVM) was benchmarked and met the required latency requirements. In addition, all major inter-machine communication was verified to be adequate using 10Gb and 40Gb Ethernet. The results of this prototyping have enabled a CPU-based NFIRAOS RTC design to proceed with confidence, showing that COTS hardware can be used to meet the demanding performance requirements.
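The kind of MVM latency benchmark described can be approximated in spirit by the sketch below, which times a single-precision matrix-vector product on a commodity CPU with NumPy; the matrix dimensions are placeholders, and the real NFIRAOS reconstructor size, core pinning, and NUMA layout are not modeled.

```python
import time
import numpy as np

# Placeholder reconstructor dimensions (actuator commands x WFS slope measurements).
n_act, n_slopes = 4000, 12000
rng = np.random.default_rng(0)
R = rng.standard_normal((n_act, n_slopes)).astype(np.float32)
s = rng.standard_normal(n_slopes).astype(np.float32)

R @ s  # warm-up
times = []
for _ in range(100):
    t0 = time.perf_counter()
    a = R @ s            # reconstruction step: actuator commands = R * slopes
    times.append(time.perf_counter() - t0)
print(f"median MVM latency: {1e3 * np.median(times):.2f} ms")
```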
A hardware acceleration based on high-level synthesis approach for glucose-insulin analysis
NASA Astrophysics Data System (ADS)
Daud, Nur Atikah Mohd; Mahmud, Farhanahani; Jabbar, Muhamad Hairol
2017-01-01
In this paper, the research focuses on Type 1 Diabetes Mellitus (T1DM). Since this disease requires close attention to the blood glucose concentration, managed with the help of insulin injections, it is important to have a tool that is able to predict the glucose level when a certain amount of carbohydrate is consumed at meal time. To make this realizable, the Hovorka model, which targets T1DM, is chosen in this research. A high-level language, C++, is chosen to construct the mathematical model of the Hovorka model. This code is then converted into an intellectual property (IP) core, also known as a hardware accelerator, using a high-level synthesis (HLS) approach, which improves the design and performance of the glucose-insulin analysis tool, as will be explained further in this paper. This is the first step in this research before implementing the design on a system-on-chip (SoC) to achieve a high-performance system for the glucose-insulin analysis tool.
Comparative Modal Analysis of Sieve Hardware Designs
NASA Technical Reports Server (NTRS)
Thompson, Nathaniel
2012-01-01
The CMTB Thwacker hardware operates as a testbed analogue for the Flight Thwacker and Sieve components of CHIMRA, a device on the Curiosity Rover. The sieve separates particles with a diameter smaller than 150 microns for delivery to onboard science instruments. The sieving behavior of the testbed hardware should be similar to the Flight hardware for the results to be meaningful. The elastodynamic behavior of both sieves was studied analytically using the Rayleigh Ritz method in conjunction with classical plate theory. Finite element models were used to determine the mode shapes of both designs, and comparisons between the natural frequencies and mode shapes were made. The analysis predicts that the performance of the CMTB Thwacker will closely resemble the performance of the Flight Thwacker within the expected steady state operating regime. Excitations of the testbed hardware that will mimic the flight hardware were recommended, as were those that will improve the efficiency of the sieving process.
NASA Astrophysics Data System (ADS)
Kondo, Shuhei; Shibata, Tadashi; Ohmi, Tadahiro
1995-02-01
We have investigated the learning performance of the hardware backpropagation (HBP) algorithm, a hardware-oriented learning algorithm developed for the self-learning architecture of neural networks constructed using neuron MOS (metal-oxide-semiconductor) transistors. The solution to finding a mirror symmetry axis in a 4×4 binary pixel array was tested by computer simulation based on the HBP algorithm. Despite the inherent restrictions imposed on the hardware-learning algorithm, HBP exhibits equivalent learning performance to that of the original backpropagation (BP) algorithm when all the pertinent parameters are optimized. Very importantly, we have found that HBP has a superior generalization capability over BP; namely, HBP exhibits higher performance in solving problems that the network has not yet learnt.
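As a software point of reference for the mirror-symmetry task described, the sketch below trains a small multilayer perceptron with conventional backpropagation on 4×4 binary patterns labelled by vertical-axis symmetry. It is a baseline only, not the HBP hardware-oriented learning rule or the neuron-MOS implementation, and the network size, learning rate, and step count are untuned assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n):
    """4x4 binary patterns, labelled 1 if symmetric about the vertical axis."""
    X = rng.integers(0, 2, size=(n, 4, 4))
    sym = rng.random(n) < 0.5
    X[sym, :, 2:] = X[sym, :, :2][:, :, ::-1]   # force symmetry on half the batch
    y = np.all(X[:, :, 2:] == X[:, :, :2][:, :, ::-1], axis=(1, 2)).astype(float)
    return X.reshape(n, 16).astype(float), y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = 0.5 * rng.standard_normal((16, 8)), np.zeros(8)
W2, b2 = 0.5 * rng.standard_normal((8, 1)), np.zeros(1)

for step in range(5000):
    X, y = make_batch(64)
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2).ravel()
    d2 = (p - y)[:, None] / len(y)          # cross-entropy error at the output
    dW2, db2 = h.T @ d2, d2.sum(0)
    d1 = (d2 @ W2.T) * h * (1 - h)          # backpropagated hidden-layer error
    dW1, db1 = X.T @ d1, d1.sum(0)
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 1.0 * grad

X, y = make_batch(1000)
acc = np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5) == (y > 0.5))
print(f"test accuracy: {acc:.2f}")
```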
High temperature solar thermal receiver
NASA Technical Reports Server (NTRS)
1979-01-01
A design concept for a high temperature solar thermal receiver to operate at 3 atmospheres pressure and 2500 F outlet was developed. The performance and complexity of windowed matrix, tube-header, and extended surface receivers were evaluated. The windowed matrix receiver proved to offer substantial cost and performance benefits. An efficient and cost effective hardware design was evaluated for a receiver which can be readily interfaced to fuel and chemical processes or to heat engines for power generation.
NASA Technical Reports Server (NTRS)
Little, Alan; Bose, Deepak; Karlgaard, Chris; Munk, Michelle; Kuhl, Chris; Schoenenberger, Mark; Antill, Chuck; Verhappen, Ron; Kutty, Prasad; White, Todd
2013-01-01
The Mars Science Laboratory (MSL) Entry, Descent and Landing Instrumentation (MEDLI) hardware was a first-of-its-kind sensor system that gathered temperature and pressure readings on the MSL heatshield during Mars entry on August 6, 2012. MEDLI began as a challenging instrumentation problem, and has been a model of collaboration across multiple NASA organizations. After the culmination of almost 6 years of effort, the sensors performed extremely well, collecting data from before atmospheric interface through parachute deploy. This paper will summarize the history of the MEDLI project and hardware development, including key lessons learned that can apply to future instrumentation efforts. MEDLI returned an unprecedented amount of high-quality engineering data from a Mars entry vehicle. We will present the performance of the 3 sensor types: pressure, temperature, and isotherm tracking, as well as the performance of the custom-built sensor support electronics. A key component throughout the MEDLI project has been the ground testing and analysis effort required to understand the returned flight data. Although data analysis is ongoing through 2013, this paper will reveal some of the early findings on the aerothermodynamic environment that MSL encountered at Mars, the response of the heatshield material to that heating environment, and the aerodynamic performance of the entry vehicle. The MEDLI data results promise to challenge our engineering assumptions and revolutionize the way we account for margins in entry vehicle design.
NASA Technical Reports Server (NTRS)
Ramesham, Rajeshuni; Maki, Justin N.; Cucullu, Gordon C.
2008-01-01
Package Qualification and Verification (PQV) of advanced electronic packaging and interconnect technologies and various other types of qualification hardware for the Mars Exploration Rover/Mars Science Laboratory flight projects has been performed to enhance mission assurance. The qualification of hardware (Engineering Camera and Platinum Resistance Thermometer, PRT) under extreme cold temperatures has been performed with reference to various project requirements. The flight-like packages, sensors, and subassemblies have been selected for the study to survive three times (3x) the total number of expected temperature cycles resulting from all environmental and operational exposures occurring over the life of the flight hardware, including all relevant manufacturing, ground operations, and mission phases. Qualification has been performed by subjecting the above flight-like qualification hardware to the environmental temperature extremes and assessing any structural failures or degradation in electrical performance due to either overstress or thermal cycle fatigue. Qualification test results for this flight-like hardware are described in this paper.
Semiannual Technical Summary, 1 April-30 September 1993
1993-12-01
Hardware failure: 11 Jul 2200 to 12 Jul 0531
Hardware failure: 12 Jul 0744 to 1307
Hardware service: 10 Aug 0821 to 1514
Line failure: 29 Aug 1000 to 30 Aug 1211
Line failure: 08 Sep 1518 to 09 Sep 0428
Line failure: 10 Sep 0821 to 1030
Hardware failure: 18 Sep 0817 to ... repair
Between 8 September 1306 hrs and 9 September 0428 hrs all communications systems were affected (13.5 hrs). Reduced 01B performance started 10
NASA Technical Reports Server (NTRS)
Yuan, Lu; LeBlanc, James
1998-01-01
This thesis investigates the effects of the High Power Amplifier (HPA) and the filters over a satellite or telemetry channel. The Volterra series expression is presented for the nonlinear channel with memory, and the algorithm is based on the finite-state machine model. A RAM-based algorithm operating on the receiver side, the Pre-cursor Enhanced RAM-FSE Canceler (PERC), is developed. A high-order modulation scheme, 16-QAM, is used for simulation, and the results show that PERC provides an efficient and reliable method to transmit data on the bandlimited nonlinear channel. The contribution of the PERC algorithm is that it includes both pre-cursors and post-cursors as the RAM address lines, and suggests a new way to make decisions on the pre-addresses. Compared with the RAM-DFE structure that only includes post-addresses, the BER versus Eb/N0 performance of PERC is substantially enhanced. Experiments are performed for PERC algorithms with different parameters on AWGN channels, and the results are compared and analyzed. The investigation of this thesis includes software simulation and hardware verification. Hardware was set up to collect actual TWT data. Simulations on both the software-generated data and the real-world data were performed. Practical limitations are considered for the hardware-collected data. Simulation results verified the reliability of the PERC algorithm. This work was conducted at NMSU in the Center for Space Telemetering and Telecommunications Systems in the Klipsch School of Electrical and Computer Engineering Department.
Discovery & Interaction in Astro 101 Laboratory Experiments
NASA Astrophysics Data System (ADS)
Maloney, Frank Patrick; Maurone, Philip; DeWarf, Laurence E.
2016-01-01
The availability of low-cost, high-performance computing hardware and software has transformed the manner by which astronomical concepts can be re-discovered and explored in a laboratory that accompanies an astronomy course for arts students. We report on a strategy, begun in 1992, for allowing each student to understand fundamental scientific principles by interactively confronting astronomical and physical phenomena, through direct observation and by computer simulation. These experiments have evolved as: (a) the quality and speed of the hardware has greatly increased; (b) the corresponding hardware costs have decreased; (c) the students have become computer and Internet literate; and (d) the importance of computationally and scientifically literate arts graduates in the workplace has increased. We present the current suite of laboratory experiments, and describe the nature, procedures, and goals in this two-semester laboratory for liberal arts majors at the Astro 101 university level.
Real-time model-based vision system for object acquisition and tracking
NASA Technical Reports Server (NTRS)
Wilcox, Brian; Gennery, Donald B.; Bon, Bruce; Litwin, Todd
1987-01-01
A machine vision system is described which is designed to acquire and track polyhedral objects moving and rotating in space by means of two or more cameras, programmable image-processing hardware, and a general-purpose computer for high-level functions. The image-processing hardware is capable of performing a large variety of operations on images and on image-like arrays of data. Acquisition utilizes image locations and velocities of the features extracted by the image-processing hardware to determine the three-dimensional position, orientation, velocity, and angular velocity of the object. Tracking correlates edges detected in the current image with edge locations predicted from an internal model of the object and its motion, continually updating velocity information to predict where edges should appear in future frames. With some 10 frames processed per second, real-time tracking is possible.
NASA Technical Reports Server (NTRS)
Bruner, M. E.; Haisch, B. M.
1986-01-01
The Ultraviolet Spectrometer/Polarimeter Instrument (UVSP) for the Solar Maximum Mission (SMM) was based on the re-use of the engineering model of the high resolution ultraviolet spectrometer developed for the OSO-8 mission. Lockheed assumed four distinct responsibilities in the UVSP program: technical evaluation of the OSO-8 engineering model; technical consulting on the electronic, optical, and mechanical modifications to the OSO-8 engineering model hardware; design and development of the UVSP software system; and scientific participation in the operations and analysis phase of the mission. Lockheed also provided technical consulting and assistance with instrument hardware performance anomalies encountered during the post-launch operation of the SMM observatory. An index to the quarterly reports delivered under the contract is included and serves as a useful capsule history of the program activity.
Massengill, L W; Mundie, D B
1992-01-01
A neural network IC based on dynamic charge injection is described. The hardware design is space and power efficient, and achieves massive parallelism of analog inner products via charge-based multipliers and spatially distributed summing buses. Basic synaptic cells are constructed of exponential pulse-decay modulation (EPDM) dynamic injection multipliers operating sequentially on propagating signal vectors and locally stored analog weights. Individually adjustable gain controls on each neuron reduce the effects of limited weight dynamic range. A hardware simulator/trainer has been developed which incorporates the physical (nonideal) characteristics of actual circuit components into the training process, thus absorbing nonlinearities and parametric deviations into the macroscopic performance of the network. Results show that charge-based techniques may achieve a high degree of neural density and throughput using standard CMOS processes.
NASA Technical Reports Server (NTRS)
1991-01-01
Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)
Inferring Human Activity Recognition with Ambient Sound on Wireless Sensor Nodes.
Salomons, Etto L; Havinga, Paul J M; van Leeuwen, Henk
2016-09-27
A wireless sensor network that consists of nodes with a sound sensor can be used to obtain context awareness in home environments. However, the limited processing power of wireless nodes offers a challenge when extracting features from the signal and, subsequently, classifying the source. Although multiple papers can be found on different methods of sound classification, none of these are aimed at limited hardware or take the efficiency of the algorithms into account. In this paper, we compare and evaluate several classification methods on a real sensor platform using different feature types and classifiers, in order to find an approach that results in a good classifier that can run on limited hardware. To be as realistic as possible, we trained our classifiers using sound waves from many different sources. We conclude that, despite the fact that the classifiers are often of low quality due to the highly restricted hardware resources, sufficient performance can be achieved when (1) the window length for our classifiers is increased, and (2) a two-step approach is applied that uses a refined classification after a global classification has been performed.
An Embedded Sensor Node Microcontroller with Crypto-Processors.
Panić, Goran; Stecklina, Oliver; Stamenković, Zoran
2016-04-27
Wireless sensor network applications range from industrial automation and control, agricultural and environmental protection, to surveillance and medicine. In most applications, data are highly sensitive and must be protected from any type of attack and abuse. Security challenges in wireless sensor networks are mainly defined by the power and computing resources of sensor devices, memory size, quality of radio channels and susceptibility to physical capture. In this article, an embedded sensor node microcontroller designed to support sensor network applications with severe security demands is presented. It features a low power 16-bit processor core supported by a number of hardware accelerators designed to perform complex operations required by advanced crypto algorithms. The microcontroller integrates an embedded Flash and an 8-channel 12-bit analog-to-digital converter making it a good solution for low-power sensor nodes. The article discusses the most important security topics in wireless sensor networks and presents the architecture of the proposed hardware solution. Furthermore, it gives details on the chip implementation, verification and hardware evaluation. Finally, the chip power dissipation and performance figures are estimated and analyzed.
An Embedded Sensor Node Microcontroller with Crypto-Processors
Panić, Goran; Stecklina, Oliver; Stamenković, Zoran
2016-01-01
Wireless sensor network applications range from industrial automation and control, agricultural and environmental protection, to surveillance and medicine. In most applications, data are highly sensitive and must be protected from any type of attack and abuse. Security challenges in wireless sensor networks are mainly defined by the power and computing resources of sensor devices, memory size, quality of radio channels and susceptibility to physical capture. In this article, an embedded sensor node microcontroller designed to support sensor network applications with severe security demands is presented. It features a low power 16-bit processor core supported by a number of hardware accelerators designed to perform complex operations required by advanced crypto algorithms. The microcontroller integrates an embedded Flash and an 8-channel 12-bit analog-to-digital converter making it a good solution for low-power sensor nodes. The article discusses the most important security topics in wireless sensor networks and presents the architecture of the proposed hardware solution. Furthermore, it gives details on the chip implementation, verification and hardware evaluation. Finally, the chip power dissipation and performance figures are estimated and analyzed. PMID:27128925
Resource Efficient Hardware Architecture for Fast Computation of Running Max/Min Filters
Torres-Huitzil, Cesar
2013-01-01
Running max/min filters on rectangular kernels are widely used in many digital signal and image processing applications. Filtering with a k × k kernel requires k² − 1 comparisons per sample for a direct implementation; thus, performance scales expensively with the kernel size k. Faster computations can be achieved by kernel decomposition and using constant time one-dimensional algorithms on custom hardware. This paper presents a hardware architecture for real-time computation of running max/min filters based on the van Herk/Gil-Werman (HGW) algorithm. The proposed architecture design uses less computation and memory resources than previously reported architectures when targeted to Field Programmable Gate Array (FPGA) devices. Implementation results show that the architecture is able to compute max/min filters, on 1024 × 1024 images with up to 255 × 255 kernels, in around 8.4 milliseconds, 120 frames per second, at a clock frequency of 250 MHz. The implementation is highly scalable for the kernel size with good performance/area tradeoff suitable for embedded applications. The applicability of the architecture is shown for local adaptive image thresholding. PMID:24288456
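For reference, the van Herk/Gil-Werman recurrence behind such architectures can be written compactly: within each length-k block, a prefix maximum g and a suffix maximum h are accumulated, and the maximum over the window starting at i is max(h[i], g[i+k-1]), giving a comparison count independent of k. The NumPy sketch below computes the valid region of a separable k × k max filter in software; it illustrates the algorithm only and is not the FPGA pipeline itself.

```python
import numpy as np

def running_max_1d(x, k):
    """van Herk/Gil-Werman running max over every length-k window of x (float input)."""
    n = len(x)
    pad = (-n) % k
    xp = np.concatenate([x, np.full(pad, -np.inf)])            # pad to a multiple of k
    blocks = xp.reshape(-1, k)
    g = np.maximum.accumulate(blocks, axis=1).ravel()           # prefix max within blocks
    h = np.maximum.accumulate(blocks[:, ::-1], axis=1)[:, ::-1].ravel()  # suffix max
    return np.maximum(h[:n - k + 1], g[k - 1:n])                # window [i, i+k-1]

def max_filter_2d(img, k):
    """Separable k x k max filter (valid region): rows first, then columns."""
    rows = np.apply_along_axis(running_max_1d, 1, img, k)
    return np.apply_along_axis(running_max_1d, 0, rows, k)

x = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
print(running_max_1d(x, 3))                                     # HGW result
print(np.array([x[i:i + 3].max() for i in range(len(x) - 2)]))  # brute-force check
```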
Systems Performance Laboratory | Energy Systems Integration Facility | NREL
Small Commercial Power Hardware in the Loop: The small commercial power-hardware-in-the-loop (PHIL) test bay is dedicated to small-scale power hardware-in-the-loop studies of inverters and other ... Multi-Inverter Power Hardware in the Loop: The multi-inverter test bay is dedicated to ...
EVA Training and Development Facilities
NASA Technical Reports Server (NTRS)
Cupples, Scott
2016-01-01
Overview: Vast majority of US EVA (ExtraVehicular Activity) training and EVA hardware development occurs at JSC; EVA training facilities used to develop and refine procedures and improve skills; EVA hardware development facilities test hardware to evaluate performance and certify requirement compliance; Environmental chambers enable testing of hardware from as large as suits to as small as individual components in thermal vacuum conditions.
Scheduling Operations for Massive Heterogeneous Clusters
NASA Technical Reports Server (NTRS)
Humphrey, John; Spagnoli, Kyle
2013-01-01
High-performance computing (HPC) programming has become increasingly difficult with the advent of hybrid supercomputers consisting of multicore CPUs and accelerator boards such as the GPU. Manual tuning of software to achieve high performance on this type of machine has been performed by programmers. This is needlessly difficult and prone to being invalidated by new hardware, new software, or changes in the underlying code. A system was developed for task-based representation of programs, which, when coupled with a scheduler and runtime system, allows for many benefits, including higher performance and utilization of computational resources, easier programming and porting, and adaptation of code during runtime. The system consists of a method of representing computer algorithms as a series of data-dependent tasks. The series forms a graph, which can be scheduled for execution on many nodes of a supercomputer efficiently by a computer algorithm. The schedule is executed by a dispatch component, which is tailored to understand all of the hardware types that may be available within the system. The scheduler is informed by a cluster mapping tool, which generates a topology of available resources and their strengths and communication costs. Software is decoupled from its hardware, which aids in porting to future architectures. A computer algorithm schedules all operations, which, for systems of high complexity (i.e., most NASA codes), cannot be performed optimally by a human. The system aids in reducing repetitive code, such as communication code, and aids in the reduction of redundant code across projects. It adds new features to code automatically, such as recovering from a lost node or the ability to modify the code while running. In this project, the innovators at the time of this reporting intend to develop two distinct technologies that build upon each other, both of which serve as building blocks for more efficient HPC usage. The first is the scheduling and dynamic execution framework, and the second is scalable linear algebra libraries that are built directly on the former.
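A toy sketch of the task-based representation: each task names the tasks whose outputs it consumes, and a dispatcher submits a task to a worker pool as soon as its dependencies have completed. The task names and the thread-pool dispatcher are illustrative stand-ins, not the project's scheduler, cluster mapping tool, or heterogeneous dispatch component.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Hypothetical task graph: each task lists the tasks whose outputs it depends on.
def load(_):        return 2.0
def square(deps):   return deps["load"] ** 2
def double(deps):   return 2 * deps["load"]
def combine(deps):  return deps["square"] + deps["double"]

graph = {
    "load":    (load,    []),
    "square":  (square,  ["load"]),
    "double":  (double,  ["load"]),
    "combine": (combine, ["square", "double"]),
}

def run(graph, workers=4):
    """Dispatch each task as soon as all of its dependencies have completed."""
    done, running = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(graph):
            for name, (fn, deps) in graph.items():
                if name not in done and name not in running and all(d in done for d in deps):
                    running[name] = pool.submit(fn, {d: done[d] for d in deps})
            finished, _ = wait(running.values(), return_when=FIRST_COMPLETED)
            for name in [n for n, f in running.items() if f in finished]:
                done[name] = running.pop(name).result()
    return done

print(run(graph)["combine"])   # (2^2) + (2*2) = 8
```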
High performance cellular level agent-based simulation with FLAME for the GPU.
Richmond, Paul; Walker, Dawn; Coakley, Simon; Romano, Daniela
2010-05-01
Driven by the availability of experimental data and ability to simulate a biological scale which is of immediate interest, the cellular scale is fast emerging as an ideal candidate for middle-out modelling. As with 'bottom-up' simulation approaches, cellular level simulations demand a high degree of computational power, which in large-scale simulations can only be achieved through parallel computing. The flexible large-scale agent modelling environment (FLAME) is a template driven framework for agent-based modelling (ABM) on parallel architectures ideally suited to the simulation of cellular systems. It is available for both high performance computing clusters (www.flame.ac.uk) and GPU hardware (www.flamegpu.com) and uses a formal specification technique that acts as a universal modelling format. This not only creates an abstraction from the underlying hardware architectures, but avoids the steep learning curve associated with programming them. In benchmarking tests and simulations of advanced cellular systems, FLAME GPU has reported massive improvement in performance over more traditional ABM frameworks. This allows the time spent in the development and testing stages of modelling to be drastically reduced and creates the possibility of real-time visualisation for simple visual face-validation.
Optimal Digital Controller Design for a Servo Motor Taking Account of Intersample Behavior
NASA Astrophysics Data System (ADS)
Akiyoshi, Tatsuro; Imai, Jun; Funabiki, Shigeyuki
A continuous-time plant with a discretized continuous-time controller does not remain stable if the sampling rate is lower than a certain level. Thus far, high-functioning electronic control has made use of high-cost hardware needed to implement discretized continuous-time controllers, while low-cost hardware generally does not offer a high enough sampling rate. This technical note presents results comparing performance indices with and without intersample behavior, and offers an answer to the question of how a low-specification device can control a plant effectively. We consider a machine simulating wafer handling robots at semiconductor factories, which is an electromechanical system driven by a direct drive motor. We illustrate controller design for the robot with and without intersample behavior, and present simulation and experimental results obtained using these controllers. Taking intersample behavior into account proves to be effective in improving control performance and enables the choice of a relatively long sampling period. By controller design via a performance index with intersample behavior, we can cope with situations where a sufficiently short sampling period cannot be employed, and the freedom of controller design may be widened, especially in the choice of sampling period.
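The opening observation, that a discretized continuous-time controller loses stability once the sampling period grows too long, can be illustrated with a first-order example: for the unstable plant dx/dt = x + u stabilized by u = -3x, zero-order-hold discretization gives the closed-loop map x_{n+1} = (3 - 2e^T) x_n, which is stable only for T < ln 2. The plant and gain below are illustrative choices, not the wafer-handling robot studied in the note.

```python
import numpy as np

# Unstable plant dx/dt = x + u with sampled feedback u = -3*x (illustrative values).
# Under zero-order hold with period T: x_{n+1} = (3 - 2*exp(T)) * x_n,
# so the sampled loop is stable only while T < ln(2) ~= 0.693.
def simulate(T, steps=40, x0=1.0):
    a = 3.0 - 2.0 * np.exp(T)   # closed-loop eigenvalue of the sampled system
    x = x0
    for _ in range(steps):
        x = a * x
    return abs(x)

for T in (0.1, 0.5, 0.8):
    print(f"T = {T:.1f}: |x_40| = {simulate(T):.3e}  "
          f"({'stable' if abs(3 - 2 * np.exp(T)) < 1 else 'unstable'})")
```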
Exercise Countermeasure Hardware Evolution on ISS: The First Decade.
Korth, Deborah W
2015-12-01
The hardware systems necessary to support exercise countermeasures to the deconditioning associated with microgravity exposure have evolved and improved significantly during the first decade of the International Space Station (ISS), resulting in both new types of hardware and enhanced performance capabilities for initial hardware items. The original suite of countermeasure hardware supported the first crews to arrive on the ISS and the improved countermeasure system delivered in later missions continues to serve the astronauts today with increased efficacy. Due to aggressive hardware development schedules and constrained budgets, the initial approach was to identify existing spaceflight-certified exercise countermeasure equipment, when available, and modify it for use on the ISS. Program management encouraged the use of commercial-off-the-shelf (COTS) hardware, or hardware previously developed (heritage hardware) for the Space Shuttle Program. However, in many cases the resultant hardware did not meet the additional requirements necessary to support crew health maintenance during long-duration missions (3 to 12 mo) and anticipated future utilization activities in support of biomedical research. Hardware development was further complicated by performance requirements that were not fully defined at the outset and tended to evolve over the course of design and fabrication. Modifications, ranging from simple to extensive, were necessary to meet these evolving requirements in each case where heritage hardware was proposed. Heritage hardware was anticipated to be inherently reliable without the need for extensive ground testing, due to its prior positive history during operational spaceflight utilization. As a result, developmental budgets were typically insufficient and schedules were too constrained to permit long-term evaluation of dedicated ground-test units ("fleet leader" type testing) to identify reliability issues when applied to long-duration use. In most cases, the exercise unit with the most operational history was the unit installed on the ISS.
A 3D-PIV System for Gas Turbine Applications
NASA Astrophysics Data System (ADS)
Acharya, Sumanta
2002-08-01
Funds were received in April 2001 under the Department of Defense DURIP program for construction of a 48-processor high-performance computing cluster. This report details the hardware that was purchased and how it has been used to enable and enhance research activities directly supported by, and of interest to, the Air Force Office of Scientific Research and the Department of Defense. The report is divided into two major sections. The first section after the summary describes the computer cluster, its setup, and the cluster hardware, and presents highlights of the research efforts carried out since the cluster's installation.
Comparison of spike-sorting algorithms for future hardware implementation.
Gibson, Sarah; Judy, Jack W; Markovic, Dejan
2008-01-01
Applications such as brain-machine interfaces require hardware spike sorting in order to (1) obtain single-unit activity and (2) perform data reduction for wireless transmission of data. Such systems must be low-power, low-area, high-accuracy, automatic, and able to operate in real time. Several detection and feature extraction algorithms for spike sorting are described briefly and evaluated in terms of accuracy versus computational complexity. The nonlinear energy operator method is chosen as the optimal spike detection algorithm because it is the most robust to noise and relatively simple. The discrete derivatives method [1] is chosen as the optimal feature extraction method, maintaining high accuracy across SNRs with a complexity orders of magnitude less than that of traditional methods such as PCA.
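As a rough illustration of the two selected methods, the sketch below applies the nonlinear energy operator for spike detection and discrete-derivative feature extraction to a synthetic trace; the threshold scaling, smoothing window, and derivative spacings are illustrative choices rather than the settings evaluated in the paper.

```python
import numpy as np

def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def detect_spikes(x, fs, thresh_scale=8.0, window_ms=1.0):
    """Threshold the smoothed NEO output; return indices of threshold crossings."""
    psi = neo(x)
    w = max(1, int(fs * window_ms / 1000.0))
    psi = np.convolve(psi, np.bartlett(w) if w > 1 else [1.0], mode="same")
    thr = thresh_scale * np.mean(psi)
    return np.flatnonzero((psi[1:] >= thr) & (psi[:-1] < thr)) + 1

def discrete_derivative_features(waveform, deltas=(1, 3, 7)):
    """Discrete derivatives dd_delta[n] = x[n] - x[n-delta], concatenated per delta."""
    return np.concatenate([waveform[d:] - waveform[:-d] for d in deltas])

# Tiny synthetic example: background noise plus two injected spike-like transients
fs = 24000
rng = np.random.default_rng(0)
trace = 0.1 * rng.standard_normal(fs // 10)
for idx in (500, 1500):
    trace[idx:idx + 8] += np.hanning(8)
print(detect_spikes(trace, fs))
```

Per sample, both methods cost only a few multiply-accumulate operations, which is why they suit low-power, low-area hardware.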
Biosequence Similarity Search on the Mercury System
Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph
2007-01-01
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267
MSFC Skylab corollary experiment systems mission evaluation
NASA Technical Reports Server (NTRS)
1974-01-01
Evaluations are presented of the performances of corollary experiment hardware developed by the George C. Marshall Space Flight Center and operated during the three manned Skylab missions. Also presented are assessments of the functional adequacy of the experiment hardware and its supporting systems, and indications are given as to the degrees by which experiment constraints and interfaces were met. It is shown that most of the corollary experiment hardware performed satisfactorily and within design specifications.
FPGA-based protein sequence alignment : A review
NASA Astrophysics Data System (ADS)
Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi
2017-11-01
Sequence alignment has been optimized with several techniques that accelerate computation of the optimal score by implementing dynamic-programming (DP) based algorithms in hardware such as FPGA platforms. Hardware implementations face performance challenges such as frequent memory accesses and strong data dependencies in the computation. This paper therefore focuses on the processing element (PE) configuration, which involves memory accesses to load the configuration data (substitution matrix and query sequence characters), and on the PE configuration time. Previous works have proposed various approaches to improve PE configuration performance, such as serial and parallel configuration chains, in which the configuration data are loaded into the PEs sequentially or simultaneously, respectively. Some researchers have shown that a parallel configuration chain improves both configuration time and area.
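For reference, the dynamic-programming recurrence that each processing element of such an FPGA array typically evaluates is the Smith-Waterman local-alignment cell update. The pure-software sketch below shows that per-cell computation only, with illustrative scores; it does not model the serial or parallel PE configuration chains discussed in the paper.

```python
import numpy as np

def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Local alignment score; each cell update is what one PE computes per cycle."""
    H = np.zeros((len(query) + 1, len(subject) + 1), dtype=int)
    best = 0
    for i in range(1, len(query) + 1):
        for j in range(1, len(subject) + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            H[i, j] = max(0,
                          H[i - 1, j - 1] + s,   # diagonal: substitution
                          H[i - 1, j] + gap,     # gap in the subject
                          H[i, j - 1] + gap)     # gap in the query
            best = max(best, H[i, j])
    return best

print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```

In a systolic FPGA implementation the inner loop disappears: one PE is typically assigned per query character and all cells on an anti-diagonal are updated in the same clock cycle, which is why loading each PE's configuration data (substitution scores and query character) efficiently matters so much.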
Free-piston Stirling engine/linear alternator 1000-hour endurance test
NASA Technical Reports Server (NTRS)
Rauch, J.; Dochat, G.
1985-01-01
The Free-Piston Stirling Engine (FPSE) has the potential to be a long-lived, highly reliable power conversion device attractive for many product applications such as space, residential, or remote-site power. The purpose of endurance testing the FPSE was to demonstrate its potential for long life. The endurance program was directed at obtaining 1000 operational hours under various test conditions: low power, full stroke, duty cycle, and stop/start. Critical performance parameters were measured to note any change and/or trend. Inspections were conducted to measure and compare critical seal/bearing clearances. The engine performed well throughout the program, completing more than 1100 hours. Hardware inspection, including the critical clearances, showed no significant change in hardware or clearance dimensions. The performance parameters did not exhibit any increasing or decreasing trends. The test program confirms the potential for long-life FPSE applications.
Advanced Architectures for Astrophysical Supercomputing
NASA Astrophysics Data System (ADS)
Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.
2010-12-01
Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation, a level of performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
Performance/price estimates for cortex-scale hardware: a design space exploration.
Zaveri, Mazad S; Hammerstrom, Dan
2011-04-01
In this paper, we revisit the concept of virtualization. Virtualization is useful for understanding and investigating the performance/price and other trade-offs related to the hardware design space. Moreover, it is perhaps the most important aspect of a hardware design space exploration. Such a design space exploration is a necessary part of the study of hardware architectures for large-scale computational models for intelligent computing, including AI, Bayesian, bio-inspired and neural models. A methodical exploration is needed to identify potentially interesting regions in the design space, and to assess the relative performance/price points of these implementations. As an example, in this paper we investigate the performance/price of (digital and mixed-signal) CMOS and hypothetical CMOL (nanogrid) technology based hardware implementations of human cortex-scale spiking neural systems. Through this analysis, and the resulting performance/price points, we demonstrate, in general, the importance of virtualization, and of doing these kinds of design space explorations. The specific results suggest that hybrid nanotechnology such as CMOL is a promising candidate to implement very large-scale spiking neural systems, providing a more efficient utilization of the density and storage benefits of emerging nano-scale technologies. In general, we believe that the study of such hypothetical designs/architectures will guide the neuromorphic hardware community towards building large-scale systems, and help guide research trends in intelligent computing, and computer engineering. Copyright © 2010 Elsevier Ltd. All rights reserved.
50 CFR 660.15 - Equipment requirements.
Code of Federal Regulations, 2010 CFR
2010-10-01
... receivers, computer hardware for electronic fish ticket software and computer hardware for electronic logbook software. (b) Performance and technical requirements for scales used to weigh catch at sea... ticket software provided by Pacific States Marine Fish Commission are required to meet the hardware and...
NASA Technical Reports Server (NTRS)
Slafer, Loren I.
1989-01-01
Realtime simulation and hardware-in-the-loop testing are being used extensively in all phases of the design, development, and testing of the attitude control system (ACS) for the new Hughes HS601 satellite bus. Realtime, hardware-in-the-loop simulation, integrated with traditional analysis and pure simulation activities, is shown to provide a highly efficient and productive overall development program. Implementation of high-fidelity simulations of the satellite dynamics and control system algorithms, capable of real-time execution (using Applied Dynamics International's System 100), provides a tool which is capable of being integrated with the critical flight microprocessor to create a mixed simulation test (MST). The MST creates a highly accurate, detailed simulated on-orbit test environment, capable of open- and closed-loop ACS testing, in which the ACS design can be validated. The MST is shown to provide a valuable extension of traditional test methods. A description of the MST configuration is presented, including the spacecraft dynamics simulation model, sensor and actuator emulators, and the test support system. Overall system performance parameters are presented. MST applications are discussed, including supporting ACS design, developing on-orbit system performance predictions, flight software development and qualification testing (augmenting the traditional software-based testing), mission planning, and a cost-effective subsystem-level acceptance test. The MST is shown to provide an ideal tool in which the ACS designer can fly the spacecraft on the ground.
Dynamo: A Model Transition Framework for Dynamic Stability Control and Body Mass Manipulation
2011-11-01
…driving at high speed, and you turn the steering wheel hard to the right and slam on the brakes, then you will end up in the oversteer regime. At the… sensors (GPS, IMU, LIDAR) for vehicle control.
Figure 17: Dynamo high-speed small UGV hardware platform
We will perform experiments to measure the MTC…
DEPEND - A design environment for prediction and evaluation of system dependability
NASA Technical Reports Server (NTRS)
Goswami, Kumar K.; Iyer, Ravishankar K.
1990-01-01
The development of DEPEND, an integrated simulation environment for the design and dependability analysis of fault-tolerant systems, is described. DEPEND models both hardware and software components at a functional level, and allows automatic failure injection to assess system performance and reliability. It relieves the user of the work needed to inject failures, maintain statistics, and output reports. The automatic failure injection scheme is geared toward evaluating a system under high stress (workload) conditions. The failures that are injected can affect both hardware and software components. To illustrate the capability of the simulator, a distributed system which employs a prediction-based, dynamic load-balancing heuristic is evaluated. Experiments were conducted to determine the impact of failures on system performance and to identify the failures to which the system is especially susceptible.
Opportunities and choice in a new vector era
NASA Astrophysics Data System (ADS)
Nowak, A.
2014-06-01
This work discusses the significant changes in computing landscape related to the progression of Moore's Law, and the implications on scientific computing. Particular attention is devoted to the High Energy Physics domain (HEP), which has always made good use of threading, but levels of parallelism closer to the hardware were often left underutilized. Findings of the CERN openlab Platform Competence Center are reported in the context of expanding "performance dimensions", and especially the resurgence of vectors. These suggest that data oriented designs are feasible in HEP and have considerable potential for performance improvements on multiple levels, but will rarely trump algorithmic enhancements. Finally, an analysis of upcoming hardware and software technologies identifies heterogeneity as a major challenge for software, which will require more emphasis on scalable, efficient design.
NASA Astrophysics Data System (ADS)
Demenev, A. G.
2018-02-01
The present work analyzes the high-performance computing (HPC) infrastructure capabilities available at Perm State University for solving aircraft engine aeroacoustics problems. We explore the ability to develop new computational aeroacoustics methods and solvers for computer-aided engineering (CAE) systems to handle complicated industrial problems of engine noise prediction. Leading aircraft engine engineering companies, including “UEC-Aviadvigatel” JSC (our industrial partner in Perm, Russia), require such methods and solvers to optimize aircraft engine geometry for fan noise reduction. We analysed how to use the Perm State University HPC hardware resources and software services efficiently. The results demonstrate that the Perm State University HPC infrastructure is mature enough to face industrial-scale problems of developing a CAE system with HPC methods and CFD solvers.
Renz, Nora; Cabric, Sabrina; Morgenstern, Christian; Schuetz, Michael A; Trampuz, Andrej
2018-04-01
Bone healing disturbance following fracture fixation represents a continuing challenge. We evaluated a novel fully automated polymerase chain reaction (PCR) assay using sonication fluid from retrieved orthopedic hardware to diagnose infection. In this prospective diagnostic cohort study, explanted orthopedic hardware materials from consecutive patients were investigated by sonication and the resulting sonication fluid was analyzed by culture (standard procedure) and multiplex PCR (investigational procedure). Hardware-associated infection was defined as visible purulence, presence of a sinus tract, implant on view, inflammation in peri-implant tissue or positive culture. McNemar's chi-squared test was used to compare the performance of diagnostic tests. For the clinical performance all pathogens were considered, whereas for analytical performance only microorganisms were considered for which primers are included in the PCR assay. Among 51 patients, hardware-associated infection was diagnosed in 38 cases (75%) and non-infectious causes in 13 patients (25%). The sensitivity for diagnosing infection was 66% for peri-implant tissue culture, 84% for sonication fluid culture, 71% (clinical performance) and 77% (analytical performance) for sonication fluid PCR, the specificity of all tests was >90%. The analytical sensitivity of PCR was higher for gram-negative bacilli (100%), coagulase-negative staphylococci (89%) and Staphylococcus aureus (75%) than for Cutibacterium (formerly Propionibacterium) acnes (57%), enterococci (50%) and Candida spp. (25%). The performance of sonication fluid PCR for diagnosis of orthopedic hardware-associated infection was comparable to culture tests. The additional advantage of PCR was short processing time (<5 h) and fully automated procedure. With further improvement of the performance, PCR has the potential to complement conventional cultures. Copyright © 2018 Elsevier Ltd. All rights reserved.
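The headline comparison reduces to paired sensitivities and McNemar's test on the discordant pairs. The sketch below shows the arithmetic with made-up counts (not the study's data); the continuity-corrected statistic is compared against a chi-squared distribution with one degree of freedom.

```python
# Hypothetical paired results among infected patients (illustrative counts only)
both_pos, pcr_only, culture_only, both_neg = 24, 3, 8, 3

def sensitivity(true_pos, total_infected):
    return true_pos / total_infected

def mcnemar_chi2(discordant_a, discordant_b):
    """Continuity-corrected McNemar statistic for two paired binary tests."""
    return (abs(discordant_a - discordant_b) - 1) ** 2 / (discordant_a + discordant_b)

n_infected = both_pos + pcr_only + culture_only + both_neg
print("PCR sensitivity:     %.2f" % sensitivity(both_pos + pcr_only, n_infected))
print("culture sensitivity: %.2f" % sensitivity(both_pos + culture_only, n_infected))
print("McNemar chi-squared: %.2f" % mcnemar_chi2(pcr_only, culture_only))
```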
Hardware architecture design of image restoration based on time-frequency domain computation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Jing; Jiao, Zipeng
2013-10-01
Image restoration algorithms based on time-frequency domain computation (TFDC) are mature and widely applied in engineering. To enable high-speed implementation of these algorithms, a TFDC hardware architecture is proposed. First, the main module is designed by analyzing the processing steps and numerical calculations common to these algorithms. Then, to improve generality, an iteration control module is planned for iterative algorithms. In addition, to reduce computational cost and memory requirements, optimizations are suggested for the most time-consuming modules, namely the two-dimensional FFT/IFFT and complex-number arithmetic. Finally, the TFDC hardware architecture is adopted for the hardware design of a real-time image restoration system. The results show that the TFDC hardware architecture and its optimizations can be applied to TFDC-based image restoration algorithms with good algorithm generality, hardware realizability, and high efficiency.
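As a point of reference for the kind of computation such an architecture accelerates, the sketch below performs a single-pass Wiener deconvolution built entirely from 2-D FFT/IFFT and element-wise complex arithmetic; the blur kernel, noise level, and test image are illustrative and the hardware architecture itself is not modeled.

```python
import numpy as np

def wiener_deconvolve(blurred, psf, noise_to_signal=1e-2):
    """Frequency-domain restoration: X = conj(H) * Y / (|H|^2 + K)."""
    H = np.fft.fft2(psf, s=blurred.shape)      # 2-D FFT of the blur kernel
    Y = np.fft.fft2(blurred)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft2(X))

# Tiny demonstration on a synthetic image blurred by a uniform 5x5 kernel
rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
psf = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)))
blurred += 0.01 * rng.standard_normal(img.shape)
restored = wiener_deconvolve(blurred, psf)
print("mean restoration error:", np.abs(restored - img).mean())
```

The two forward FFTs, the element-wise complex multiply/divide, and the inverse FFT are exactly the operations singled out above as the time-consuming modules worth optimizing in hardware.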
NASA Astrophysics Data System (ADS)
Daluge, D. R.; Ruedger, W. H.
1981-06-01
Problems encountered in testing onboard signal processing hardware designed to achieve radiometric and geometric correction of satellite imaging data are considered. These include obtaining representative image and ancillary data for simulation and the transfer and storage of a large quantity of image data at very high speed. The high resolution, high speed preprocessing of LANDSAT-D imagery is considered.
Zabala-Travers, Silvina; Choi, Mina; Cheng, Wei-Chung
2015-01-01
Purpose: Even though the use of color in the interpretation of medical images has increased significantly in recent years, the ad hoc manner in which color is handled and the lack of standard approaches have been associated with suboptimal and inconsistent diagnostic decisions with a negative impact on patient treatment and prognosis. The purpose of this study is to determine if the choice of color scale and display device hardware affects the visual assessment of patterns that have the characteristics of functional medical images. Methods: Perfusion magnetic resonance imaging (MRI) was the basis for designing and performing experiments. Synthetic images resembling brain dynamic-contrast enhanced MRI consisting of scaled mixtures of white, lumpy, and clustered backgrounds were used to assess the performance of a rainbow (“jet”), a heated black-body (“hot”), and a gray (“gray”) color scale with display devices of different quality on the detection of small changes in color intensity. The authors used a two-alternative, forced-choice design where readers were presented with 600 pairs of images. Each pair consisted of two images of the same pattern flipped along the vertical axis with a small difference in intensity. Readers were asked to select the image with the highest intensity. Three differences in intensity were tested on four display devices: a medical-grade three-million-pixel display, a consumer-grade monitor, a tablet device, and a phone. Results: The estimates of percent correct show that jet outperformed hot and gray in the high and low range of the color scales for all devices with a maximum difference in performance of 18% (confidence intervals: 6%, 30%). Performance with hot was different for high and low intensity, comparable to jet for the high range, and worse than gray for lower intensity values. Similar performance was seen between devices using jet and hot, while gray performance was better for handheld devices. Time of performance was shorter with jet. Conclusions: Our findings demonstrate that the choice of color scale and display hardware affects the visual comparative analysis of pseudocolor images. Follow-up studies in clinical settings are being considered to confirm the results with patient images. PMID:26127048
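The analysis of a two-alternative forced-choice experiment of this kind reduces to a proportion-correct estimate per condition with a confidence interval. The short sketch below shows one way to compute it, using made-up counts rather than the study's data and a simple normal-approximation interval.

```python
import numpy as np

def percent_correct(correct, total, z=1.96):
    """Proportion correct with a normal-approximation 95% confidence interval."""
    p = correct / total
    half_width = z * np.sqrt(p * (1 - p) / total)
    return p, (p - half_width, p + half_width)

# Hypothetical numbers of correct choices out of 600 pairs per colour scale
for scale, n_correct in {"jet": 540, "hot": 498, "gray": 474}.items():
    p, (lo, hi) = percent_correct(n_correct, 600)
    print(f"{scale:4s}: {100 * p:5.1f}%  (95% CI {100 * lo:4.1f}-{100 * hi:4.1f}%)")
```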
Known-plaintext attack on the double phase encoding and its implementation with parallel hardware
NASA Astrophysics Data System (ADS)
Wei, Hengzheng; Peng, Xiang; Liu, Haitao; Feng, Songlin; Gao, Bruce Z.
2008-03-01
A known-plaintext attack on the double phase encoding scheme, implemented with parallel hardware, is presented. Double random phase encoding (DRPE) is one of the most representative optical cryptosystems; it was developed in the mid-1990s and has spawned quite a few variants since then. Although the DRPE encryption system strongly resists brute-force attack, its inherent architecture leaves a hidden weakness because of its linearity. Recently the real security strength of this opto-cryptosystem has been questioned and analyzed from the cryptanalysis point of view. In this presentation, we demonstrate that optical cryptosystems based on the DRPE architecture are vulnerable to known-plaintext attack. With this attack the two encryption keys in the DRPE can be recovered with the help of a phase retrieval technique. In our approach, we adopt the hybrid input-output (HIO) algorithm to recover the random phase key in the object domain and then infer the key in the frequency domain. A single plaintext-ciphertext pair is sufficient to create the vulnerability, and the attack does not require a particular choice of plaintext. The phase retrieval technique based on HIO is an iterative process built around Fourier transforms, so it maps well onto a hardware implementation on a digital signal processor (DSP). We make use of a high-performance DSP to carry out the known-plaintext attack. Compared with the software implementation, the hardware implementation is much faster. The performance of this DSP-based cryptanalysis system is also evaluated.
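The core of such an attack is a standard hybrid input-output iteration, alternating between enforcement of the measured Fourier magnitude and an object-domain constraint. A generic numpy sketch of that iteration is given below with an illustrative support constraint and test object; it does not reproduce the full DRPE key-recovery procedure described here.

```python
import numpy as np

def hio(fourier_magnitude, support, n_iter=200, beta=0.9, seed=0):
    """Hybrid input-output phase retrieval (Fienup-style iteration)."""
    rng = np.random.default_rng(seed)
    g = rng.random(fourier_magnitude.shape)                 # random initial guess
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        G = fourier_magnitude * np.exp(1j * np.angle(G))    # impose measured magnitude
        g_new = np.real(np.fft.ifft2(G))
        violated = (~support) | (g_new < 0)                 # object-domain constraints
        g = np.where(violated, g - beta * g_new, g_new)     # HIO feedback update
    return g

# Tiny self-test: attempt recovery of a known object from its Fourier magnitude
obj = np.zeros((64, 64))
obj[20:30, 25:45] = 1.0
support = np.zeros_like(obj, dtype=bool)
support[16:34, 20:50] = True
recovered = hio(np.abs(np.fft.fft2(obj)), support)
residual = np.abs(np.abs(np.fft.fft2(recovered)) - np.abs(np.fft.fft2(obj))).mean()
print("mean Fourier-magnitude residual:", residual)
```

Because each iteration is dominated by a forward and an inverse FFT, the loop maps naturally onto the FFT-oriented pipelines of a DSP.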
Online Learning Flight Control for Intelligent Flight Control Systems (IFCS)
NASA Technical Reports Server (NTRS)
Niewoehner, Kevin R.; Carter, John (Technical Monitor)
2001-01-01
The research accomplishments for the cooperative agreement 'Online Learning Flight Control for Intelligent Flight Control Systems (IFCS)' include the following: (1) previous IFC program data collection and analysis; (2) IFC program support site (configured IFC systems support network, configured Tornado/VxWorks OS development system, made Configuration and Documentation Management Systems Internet accessible); (3) Airborne Research Test Systems (ARTS) II Hardware (developed hardware requirements specification, developing environmental testing requirements, hardware design, and hardware design development); (4) ARTS II software development laboratory unit (procurement of lab style hardware, configured lab style hardware, and designed interface module equivalent to ARTS II faceplate); (5) program support documentation (developed software development plan, configuration management plan, and software verification and validation plan); (6) LWR algorithm analysis (performed timing and profiling on algorithm); (7) pre-trained neural network analysis; (8) Dynamic Cell Structures (DCS) Neural Network Analysis (performing timing and profiling on algorithm); and (9) conducted technical interchange and quarterly meetings to define IFC research goals.
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; Ferguson, Michael I.
2003-01-01
Evolvable hardware provides the capability to evolve analog circuits to produce amplifier and filter functions. Conventional analog controller designs employ these same functions. Analog controllers for the control of the shaft speed of a DC motor are evolved on an evolvable hardware platform utilizing a second-generation Field Programmable Transistor Array (FPTA2). The performance of an evolved controller is compared to that of a conventional proportional-integral (PI) controller. It is shown that hardware evolution is able to create a compact design that provides good performance while using considerably fewer functional electronic components than the conventional design. Additionally, the use of hardware evolution to provide fault tolerance by reconfiguring the design is explored. Experimental results are presented showing that significant recovery of capability can be made in the face of damaging induced faults.
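For context, the conventional baseline is a textbook PI speed loop. The sketch below implements a discrete-time PI controller with output saturation against a hypothetical first-order DC-motor model; the gains and motor constants are illustrative and are not the values used in the paper's comparison.

```python
import numpy as np

class PIController:
    """Discrete-time proportional-integral controller with output saturation."""
    def __init__(self, kp, ki, dt, u_min=-12.0, u_max=12.0):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        u = self.kp * error + self.ki * self.integral
        return float(np.clip(u, self.u_min, self.u_max))

# Hypothetical first-order motor: dw/dt = (K*u - w) / tau
K, tau, dt = 30.0, 0.15, 1e-3
ctrl = PIController(kp=0.4, ki=2.0, dt=dt)
w = 0.0
for _ in range(3000):
    u = ctrl.update(setpoint=100.0, measurement=w)   # target 100 rad/s
    w += dt * (K * u - w) / tau
print("speed after 3 s (rad/s):", round(w, 1))
```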
Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus
2016-05-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
An update on SCARLET hardware development and flight programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, P.A.; Murphy, D.M.; Piszczor, M.F.
1995-10-01
Solar Concentrator Array with Refractive Linear Element Technology (SCARLET) is one of the first practical photovoltaic concentrator array technologies that offers a number of benefits for space applications (i.e., high array efficiency, protection from space radiation effects, a relatively lightweight system, minimized plasma interactions, etc.). The line-focus concentrator concept, however, also offers two very important advantages: (1) low-cost mass production potential of the lens material; and (2) relaxation of precise array tracking requirements to only a single axis. These benefits offer unique capabilities to both commercial and government spacecraft users, specifically those interested in high radiation missions, such as MEO orbits, and electric-powered propulsion LEO-to-GEO orbit raising applications. SCARLET is an aggressive hardware development and flight validation program sponsored by the Ballistic Missile Defense Organization (BMDO) and NASA Lewis Research Center. Its intent is to bring the technology to the level of performance and validation necessary for use by various government and commercial programs. The first phase of the SCARLET program culminated with the design, development and fabrication of a small concentrator array for flight on the METEOR satellite. This hardware will be the first in-space demonstration of concentrator technology at the 'array level' and will provide valuable in-orbit performance measurements. The METEOR satellite is currently planned for a September/October 1995 launch. The next phase of the program is the development of a large array for use by one of the NASA New Millennium Program missions. This hardware will incorporate a number of significant improvements over the basic METEOR design. This presentation will address the basic SCARLET technology, examine its benefits to users, and describe the expected improvements for future missions.
NASA Astrophysics Data System (ADS)
Rajaram, Vignesh; Subramanian, Shankar C.
2016-07-01
An important aspect from the perspective of operational safety of heavy road vehicles is the detection and avoidance of collisions, particularly at high speeds. The development of a collision avoidance system is the overall focus of the research presented in this paper. The collision avoidance algorithm was developed using a sliding mode controller (SMC) and compared to one developed using linear full state feedback in terms of performance and controller effort. Important dynamic characteristics such as load transfer during braking, tyre-road interaction, dynamic brake force distribution and pneumatic brake system response were considered. The effect of aerodynamic drag on the controller performance was also studied. The developed control algorithms have been implemented on a Hardware-in-Loop experimental set-up equipped with the vehicle dynamic simulation software, IPG/TruckMaker®. The evaluation has been performed for realistic traffic scenarios with different loading and road conditions. The Hardware-in-Loop experimental results showed that the SMC and full state feedback controller were able to prevent the collision. However, when the discrepancies in the form of parametric variations were included, the SMC provided better results in terms of reduced stopping distance and lower controller effort compared to the full state feedback controller.
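To make the control idea concrete, the sketch below applies a simplified sliding-surface braking law to a point-mass vehicle approaching a stopped obstacle, with a first-order lag standing in for the pneumatic brake response. It uses a kinematic sliding surface and a saturated proportional reaching term instead of a pure switching term, and it omits the load transfer, tyre-road interaction, and brake force distribution treated in the paper; all parameters are illustrative.

```python
import numpy as np

def sliding_surface_braking(v0=20.0, gap0=70.0, safe_gap=5.0, a_ref=4.0, a_max=6.0,
                            k=1.0, tau_brake=0.3, dt=1e-3, t_end=15.0):
    """Point-mass vehicle braking toward a stopped obstacle.

    Sliding surface s = v^2 / (2*a_ref) - (gap - safe_gap): s > 0 means the current
    speed can no longer be scrubbed off within the remaining distance at a_ref.
    """
    v, gap, a_act = v0, gap0, 0.0
    for _ in range(int(t_end / dt)):
        s = v * v / (2.0 * a_ref) - (gap - safe_gap)
        a_cmd = np.clip(a_ref + k * s, 0.0, a_max)      # saturated reaching law
        a_act += dt * (a_cmd - a_act) / tau_brake       # first-order brake lag
        v = max(0.0, v - dt * a_act)
        gap -= dt * v
        if v == 0.0:
            break
    return gap

print("gap when stopped: %.1f m (safe gap target 5.0 m)" % sliding_surface_braking())
```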
Neighbour lists for smoothed particle hydrodynamics on GPUs
NASA Astrophysics Data System (ADS)
Winkler, Daniel; Rezavand, Massoud; Rauch, Wolfgang
2018-04-01
The efficient iteration of neighbouring particles is a performance critical aspect of any high performance smoothed particle hydrodynamics (SPH) solver. SPH solvers that implement a constant smoothing length generally divide the simulation domain into a uniform grid to reduce the computational complexity of the neighbour search. Based on this method, particle neighbours are either stored per grid cell or for each individual particle, denoted as Verlet list. While the latter approach has significantly higher memory requirements, it has the potential for a significant computational speedup. A theoretical comparison is performed to estimate the potential improvements of the method based on unknown hardware dependent factors. Subsequently, the computational performance of both approaches is empirically evaluated on graphics processing units. It is shown that the speedup differs significantly for different hardware, dimensionality and floating point precision. The Verlet list algorithm is implemented as an alternative to the cell linked list approach in the open-source SPH solver DualSPHysics and provided as a standalone software package.
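A minimal 2-D illustration of the two bookkeeping schemes is sketched below: particles are first binned into a uniform grid with cell size equal to the support radius (the cell-linked list), and per-particle Verlet lists are then assembled by scanning the neighbouring cells. The dimensionality, particle count, and radius are illustrative, and the sketch ignores the GPU memory-layout concerns that drive the paper's comparison.

```python
import numpy as np
from collections import defaultdict

def build_cell_list(pos, cell_size):
    """Map each grid cell to the indices of the particles it contains."""
    cells = defaultdict(list)
    for i, p in enumerate(pos):
        cells[tuple((p // cell_size).astype(int))].append(i)
    return cells

def build_verlet_lists(pos, radius):
    """Per-particle neighbour (Verlet) lists assembled via the cell-linked list."""
    cells = build_cell_list(pos, radius)
    neighbours = [[] for _ in range(len(pos))]
    offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    for cell, members in cells.items():
        for i in members:
            for off in offsets:
                for j in cells.get((cell[0] + off[0], cell[1] + off[1]), []):
                    if j != i and np.sum((pos[i] - pos[j]) ** 2) < radius ** 2:
                        neighbours[i].append(j)
    return neighbours

rng = np.random.default_rng(2)
positions = rng.random((500, 2))                 # 500 particles in a unit square
nbrs = build_verlet_lists(positions, radius=0.05)
print("mean neighbour count:", sum(map(len, nbrs)) / len(nbrs))
```

Keeping the per-particle lists trades memory for the ability to iterate true neighbours directly, which is exactly the trade-off the paper evaluates on different GPUs and precisions.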
Towards Autonomous Inspection of Space Systems Using Mobile Robotic Sensor Platforms
NASA Technical Reports Server (NTRS)
Wong, Edmond; Saad, Ashraf; Litt, Jonathan S.
2007-01-01
The space transportation systems required to support NASA's Exploration Initiative will demand a high degree of reliability to ensure mission success. This reliability can be realized through autonomous fault/damage detection and repair capabilities. It is crucial that such capabilities are incorporated into these systems since it will be impractical to rely upon Extra-Vehicular Activity (EVA), visual inspection or tele-operation due to the costly, labor-intensive and time-consuming nature of these methods. One approach to achieving this capability is through the use of an autonomous inspection system comprised of miniature mobile sensor platforms that will cooperatively perform high confidence inspection of space vehicles and habitats. This paper will discuss the efforts to develop a small scale demonstration test-bed to investigate the feasibility of using autonomous mobile sensor platforms to perform inspection operations. Progress will be discussed in technology areas including: the hardware implementation and demonstration of robotic sensor platforms, the implementation of a hardware test-bed facility, and the investigation of collaborative control algorithms.
Reconfigurable Autonomy for Future Planetary Rovers
NASA Astrophysics Data System (ADS)
Burroughes, Guy
Extra-terrestrial planetary rover systems are uniquely remote, operating under constraints on communication, environmental uncertainty, and limited physical resources, and requiring a high level of fault tolerance and resistance to hardware degradation. This thesis presents a novel self-reconfiguring autonomous software architecture designed to meet the needs of extraterrestrial planetary environments. At runtime it can safely reconfigure low-level control systems, high-level decisional autonomy systems, and managed software architecture. The architecture can perform automatic Verification and Validation of self-reconfiguration at run-time, and enables a system to be self-optimising, self-protecting, and self-healing. A novel self-monitoring system, which is non-invasive, efficient, tunable, and autonomously deploying, is also presented. The architecture was validated through the use-case of a highly autonomous extra-terrestrial planetary exploration rover. Three major forms of reconfiguration were demonstrated and tested: first, high-level adjustment of system internal architecture and goal; second, software module modification; and third, low-level alteration of hardware control in response to degradation of hardware and environmental change. The architecture was demonstrated to be robust and effective in a Mars sample return mission use-case testing the operational aspects of a novel, reconfigurable guidance, navigation, and control system for a planetary rover, all operating in concert through a scenario that required reconfiguration of all elements of the system.
A research of a high precision multichannel data acquisition system
NASA Astrophysics Data System (ADS)
Zhong, Ling-na; Tang, Xiao-ping; Yan, Wei
2013-08-01
The output signals of the focusing system in lithography are analog. Converting the analog signals into digital ones, which are more flexible and stable to process, requires a suitable data acquisition system. The resolution of data acquisition affects, to some extent, the accuracy of focusing. In this article, we first compare the performance of the various kinds of analog-to-digital converters (ADCs) currently available on the market. Considering the specific requirements (sampling frequency, conversion accuracy, number of channels, etc.) and the characteristics (polarization, amplitude range, etc.) of the analog signals, we determined the ADC model to be used as the core chip in our hardware design. On this basis, we chose the other chips needed in the hardware circuit to match the ADC well, and the overall hardware design was obtained. The data acquisition system was validated through experiments, which demonstrated that it can effectively realize high-resolution conversion of the multi-channel analog signals and provide accurate focusing information in lithography.
High-Speed Data Recorder for Space, Geodesy, and Other High-Speed Recording Applications
NASA Technical Reports Server (NTRS)
Taveniku, Mikael
2013-01-01
A high-speed data recorder and replay equipment has been developed for reliable high-data-rate recording to disk media. It solves problems with slow or faulty disks, multiple disk insertions, high-altitude operation, reliable performance using COTS hardware, and long-term maintenance and upgrade path challenges. The current generation data recorders used within the VLBI community are aging, special-purpose machines that are both slow (do not meet today's requirements) and are very expensive to maintain and operate. Furthermore, they are not easily upgraded to take advantage of commercial technology development, and are not scalable to multiple 10s of Gbit/s data rates required by new applications. The innovation provides a software-defined, high-speed data recorder that is scalable with technology advances in the commercial space. It maximally utilizes current technologies without being locked to a particular hardware platform. The innovation also provides a cost-effective way of streaming large amounts of data from sensors to disk, enabling many applications to store raw sensor data and perform post and signal processing offline. This recording system will be applicable to many applications needing real-world, high-speed data collection, including electronic warfare, software-defined radar, signal history storage of multispectral sensors, development of autonomous vehicles, and more.
High-productivity DRIE solutions for 3D-SiP and MEMS volume manufacturing
NASA Astrophysics Data System (ADS)
Puech, M.; Thevenoud, J. M.; Launay, N.; Arnal, N.; Godinat, P.; Andrieu, B.; Gruffat, J. M.
2006-12-01
Emerging 3D-SiP technologies and high volume MEMS applications require high productivity mass production DRIE systems. The Alcatel DRIE product range has recently been optimized to reach the highest process and hardware production performances. A study based on sub-micron high aspect ratio structures encountered in the most stringent 3D-SiP has been carried out. The optimization of the Bosch process parameters have shown ultra high silicon etch rate, with unrivaled uniformity and repeatability leading to excellent process yields. In parallel, most recent hardware and proprietary design optimization including vacuum pumping lines, process chamber, wafer chucks, pressure control system, gas delivery are discussed. A key factor for achieving the highest performances was the recognized expertise of Alcatel vacuum and plasma science technologies. These improvements have been monitored in a mass production environment for a mobile phone application. Field data analysis shows a significant reduction of cost of ownership thanks to increased throughput and much lower running costs. These benefits are now available for all 3D-SiP and high volume MEMS applications. The typical etched patterns include tapered trenches for CMOS imagers, through silicon via holes for die stacking, well controlled profile angle for 3D high precision inertial sensors, and large exposed area features for inkjet printer head and Silicon microphones.
Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M
2016-01-01
The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed to predict the three-dimensional arrangement of the atoms of proteins from their sequences. However, the computational complexity of this problem makes it mandatory to search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame. In this review, we present past and current trends in protein folding simulation from both perspectives: hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run this kind of soft computing technique.
Reset Tree-Based Optical Fault Detection
Lee, Dong-Geon; Choi, Dooho; Seo, Jungtaek; Kim, Howon
2013-01-01
In this paper, we present a new reset tree-based scheme to protect cryptographic hardware against optical fault injection attacks. As one of the most powerful invasive attacks on cryptographic hardware, optical fault attacks cause semiconductors to misbehave by injecting high-energy light into a decapped integrated circuit. The contaminated result from the affected chip is then used to reveal secret information, such as a key, from the cryptographic hardware. Since the advent of such attacks, various countermeasures have been proposed. Although most of these countermeasures are strong, there is still the possibility of attack. In this paper, we present a novel optical fault detection scheme that utilizes the buffers on a circuit's reset signal tree as a fault detection sensor. To evaluate our proposal, we model radiation-induced currents into circuit components and perform a SPICE simulation. The proposed scheme is expected to be used as a supplemental security tool. PMID:23698267
Implementing real-time robotic systems using CHIMERA II
NASA Technical Reports Server (NTRS)
Stewart, David B.; Schmitz, Donald E.; Khosla, Pradeep K.
1990-01-01
A description is given of the CHIMERA II programming environment and operating system, which was developed for implementing real-time robotic systems. Sensor-based robotic systems contain both general- and special-purpose hardware, and thus the development of applications tends to be a time-consuming task. The CHIMERA II environment is designed to reduce the development time by providing a convenient software interface between the hardware and the user. CHIMERA II supports flexible hardware configurations which are based on one or more VME-backplanes. All communication across multiple processors is transparent to the user through an extensive set of interprocessor communication primitives. CHIMERA II also provides a high-performance real-time kernel which supports both deadline and highest-priority-first scheduling. The flexibility of CHIMERA II allows hierarchical models for robot control, such as NASREM, to be implemented with minimal programming time and effort.
NASA Astrophysics Data System (ADS)
Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek
2009-09-01
High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming common place and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual CPU-quad core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort is presented.
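The texture measure at the heart of the index can be sketched in a few lines: compute grey-level co-occurrence matrices for several displacement vectors within a window and keep the minimum contrast across directions, so that only patches textured in every direction score highly. The quantisation level, window size, and displacement set below are illustrative, and the sketch evaluates a single window on the CPU, leaving out the moving-window application and the blade/GPU parallelisation compared in this work.

```python
import numpy as np

def glcm(q, offset, levels):
    """Normalised grey-level co-occurrence matrix for one displacement (dy, dx)."""
    dy, dx = offset
    h, w = q.shape
    mat = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            mat[q[y, x], q[y + dy, x + dx]] += 1
    return mat / max(mat.sum(), 1)

def glcm_contrast(mat):
    i, j = np.indices(mat.shape)
    return float(np.sum(mat * (i - j) ** 2))

def builtup_texture(window, levels=16, offsets=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Minimum GLCM contrast over displacement directions (PANTEX-like measure)."""
    span = np.ptp(window) + 1e-9
    q = np.floor((window - window.min()) / span * (levels - 1)).astype(int)
    return min(glcm_contrast(glcm(q, off, levels)) for off in offsets)

rng = np.random.default_rng(3)
smooth_patch = rng.normal(100.0, 2.0, (21, 21))      # homogeneous surface
textured_patch = rng.normal(100.0, 30.0, (21, 21))   # high-contrast, built-up-like
print(builtup_texture(smooth_patch), builtup_texture(textured_patch))
```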
Environmental testing for new SOFIA flight hardware
NASA Astrophysics Data System (ADS)
Lachenmann, Michael; Wolf, Jürgen; Strecker, Rainer; Weckenmann, Benedikt; Trimpe, Fritz; Hall, Helen J.
2014-07-01
New flight hardware for the Stratospheric Observatory for Infrared Astronomy (SOFIA) has to be tested to prove its safety and functionality and to measure its performance under flight conditions. Although it is not expected to experience critical issues inside the pressurized cabin with close-to-normal conditions, all equipment has to be tested for safety margins in case of a decompression event and/or for unusual high temperatures, e.g. inside an electronic unit caused by a malfunction as well as unusual high ambient temperatures inside the cabin, when the aircraft is parked in a desert. For equipment mounted on the cavity side of the telescope, stratospheric conditions apply, i.e., temperatures from -40 °C to -60°C and an air pressure of about 0.1 bar. Besides safety aspects as not to endanger personnel or equipment, new hardware inside the cavity has to function and to perform to specifications under such conditions. To perform these tests, an environmental test laboratory was set up at the SOFIA Science Center at the NASA Ames Research Center, including a thermal vacuum chamber, temperature measurement equipment, and a control and data logging workstation. This paper gives an overview of the test and measurement equipment, shows results from the commissioning and characterization of the thermal vacuum chamber, and presents examples of the component tests that were performed so far. To test the focus position stability of optics when cooling them to stratospheric temperatures, an auto-collimation device has been developed. We will present its design and results from measurements on commercial off-the-shelf optics as candidates for the new Wide Field Imager for SOFIA as an example.
The Astronaut-Athlete: Optimizing Human Performance in Space.
Hackney, Kyle J; Scott, Jessica M; Hanson, Andrea M; English, Kirk L; Downs, Meghan E; Ploutz-Snyder, Lori L
2015-12-01
It is well known that long-duration spaceflight results in deconditioning of neuromuscular and cardiovascular systems, leading to a decline in physical fitness. On reloading in gravitational environments, reduced fitness (e.g., aerobic capacity, muscular strength, and endurance) could impair human performance, mission success, and crew safety. The level of fitness necessary for the performance of routine and off-nominal terrestrial mission tasks remains an unanswered and pressing question for scientists and flight physicians. To mitigate fitness loss during spaceflight, resistance and aerobic exercise are the most effective countermeasures available to astronauts. Currently, 2.5 h/d, 6-7 d/wk is allotted in crew schedules for exercise to be performed on highly specialized hardware on the International Space Station (ISS). Exercise hardware provides up to 273 kg of loading capability for resistance exercise, treadmill speeds between 0.44 and 5.5 m/s, and cycle workloads from 0 to 350 W. Compared to ISS missions, future missions beyond low Earth orbit will likely be accomplished with less vehicle volume and power allocated for exercise hardware. Concomitant factors, such as diet and age, will also affect the physiologic responses to exercise training (e.g., anabolic resistance) in the space environment. Research into the potential optimization of exercise countermeasures through the use of dietary supplementation and pharmaceuticals may assist in reducing physiological deconditioning during long-duration spaceflight and has the potential to enhance performance of occupationally related astronaut tasks (e.g., extravehicular activity, habitat construction, equipment repairs, planetary exploration, and emergency response).
Boeing's STAR-FODB test results
NASA Astrophysics Data System (ADS)
Fritz, Martin E.; de la Chapelle, Michael; Van Ausdal, Arthur W.
1995-05-01
Boeing has successfully concluded a 2 1/2 year, two-phase developmental contract for the STAR-Fiber Optic Data Bus (FODB) that is intended for future space-based applications. The first phase included system analysis, trade studies, behavior modeling, and architecture and protocol selection. During this phase we selected the AS4074 Linear Token Passing Bus (LTPB) protocol operating at 200 Mbps, along with the passive, star-coupled fiber media. The second phase involved design, build, integration, and performance and environmental test of brassboard hardware. The resulting brassboard hardware successfully passed performance testing, providing 200 Mbps operation with a 32 X 32 star-coupled medium. This hardware is suitable for a spaceflight experiment to validate ground testing and analysis and to demonstrate performance in the intended environment. The fiber bus interface unit (FBIU) is a multichip module containing transceiver, protocol, and data formatting chips, buffer memory, and a station management controller. The FBIU has been designed for low power, high reliability, and radiation tolerance. Nine FBIUs were built and integrated with the fiber optic physical layer consisting of the fiber cable plant (FCP) and star coupler assembly (SCA). Performance and environmental testing, including radiation exposure, was performed on selected FBIUs and the physical layer. The integrated system was demonstrated with a full-motion color video image transfer across the bus while simultaneously performing utility functions with a fiber bus control module (FBCM) over a telemetry and control (T&C) bus, in this case AS1773.
Hardware and software status of QCDOC
NASA Astrophysics Data System (ADS)
Boyle, P. A.; Chen, D.; Christ, N. H.; Clark, M.; Cohen, S. D.; Cristian, C.; Dong, Z.; Gara, A.; Joó, B.; Jung, C.; Kim, C.; Levkova, L.; Liao, X.; Liu, G.; Mawhinney, R. D.; Ohta, S.; Petrov, K.; Wettig, T.; Yamaguchi, A.
2004-03-01
QCDOC is a massively parallel supercomputer whose processing nodes are based on an application-specific integrated circuit (ASIC). This ASIC was custom-designed so that crucial lattice QCD kernels achieve an overall sustained performance of 50% on machines with several 10,000 nodes. This strong scalability, together with low power consumption and a price/performance ratio of $1 per sustained MFlops, enable QCDOC to attack the most demanding lattice QCD problems. The first ASICs became available in June of 2003, and the testing performed so far has shown all systems functioning according to specification. We review the hardware and software status of QCDOC and present performance figures obtained in real hardware as well as in simulation.
Performance Qualification Test of the ISS Water Processor Assembly (WPA) Expendables
NASA Technical Reports Server (NTRS)
Carter, Layne; Tabb, David; Tatara, James D.; Mason, Richard K.
2005-01-01
The Water Processor Assembly (WPA) for use on the International Space Station (ISS) includes various technologies for the treatment of waste water. These technologies include filtration, ion exchange, adsorption, catalytic oxidation, and iodination. The WPA hardware implementing portions of these technologies, including the Particulate Filter, Multifiltration Bed, Ion Exchange Bed, and Microbial Check Valve, was recently qualified for chemical performance at the Marshall Space Flight Center. Waste water representing the quality of that produced on the ISS was generated by test subjects and processed by the WPA. Water quality analysis and instrumentation data was acquired throughout the test to monitor hardware performance. This paper documents operation of the test and the assessment of the hardware performance.
Evaluation of the OpenCL AES Kernel using the Intel FPGA SDK for OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal
The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing systems. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPUs, graphics processing units (GPUs), digital signal processors (DSPs) and field programmable gate arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes the FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with a high-level language without deep FPGA domain knowledge. In this report, we evaluate the performance of the kernel using the Intel FPGA SDK for OpenCL and a Nallatech 385A FPGA board. Compared to the M506 module, the board provides more hardware resources for a larger design exploration space. The kernel performance is measured with the compute kernel throughput, an upper bound to the FPGA throughput. The report presents the experimental results in detail. The Appendix lists the kernel source code.
FPGA Based Reconfigurable ATM Switch Test Bed
NASA Technical Reports Server (NTRS)
Chu, Pong P.; Jones, Robert E.
1998-01-01
Various issues associated with "FPGA Based Reconfigurable ATM Switch Test Bed" are presented in viewgraph form. Specific topics include: 1) network performance evaluation; 2) traditional approaches; 3) software simulation; 4) hardware emulation; 5) test bed highlights; 6) design environment; 7) test bed architecture; 8) abstract shared-memory switch; 9) detailed switch diagram; 10) traffic generator; 11) data collection circuit and user interface; 12) initial results; and 13) the following conclusions: advances in FPGAs make hardware emulation feasible for performance evaluation; hardware emulation can provide several orders of magnitude speed-up over software simulation; and, due to the complexity of the hardware synthesis process, development for emulation is much more difficult than for simulation and requires knowledge of both networks and digital design.
NASA Astrophysics Data System (ADS)
Bird, Robert; Nystrom, David; Albright, Brian
2017-10-01
The ability of scientific simulations to effectively deliver performant computation is increasingly being challenged by successive generations of high-performance computing architectures. Code development to support efficient computation on these modern architectures is both expensive and highly complex; if it is approached without due care, it may also not be directly transferable between subsequent hardware generations. Previous works have discussed techniques to support the process of adapting a legacy code for modern hardware generations, but despite the breakthroughs in the areas of mini-app development, portable performance, and cache-oblivious algorithms, the problem still remains largely unsolved. In this work we demonstrate how a focus on platform-agnostic modern code development can be applied to Particle-in-Cell (PIC) simulations to facilitate effective scientific delivery. This work builds directly on our previous work optimizing VPIC, in which we replaced intrinsics-based vectorization with compiler-generated auto-vectorization to improve the performance and portability of VPIC. In this work we present the use of a specialized SIMD queue for processing some particle operations, and also preview a GPU-capable OpenMP variant of VPIC. Finally, we include lessons learned. Work performed under the auspices of the U.S. Dept. of Energy by the Los Alamos National Security, LLC Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.
Accelerating epistasis analysis in human genetics with consumer graphics hardware.
Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H
2009-07-24
Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other tasks. The GPU workstation containing three GPUs costs $2000 while obtaining similar performance on a Beowulf cluster requires 150 CPU cores which, including the added infrastructure and support cost of the cluster system, cost approximately $82,500. Graphics hardware based computing provides a cost effective means to perform genetic analysis of epistasis using MDR on large datasets without the infrastructure of a computing cluster.
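For orientation, the core scoring step that gets replicated across millions of SNP pairs can be written compactly on the CPU: pool each two-locus genotype combination into high or low risk by comparing its case/control ratio with the overall ratio, then score the pair by balanced accuracy. The sketch below uses randomly generated data for illustration; the GPU implementation parallelises this evaluation across the enormous number of pairs rather than changing the arithmetic.

```python
import numpy as np

def mdr_balanced_accuracy(snp_a, snp_b, phenotype):
    """Score one SNP pair with the core MDR high/low-risk pooling rule."""
    cases = phenotype == 1
    overall_ratio = cases.sum() / max((~cases).sum(), 1)
    predicted = np.zeros_like(phenotype)
    for ga in (0, 1, 2):
        for gb in (0, 1, 2):
            cell = (snp_a == ga) & (snp_b == gb)
            n_case, n_ctrl = (cell & cases).sum(), (cell & ~cases).sum()
            if n_ctrl == 0 or n_case / n_ctrl > overall_ratio:   # high-risk cell
                predicted[cell] = 1
    sens = ((predicted == 1) & cases).sum() / max(cases.sum(), 1)
    spec = ((predicted == 0) & ~cases).sum() / max((~cases).sum(), 1)
    return 0.5 * (sens + spec)

rng = np.random.default_rng(4)
n_subjects, n_snps = 400, 10
snps = rng.integers(0, 3, size=(n_subjects, n_snps))     # genotypes coded 0/1/2
pheno = rng.integers(0, 2, size=n_subjects)              # random case/control labels
scores = {(i, j): mdr_balanced_accuracy(snps[:, i], snps[:, j], pheno)
          for i in range(n_snps) for j in range(i + 1, n_snps)}
print(max(scores.items(), key=lambda kv: kv[1]))
```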
High speed bus technology development
NASA Astrophysics Data System (ADS)
Modrow, Marlan B.; Hatfield, Donald W.
1989-09-01
The development and demonstration of the High Speed Data Bus system, a 50 million bits per second (Mbps) local data network intended for avionics applications in advanced military aircraft, is described. The Advanced System Avionics (ASA)/PAVE PILLAR program provided the avionics architecture concept and basic requirements. Designs for wire and fiber optic media were produced and hardware demonstrations were performed. An efficient, robust token-passing protocol was developed and partially demonstrated. The requirements specifications, the trade-offs made, and the resulting designs for both a coaxial wire media system and a fiber optics design are examined. Also, the development of a message-oriented media access protocol is described, from requirements definition through analysis, simulation, and experimentation. Finally, the testing and demonstrations conducted on the breadboard and brassboard hardware are presented.
Safe to Fly: Certifying COTS Hardware for Spaceflight
NASA Technical Reports Server (NTRS)
Fichuk, Jessica L.
2011-01-01
Providing hardware for the astronauts to use on board the Space Shuttle or International Space Station (ISS) involves a certification process that entails evaluating hardware safety, weighing risks, providing mitigation, and verifying requirements. Upon completion of this certification process, the hardware is deemed safe to fly. This process can be completed in as little as 1 week or can take several years, depending on the complexity of the hardware and whether the item is a unique custom design. One area of cost and schedule savings that NASA implements is buying Commercial Off the Shelf (COTS) hardware and certifying it for human spaceflight as safe to fly. By utilizing commercial hardware, NASA saves time by not having to develop, design, and build the hardware from scratch, and also realizes time savings in the certification process. By utilizing COTS hardware, the current detailed certification process can be simplified, which results in schedule savings. Cost savings are another important benefit of flying COTS hardware. Procuring COTS hardware for space use can be more economical than custom building the hardware. This paper will investigate the cost savings associated with certifying COTS hardware to NASA's standards rather than performing a custom build.
NASA Astrophysics Data System (ADS)
Riggs, William R.
1994-05-01
SHARP is a Navy wide logistics technology development effort aimed at reducing the acquisition costs, support costs, and risks of military electronic weapon systems while increasing the performance capability, reliability, maintainability, and readiness of these systems. Lower life cycle costs for electronic hardware are achieved through technology transition, standardization, and reliability enhancement to improve system affordability and availability as well as enhancing fleet modernization. Advanced technology is transferred into the fleet through hardware specifications for weapon system building blocks of standard electronic modules, standard power systems, and standard electronic systems. The product lines are all defined with respect to their size, weight, I/O, environmental performance, and operational performance. This method of defining the standard is very conducive to inserting new technologies into systems using the standard hardware. This is the approach taken thus far in inserting photonic technologies into SHARP hardware. All of the efforts have been related to module packaging; i.e. interconnects, component packaging, and module developments. Fiber optic interconnects are discussed in this paper.
Transistor Level Circuit Experiments using Evolvable Hardware
NASA Technical Reports Server (NTRS)
Stoica, A.; Zebulum, R. S.; Keymeulen, D.; Ferguson, M. I.; Daud, Taher; Thakoor, A.
2005-01-01
The Jet Propulsion Laboratory (JPL) performs research in fault-tolerant, long-life, and space-survivable electronics for the National Aeronautics and Space Administration (NASA). With that focus, JPL has been involved in Evolvable Hardware (EHW) technology research for the past several years. We have advanced the technology not only by simulation and evolution experiments, but also by designing, fabricating, and evolving a variety of transistor-based analog and digital circuits at the chip level. EHW refers to self-configuration of electronic hardware by evolutionary/genetic search mechanisms, thereby maintaining existing functionality in the presence of degradations due to aging, temperature, and radiation. In addition, EHW has the capability to reconfigure itself for new functionality when required for mission changes or encountered opportunities. Evolution experiments are performed using a genetic algorithm running on a DSP as the reconfiguration mechanism and controlling the evolvable hardware mounted on a self-contained circuit board. Rapid reconfiguration allows convergence to circuit solutions on the order of seconds. The paper illustrates hardware evolution results for electronic circuits and their ability to perform at temperatures of 230 °C as well as under radiation doses of up to 250 krad.
A Stochastic Spiking Neural Network for Virtual Screening.
Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L
2018-04-01
Virtual screening (VS) has become a key computational tool in early drug design, and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, hardware implementations of spiking neural networks (SNNs) are emerging as a computing technique that can be applied to parallelize processes that normally present a high cost in terms of computing time and power. Consequently, SNNs represent an attractive alternative for performing time-consuming processing tasks, such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm, achieving a two-order-of-magnitude speed improvement over USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays, allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture is a feasible methodology to efficiently enhance time-consuming data-mining processes such as 3-D molecular similarity search.
Identifying Trustworthiness Deficit in Legacy Systems Using the NFR Approach
2014-01-01
trustworthy environment. These adaptations can be stated in terms of design modifications and/or implementation mechanisms (for example, wrappers) that will... extensions to the VHSIC Hardware Description Language (VHDL-AMS). He has spent the last 10 years leading research in high performance embedded computing
High performance bilateral telerobot control.
Kline-Schoder, Robert; Finger, William; Hogan, Neville
2002-01-01
Telerobotic systems are used when the environment that requires manipulation is not easily accessible to humans, as in space, remote, hazardous, or microscopic applications, or to extend the capabilities of an operator by scaling motions and forces. The Creare control algorithm and software are an enabling technology that makes possible guaranteed stability and high performance for force-feedback telerobots. We have developed the necessary theory, structure, and software design required to implement high-performance telerobot systems with time delay. This includes controllers for the master and slave manipulators, the manipulator servo levels, the communication link, and impedance shaping modules. We verified the performance using both benchtop hardware and a commercial microsurgery system.
High capacity demonstration of honeycomb panel heat pipes
NASA Technical Reports Server (NTRS)
Tanzer, H. J.
1989-01-01
The feasibility of performance-enhancing the sandwich panel heat pipe was investigated for moderate-temperature-range heat rejection radiators on future high-power spacecraft. The hardware development program consisted of performance prediction modeling, fabrication, ground test, and data correlation. Using available sandwich panel materials, a series of subscale test panels was augmented with high-capacity sideflow and temperature-control variable-conductance features, and test-evaluated for correlation with performance prediction codes. Using the correlated prediction model, a 50-kW full-size radiator was defined using methanol working fluid and closely spaced sideflows. A new concept called the hybrid radiator individually optimizes heat pipe components. A 2.44-m-long hybrid test vehicle demonstrated proof-of-principle performance.
Efficiently passing messages in distributed spiking neural network simulation.
Thibeault, Corey M; Minkovich, Kirill; O'Brien, Michael J; Harris, Frederick C; Srinivasa, Narayan
2013-01-01
Efficiently passing spiking messages in a neural model is an important aspect of high-performance simulation. As the scale of networks has increased, so has the size of the computing systems required to simulate them. In addition, the information exchange among these resources has become more of an impediment to performance. In this paper we explore spike message passing using different mechanisms provided by the Message Passing Interface (MPI). A specific implementation, MVAPICH, designed for high-performance clusters with InfiniBand hardware, is employed. The focus is on providing information about these mechanisms for users of commodity high-performance spiking simulators. In addition, a novel hybrid method for spike exchange was implemented and benchmarked.
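As an illustration of the kind of collective exchange the abstract refers to, the sketch below gathers variable-length spike lists from all ranks using MPI_Allgather followed by MPI_Allgatherv. It is a minimal sketch, not the paper's implementation; a real simulator would compare this against point-to-point and other collective mechanisms.

    // Minimal sketch (not the paper's code): exchanging variable-length lists of
    // spiking neuron IDs each timestep with MPI_Allgather + MPI_Allgatherv.
    #include <mpi.h>
    #include <vector>

    std::vector<int> exchange_spikes(const std::vector<int> &local_spikes,
                                     MPI_Comm comm)
    {
        int nranks = 0;
        MPI_Comm_size(comm, &nranks);

        // 1) every rank learns how many spikes each other rank will send
        int my_count = static_cast<int>(local_spikes.size());
        std::vector<int> counts(nranks);
        MPI_Allgather(&my_count, 1, MPI_INT, counts.data(), 1, MPI_INT, comm);

        // 2) compute displacements and gather the spike IDs themselves
        std::vector<int> displs(nranks, 0);
        for (int r = 1; r < nranks; ++r) displs[r] = displs[r - 1] + counts[r - 1];
        std::vector<int> all_spikes(displs.back() + counts.back());
        MPI_Allgatherv(local_spikes.data(), my_count, MPI_INT,
                       all_spikes.data(), counts.data(), displs.data(), MPI_INT,
                       comm);
        return all_spikes;  // every rank now holds the global spike list
    }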
NASA Astrophysics Data System (ADS)
Beckmann, Felix
2016-10-01
The Helmholtz-Zentrum Geesthacht, Germany, is operating the user experiments for microtomography at the beamlines P05 and P07 using synchrotron radiation produced in the storage ring PETRA III at DESY, Hamburg, Germany. In recent years the software pipeline and sample-changing hardware for performing high-throughput experiments were developed. In this talk the current status of the beamlines will be given. Furthermore, optimisation and automation of scanning techniques will be presented. These are required to scan samples which are larger than the field of view defined by the X-ray beam. The integration into an optimized reconstruction pipeline will be shown.
Evaluation of RSRM case hardware fretting concerns
NASA Technical Reports Server (NTRS)
Swauger, Thomas R.
1990-01-01
Fretting corrosion was first noted on Shuttle flight STS-26. This flight was the first usage of the Redesigned Solid Rocket Motor (RSRM). The occurrence of fretting has since been observed on both the field and factory joints of the RSRM. Fretting is a form of corrosion that occurs at the interface between contacting, highly loaded, metal surfaces when exposed to slight relative vibratory motions. The engineering effort performed to evaluate the effect of fretting on the RSRM case hardware is summarized. Based on the results of this evaluation, several conclusions were made concerning flight safety. Also, recommendations were made concerning trending the effects of multiple generations of fretting damage.
Hardware/Software Issues for Video Guidance Systems: The Coreco Frame Grabber
NASA Technical Reports Server (NTRS)
Bales, John W.
1996-01-01
The F64 frame grabber is a high-performance video image acquisition and processing board utilizing the TMS320C40 and TMS34020 processors. The hardware is designed for the 16-bit ISA bus and supports multiple digital or analog cameras. It has an acquisition rate of 40 million pixels per second, with a variable sampling frequency of 510 kHz to 40 MHz. The board has 4 MB of frame buffer memory expandable to 32 MB, and has simultaneous acquisition and processing capability. It supports both VGA and RGB displays, and accepts all analog and digital video input standards.
Serrano-Gotarredona, Rafael; Oster, Matthias; Lichtsteiner, Patrick; Linares-Barranco, Alejandro; Paz-Vicente, Rafael; Gomez-Rodriguez, Francisco; Camunas-Mesa, Luis; Berner, Raphael; Rivas-Perez, Manuel; Delbruck, Tobi; Liu, Shih-Chii; Douglas, Rodney; Hafliger, Philipp; Jimenez-Moreno, Gabriel; Civit Ballcels, Anton; Serrano-Gotarredona, Teresa; Acosta-Jimenez, Antonio J; Linares-Barranco, Bernabé
2009-09-01
This paper describes CAVIAR, a massively parallel hardware implementation of a spike-based sensing-processing-learning-actuating system inspired by the physiology of the nervous system. CAVIAR uses the asynchronous address-event representation (AER) communication framework and was developed in the context of a European Union-funded project. It has four custom mixed-signal AER chips, five custom digital AER interface components, 45k neurons (spiking cells), and up to 5M synapses; it performs 12G synaptic operations per second and achieves millisecond object recognition and tracking latencies.
High performance network and channel-based storage
NASA Technical Reports Server (NTRS)
Katz, Randy H.
1991-01-01
In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called input/output (I/O) channels. With the dramatic shift towards workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. We discuss the underlying technology trends that are leading to high performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high performance computing based on network-attached storage.
A Real-Time High Performance Data Compression Technique For Space Applications
NASA Technical Reports Server (NTRS)
Yeh, Pen-Shu; Venbrux, Jack; Bhatia, Prakash; Miller, Warner H.
2000-01-01
A high-performance lossy data compression technique is currently being developed for space science applications under the requirement of high-speed push-broom scanning. The technique is also error-resilient in that error propagation is contained within a few scan lines. The algorithm is based on a block transform combined with bit-plane encoding; this combination results in an embedded bit string with exactly the desired compression rate. The lossy coder is described. The compression scheme performs well on a suite of test images typical of images from spacecraft instruments. Hardware implementations are in development; a functional chip set is expected by the end of 2001.
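A minimal sketch of the bit-plane idea, under the assumption of already-quantized transform coefficients (the flight coder additionally codes signs and applies entropy coding): emitting coefficient magnitudes one bit plane at a time, most significant plane first, yields an embedded bit string that can be truncated at exactly the desired rate.

    // Minimal sketch (not the flight coder): bit-plane ordering of quantized
    // transform coefficients into an embedded, truncatable bit string.
    #include <vector>
    #include <cstdint>
    #include <cstdlib>

    std::vector<uint8_t> encode_bitplanes(const std::vector<int> &coeffs,
                                          int num_planes)
    {
        std::vector<uint8_t> bits;                           // one entry per bit
        for (int plane = num_planes - 1; plane >= 0; --plane) {  // MSB plane first
            for (int c : coeffs)
                bits.push_back(static_cast<uint8_t>((std::abs(c) >> plane) & 1));
        }
        // Truncating 'bits' after any plane yields a coarser but valid
        // reconstruction, which is how an exact target rate can be met.
        return bits;
    }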
Integration of an open interface PC scene generator using COTS DVI converter hardware
NASA Astrophysics Data System (ADS)
Nordland, Todd; Lyles, Patrick; Schultz, Bret
2006-05-01
Commercial-Off-The-Shelf (COTS) personal computer (PC) hardware is increasingly capable of computing high dynamic range (HDR) scenes for military sensor testing at high frame rates. New electro-optical and infrared (EO/IR) scene projectors feature electrical interfaces that can accept the DVI output of these PC systems. However, military Hardware-in-the-loop (HWIL) facilities such as those at the US Army Aviation and Missile Research Development and Engineering Center (AMRDEC) utilize a sizeable inventory of existing projection systems that were designed to use the Silicon Graphics Incorporated (SGI) digital video port (DVP, also known as DVP2 or DD02) interface. To mate the new DVI-based scene generation systems to these legacy projection systems, CG2 Inc., a Quantum3D Company (CG2), has developed a DVI-to-DVP converter called Delta DVP. This device takes progressive scan DVI input, converts it to digital parallel data, and combines and routes color components to derive a 16-bit wide luminance channel replicated on a DVP output interface. The HWIL Functional Area of AMRDEC has developed a suite of modular software to perform deterministic real-time, wave band-specific rendering of sensor scenes, leveraging the features of commodity graphics hardware and open source software. Together, these technologies enable sensor simulation and test facilities to integrate scene generation and projection components with diverse pedigrees.
NASA Technical Reports Server (NTRS)
Daluge, D. R.; Ruedger, W. H.
1981-01-01
Problems encountered in testing onboard signal processing hardware designed to achieve radiometric and geometric correction of satellite imaging data are considered. These include obtaining representative image and ancillary data for simulation and the transfer and storage of a large quantity of image data at very high speed. The high resolution, high speed preprocessing of LANDSAT-D imagery is considered.
NASA Astrophysics Data System (ADS)
Zaveri, Mazad Shaheriar
The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired by the neuro/cognitive sciences. Consequently, in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology (CMOL), and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm^2 (TPM) obtained for CMOL-based architectures is 32 to 40 times better than the TPM for a CMOS-based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC implementation. We later use this methodology to investigate the hardware implementation of a cortex-scale spiking neural system, which is an approximate neural equivalent of a BICM-based cortex-scale system. The results of this investigation also suggest that CMOL is a promising candidate to implement such large-scale neuromorphic systems. In general, the assessment of such hypothetical baseline hardware architectures provides the prospects for building large-scale (mammalian cortex-scale) implementations of neuromorphic/Bayesian/intelligent systems using state-of-the-art and beyond state-of-the-art silicon structures.
NASA Technical Reports Server (NTRS)
Hoffman, William C., III
1996-01-01
Determining deterioration characteristics of the Space Shuttle crew escape system pyrotechnic components loaded with hexanitrostilbene would enable us to establish a hardware life-limit for these items, so we could better plan our equipment use and, possibly, extend the useful life of the hardware. We subjected components to accelerated-age environments to determine degradation characteristics and established a hardware life-limit based upon observed and calculated trends. We extracted samples from manufacturing lots currently installed in the Space Shuttle crew escape system and from other NASA programs. Hardware included in the study consisted of various forms and ages of mild detonating fuse, linear shaped charge, and flexible confined detonating cord. The hardware types were segregated into 5 groups. One was subjected to detonation velocity testing for a baseline. Two were first subjected to prolonged 155 °F heat exposure, and the other two were first subjected to 255 °F exposure, before undergoing detonation velocity testing and/or chromatography analysis. Test results showed no measurable changes in performance that would allow prediction of an end of life, given the storage and elevated-temperature environments the hardware experiences. Given the lack of a definitive performance trend, coupled with previous tests on post-flight Space Shuttle hardware showing no significant changes in chemical purity or detonation velocity, we recommend a safe increase in the useful life of the hardware to 20 years, from the current maximum limits of 10 and 15 years, depending on the hardware.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, John
Chisel is a new open-source hardware construction language developed at UC Berkeley that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages. Chisel is embedded in the Scala programming language, which raises the level of hardware design abstraction by providing concepts including object orientation, functional programming, parameterized types, and type inference. From the same source, Chisel can generate a high-speed C++-based cycle-accurate software simulator, or low-level Verilog designed to pass on to standard ASIC or FPGA tools for synthesis and place and route.
Salisbury, C M; Gillespie, R B; Tan, H Z; Barbagli, F; Salisbury, J K
2011-01-01
In this paper, we extend the concept of the contrast sensitivity function - used to evaluate video projectors - to the evaluation of haptic devices. We propose using human observers to determine if vibrations rendered using a given haptic device are accompanied by artifacts detectable to humans. This determination produces a performance measure that carries particular relevance to applications involving texture rendering. For cases in which a device produces detectable artifacts, we have developed a protocol that localizes deficiencies in device design and/or hardware implementation. In this paper, we present results from human vibration detection experiments carried out using three commercial haptic devices and one high performance voice coil motor. We found that all three commercial devices produced perceptible artifacts when rendering vibrations near human detection thresholds. Our protocol allowed us to pinpoint the deficiencies, however, and we were able to show that minor modifications to the haptic hardware were sufficient to make these devices well suited for rendering vibrations, and by extension, the vibratory components of textures. We generalize our findings to provide quantitative design guidelines that ensure the ability of haptic devices to proficiently render the vibratory components of textures.
Test Hardware Design for Flightlike Operation of Advanced Stirling Convertors (ASC-E3)
NASA Technical Reports Server (NTRS)
Oriti, Salvatore M.
2012-01-01
NASA Glenn Research Center (GRC) has been supporting development of the Advanced Stirling Radioisotope Generator (ASRG) since 2006. A key element of the ASRG project is providing life, reliability, and performance testing of the Advanced Stirling Convertor (ASC). For this purpose, the Thermal Energy Conversion branch at GRC has been conducting extended operation of a multitude of free-piston Stirling convertors. The goal of this effort is to generate long-term performance data (tens of thousands of hours) simultaneously on multiple units to build a life and reliability database. The test hardware for operation of these convertors was designed to permit in-air investigative testing, such as performance mapping over a range of environmental conditions. With this, there was no requirement to accurately emulate the flight hardware. For the upcoming ASC-E3 units, the decision has been made to assemble the convertors into a flight-like configuration. This means the convertors will be arranged in the dual-opposed configuration in a housing that represents the fit, form, and thermal function of the ASRG. The goal of this effort is to enable system level tests that could not be performed with the traditional test hardware at GRC. This offers the opportunity to perform these system-level tests much earlier in the ASRG flight development, as they would normally not be performed until fabrication of the qualification unit. This paper discusses the requirements, process, and results of this flight-like hardware design activity.
Test Hardware Design for Flight-Like Operation of Advanced Stirling Convertors
NASA Technical Reports Server (NTRS)
Oriti, Salvatore M.
2012-01-01
NASA Glenn Research Center (GRC) has been supporting development of the Advanced Stirling Radioisotope Generator (ASRG) since 2006. A key element of the ASRG project is providing life, reliability, and performance testing of the Advanced Stirling Convertor (ASC). For this purpose, the Thermal Energy Conversion branch at GRC has been conducting extended operation of a multitude of free-piston Stirling convertors. The goal of this effort is to generate long-term performance data (tens of thousands of hours) simultaneously on multiple units to build a life and reliability database. The test hardware for operation of these convertors was designed to permit in-air investigative testing, such as performance mapping over a range of environmental conditions. With this, there was no requirement to accurately emulate the flight hardware. For the upcoming ASC-E3 units, the decision has been made to assemble the convertors into a flight-like configuration. This means the convertors will be arranged in the dual-opposed configuration in a housing that represents the fit, form, and thermal function of the ASRG. The goal of this effort is to enable system level tests that could not be performed with the traditional test hardware at GRC. This offers the opportunity to perform these system-level tests much earlier in the ASRG flight development, as they would normally not be performed until fabrication of the qualification unit. This paper discusses the requirements, process, and results of this flight-like hardware design activity.
Getting to the Point in Pinpoint Landing
NASA Technical Reports Server (NTRS)
1998-01-01
Assisted by Langley Research Center's Small Business Technology Transfer (STTR) Program, IntegriNautics has developed a commercialized precision landing system. The idea finds its origins in Stanford University work on a satellite test of Einstein's General Theory of Relativity, for which Stanford designed new high-performance altitude-determining hardware.
Maintenance on the Advanced Colloids Experiment Module
2018-04-16
iss055e035366 (April 16, 2018) --- NASA astronaut Ricky Arnold performs maintenance on the Advanced Colloids Experiment Module located inside the Light Microscopy Module which is a modified commercial, highly flexible, state-of-the-art light imaging microscope facility that provides researchers with powerful diagnostic hardware and software in microgravity.
Tse computers. [ultrahigh speed optical processing for two dimensional binary image
NASA Technical Reports Server (NTRS)
Schaefer, D. H.; Strong, J. P., III
1977-01-01
An ultra-high-speed computer that utilizes binary images as its basic computational entity is being developed. The basic logic components perform thousands of operations simultaneously. Technologies of the fiber optics, display, thin film, and semiconductor industries are being utilized in the building of the hardware.
FPGA-accelerated adaptive optics wavefront control
NASA Astrophysics Data System (ADS)
Mauch, S.; Reger, J.; Reinlein, C.; Appelfelder, M.; Goy, M.; Beckert, E.; Tünnermann, A.
2014-03-01
The speed of real-time adaptive optical systems is primarily restricted by the data processing hardware and computational aspects. Furthermore, the application of mirror layouts with increasing numbers of actuators reduces the bandwidth (speed) of the system and, thus, the number of applicable control algorithms. This burden turns out to be a key impediment for deformable mirrors with continuous mirror surfaces and highly coupled actuator influence functions. In this regard, specialized hardware is necessary for high-performance real-time control applications. Our approach to overcome this challenge is an adaptive optics system based on a Shack-Hartmann wavefront sensor (SHWFS) with a CameraLink interface. The data processing is based on a high-performance Intel Core i7 quad-core hard real-time Linux system. Employing a Xilinx Kintex-7 FPGA, a custom-developed PCIe card is outlined in order to accelerate the analysis of the Shack-Hartmann wavefront sensor. A recently developed real-time capable spot detection algorithm evaluates the wavefront. The main features of the presented system are the reduction of latency and the acceleration of computation. For example, matrix multiplications, which in general are of complexity O(n^3), are accelerated by using the DSP48 slices of the field-programmable gate array (FPGA), as well as through a novel hardware implementation of the SHWFS algorithm. Further benefits are the Streaming SIMD Extensions (SSE), which intensively use the parallelization capability of the processor to further reduce the latency and increase the bandwidth of the closed loop. Due to this approach, up to 64 actuators of a deformable mirror can be handled and controlled without noticeable restriction from computational burdens.
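For context, the per-frame arithmetic such a controller accelerates is essentially a reconstructor matrix applied to the measured slopes, followed by an integrator update of the actuator commands. The sketch below is illustrative only; the reconstructor contents, gain, and data layout are assumptions, not the authors' implementation.

    // Illustrative control step (not the paper's code): actuator commands from
    // Shack-Hartmann slopes via a reconstructor matrix-vector product.
    #include <cstddef>
    #include <vector>

    void control_step(const std::vector<std::vector<float>> &R, // n_act x n_slopes
                      const std::vector<float> &slopes,         // measured this frame
                      std::vector<float> &commands,             // n_act, updated in place
                      float gain)
    {
        for (std::size_t a = 0; a < R.size(); ++a) {
            float phi = 0.0f;
            for (std::size_t s = 0; s < slopes.size(); ++s)
                phi += R[a][s] * slopes[s];   // the per-frame hot spot
            commands[a] -= gain * phi;        // simple integrator update
        }
    }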
Model-Based Verification and Validation of Spacecraft Avionics
NASA Technical Reports Server (NTRS)
Khan, M. Omair; Sievers, Michael; Standley, Shaun
2012-01-01
Verification and Validation (V&V) at JPL is traditionally performed on flight or flight-like hardware running flight software. For some time, the complexity of avionics has increased exponentially while the time allocated for system integration and associated V&V testing has remained fixed. There is an increasing need to perform comprehensive system level V&V using modeling and simulation, and to use scarce hardware testing time to validate models; the norm for thermal and structural V&V for some time. Our approach extends model-based V&V to electronics and software through functional and structural models implemented in SysML. We develop component models of electronics and software that are validated by comparison with test results from actual equipment. The models are then simulated enabling a more complete set of test cases than possible on flight hardware. SysML simulations provide access and control of internal nodes that may not be available in physical systems. This is particularly helpful in testing fault protection behaviors when injecting faults is either not possible or potentially damaging to the hardware. We can also model both hardware and software behaviors in SysML, which allows us to simulate hardware and software interactions. With an integrated model and simulation capability we can evaluate the hardware and software interactions and identify problems sooner. The primary missing piece is validating SysML model correctness against hardware; this experiment demonstrated such an approach is possible.
No-hardware-signature cybersecurity-crypto-module: a resilient cyber defense agent
NASA Astrophysics Data System (ADS)
Zaghloul, A. R. M.; Zaghloul, Y. A.
2014-06-01
We present an optical cybersecurity-crypto-module as a resilient cyber defense agent. It has no hardware signature since it is bitstream reconfigurable, where a single hardware architecture functions as any selected device of all possible ones of the same number of inputs. For a two-input digital device, a 4-digit bitstream of 0s and 1s determines which device, of a total of 16 devices, the hardware performs as. Accordingly, the hardware itself is not physically reconfigured, but its performance is. Such a defense agent allows the attack to take place, rendering it harmless. On the other hand, if the system is already infected with malware sending out information, the defense agent allows the information to go out, rendering it meaningless. The hardware architecture is immune to side attacks since such an attack would reveal information on the attack itself and not on the hardware. This cyber defense agent can be used to secure a point-to-point link, a point-to-multipoint link, a whole network, and/or a single entity in cyberspace, thereby ensuring trust between cyber resources. It can provide secure communication in an insecure network. We provide the hardware design and explain how it works. Scalability of the design is briefly discussed. (Protected by United States Patents No.: US 8,004,734; US 8,325,404; and other National Patents worldwide.)
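A minimal software analogue of the mechanism described above: a 4-bit configuration word acts as the truth table of a two-input device, so sending a different bitstream changes which of the 16 possible functions is performed without physically changing the hardware. The function name and encoding below are illustrative assumptions, not the patented design.

    // Minimal sketch of the idea: a 4-bit configuration selects one of the 16
    // possible two-input Boolean functions for a single, unchanged piece of logic.
    #include <cstdint>
    #include <cassert>

    // 'config' is a 4-bit truth table: bit index (a<<1 | b) holds the output for (a, b).
    inline bool lut2(uint8_t config, bool a, bool b)
    {
        assert(config < 16);
        const unsigned index = (static_cast<unsigned>(a) << 1) |
                                static_cast<unsigned>(b);
        return (config >> index) & 1u;
    }

    // Examples: config 0b1000 behaves as AND, 0b1110 as OR, 0b0110 as XOR.
    // Re-sending a different 4-bit stream "reconfigures" the device in place.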
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Seyong; Kim, Jungwon; Vetter, Jeffrey S
This paper presents a directive-based, high-level programming framework for high-performance reconfigurable computing. It takes a standard, portable OpenACC C program as input and generates a hardware configuration file for execution on FPGAs. We implemented this prototype system using our open-source OpenARC compiler; it performs source-to-source translation and optimization of the input OpenACC program into OpenCL code, which is further compiled into an FPGA program by the backend Altera Offline OpenCL compiler. Internally, the design of OpenARC uses a high-level intermediate representation that separates concerns of program representation from underlying architectures, which facilitates portability of OpenARC. In fact, this design allowed us to create the OpenACC-to-FPGA translation framework with minimal extensions to our existing system. In addition, we show that our proposed FPGA-specific compiler optimizations and novel OpenACC pragma extensions assist the compiler in generating more efficient FPGA hardware configuration files. Our empirical evaluation on an Altera Stratix V FPGA with eight OpenACC benchmarks demonstrates the benefits of our strategy. To demonstrate the portability of OpenARC, we show results for the same benchmarks executing on other heterogeneous platforms, including NVIDIA GPUs, AMD GPUs, and Intel Xeon Phis. This initial evidence helps support the goal of using a directive-based, high-level programming strategy for performance portability across heterogeneous HPC architectures.
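For reference, the input side of such a flow is ordinary directive-annotated loop code. The sketch below is a generic OpenACC-annotated vector addition, not one of the paper's benchmarks and not using its proposed pragma extensions.

    // Generic OpenACC example (not from the paper): a portable annotated loop
    // that a directive-based compiler such as OpenARC could translate to OpenCL
    // and, from there, to an FPGA hardware configuration file.
    #include <cstddef>

    void vadd(const float *a, const float *b, float *c, std::size_t n)
    {
        #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
        for (std::size_t i = 0; i < n; ++i)
            c[i] = a[i] + b[i];   // each iteration is independent
    }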
Parameterized hardware description as object oriented hardware model implementation
NASA Astrophysics Data System (ADS)
Drabik, Pawel K.
2010-09-01
The paper introduces a novel model for the design, visualization, and management of complex, highly adaptive hardware systems. The model establishes a component-oriented environment for both hardware modules and software applications. It builds on parameterized hardware description research. The establishment of a stable link between hardware and software, the purpose of the designed and realized work, is presented. A novel programming framework for the environment, named Graphic-Functional-Components, is presented. The purpose of the paper is to present object-oriented hardware modeling with the mentioned features. A possible model implementation in FPGA chips and its management by object-oriented software in Java are described.
Many-core graph analytics using accelerated sparse linear algebra routines
NASA Astrophysics Data System (ADS)
Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric
2016-05-01
Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without the requirement for the customer to make any changes to their analytics code, thanks to the compatibility with existing graph APIs.
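As a concrete example of the linear-algebra view of graph traversal, the sketch below expresses one breadth-first-search frontier expansion as a sparse matrix-vector product over a Boolean (OR, AND) semiring. It is an illustrative sketch in plain C++, not GraphBLAS API calls.

    // Minimal sketch: one BFS level as a masked Boolean sparse matrix-vector
    // product, next = A^T * frontier restricted to unvisited vertices.
    #include <vector>
    #include <cstdint>

    struct CSR {                     // adjacency matrix, one row per vertex
        std::vector<int> row_ptr;    // size n+1
        std::vector<int> col_idx;    // column indices of nonzeros
    };

    std::vector<uint8_t> bfs_step(const CSR &A,
                                  const std::vector<uint8_t> &frontier,
                                  std::vector<uint8_t> &visited)
    {
        const int n = static_cast<int>(frontier.size());
        std::vector<uint8_t> next(n, 0);
        for (int u = 0; u < n; ++u) {
            if (!frontier[u]) continue;                  // only expand frontier rows
            for (int k = A.row_ptr[u]; k < A.row_ptr[u + 1]; ++k) {
                const int v = A.col_idx[k];
                if (!visited[v]) { next[v] = 1; visited[v] = 1; }
            }
        }
        return next;                                     // the new frontier
    }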
Rodríguez, Alfonso; Valverde, Juan; Portilla, Jorge; Otero, Andrés; Riesgo, Teresa; de la Torre, Eduardo
2018-06-08
Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.
Performance Comparison of HPF and MPI Based NAS Parallel Benchmarks
NASA Technical Reports Server (NTRS)
Saini, Subhash
1997-01-01
Compilers supporting High Performance Fortran (HPF) features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI) combinations will be compared, based on the latest NAS Parallel Benchmark results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors. In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC AlphaServer 8400 5/440, Fujitsu CAPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, and SGI Origin2000. We also present sustained performance per dollar for the Class B LU, SP, and BT benchmarks.
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications.
Cabezas, Javier; Gelado, Isaac; Stone, John E; Navarro, Nacho; Kirk, David B; Hwu, Wen-Mei
2015-05-01
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps, and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice.
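For orientation, the sketch below shows the kind of CUDA runtime calls that underlie peer DMA between two GPUs; the HPE runtime hides such details behind its interface. The buffer names and two-device framing are assumptions, not the HPE API.

    // Illustrative CUDA host code (not the HPE API): a direct peer-to-peer copy
    // from GPU 0 to GPU 1, with a staged fallback when P2P is unavailable.
    #include <cuda_runtime.h>

    void copy_gpu0_to_gpu1(float *dst_on_gpu1, const float *src_on_gpu0,
                           size_t bytes, cudaStream_t stream)
    {
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, /*device=*/1, /*peerDevice=*/0);
        if (can_access) {
            cudaSetDevice(1);
            // Enable direct access from GPU 1 to GPU 0's memory (a real runtime
            // would do this once and ignore "already enabled" return codes).
            cudaDeviceEnablePeerAccess(0, 0);
            // Asynchronous peer copy; overlapping it with compute issued on
            // other streams is what double-buffering schemes rely on.
            cudaMemcpyPeerAsync(dst_on_gpu1, 1, src_on_gpu0, 0, bytes, stream);
        } else {
            // Fallback: the driver stages the transfer through host memory.
            cudaMemcpyAsync(dst_on_gpu1, src_on_gpu0, bytes,
                            cudaMemcpyDefault, stream);
        }
    }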
Runtime and Architecture Support for Efficient Data Exchange in Multi-Accelerator Applications
Cabezas, Javier; Gelado, Isaac; Stone, John E.; Navarro, Nacho; Kirk, David B.; Hwu, Wen-mei
2014-01-01
Heterogeneous parallel computing applications often process large data sets that require multiple GPUs to jointly meet their needs for physical memory capacity and compute throughput. However, the lack of high-level abstractions in previous heterogeneous parallel programming models forces programmers to resort to multiple code versions, complex data copy steps, and synchronization schemes when exchanging data between multiple GPU devices, which results in high software development cost, poor maintainability, and even poor performance. This paper describes the HPE runtime system, and the associated architecture support, which enables a simple, efficient programming interface for exchanging data between multiple GPUs through either interconnects or cross-node network interfaces. The runtime and architecture support presented in this paper can also be used to support other types of accelerators. We show that the simplified programming interface reduces programming complexity. The research presented in this paper started in 2009. It has been implemented and tested extensively in several generations of HPE runtime systems as well as adopted into the NVIDIA GPU hardware and drivers for CUDA 4.0 and beyond since 2011. The availability of real hardware that supports key HPE features gives rise to a rare opportunity for studying the effectiveness of the hardware support by running important benchmarks on real runtime and hardware. Experimental results show that in an exemplar heterogeneous system, peer DMA and double-buffering, pinned buffers, and software techniques can improve the inter-accelerator data communication bandwidth by 2×. They can also improve the execution speed by 1.6× for a 3D finite difference, 2.5× for 1D FFT, and 1.6× for merge sort, all measured on real hardware. The proposed architecture support enables the HPE runtime to transparently deploy these optimizations under simple portable user code, allowing system designers to freely employ devices of different capabilities. We further argue that simple interfaces such as HPE are needed for most applications to benefit from advanced hardware features in practice. PMID:26180487
Algorithm for fast event parameters estimation on GEM acquired data
NASA Astrophysics Data System (ADS)
Linczuk, Paweł; Krawczyk, Rafał D.; Poźniak, Krzysztof T.; Kasprowicz, Grzegorz; Wojeński, Andrzej; Chernyshova, Maryna; Czarski, Tomasz
2016-09-01
We present a study of a software-hardware environment for developing fast computation methods with high throughput and low latency, which can be used as a back-end in High Energy Physics (HEP) and other High Performance Computing (HPC) systems based on a high volume of input from electronic sensor-based front-ends. Parallelization possibilities are discussed and tested on Intel HPC solutions, with consideration of applications to Gas Electron Multiplier (GEM) measurement systems.
High-performance software-only H.261 video compression on PC
NASA Astrophysics Data System (ADS)
Kasperovich, Leonid
1996-03-01
This paper describes an implementation of a software H.261 codec for the PC that takes advantage of the fast computational algorithms for DCT-based video compression presented by the author at the February 1995 SPIE/IS&T meeting. The motivation for developing the H.261 prototype system is to demonstrate the feasibility of a real-time, software-only videoconferencing solution that can operate across a wide range of network bandwidths, frame rates, and resolutions of the input video. As network bandwidth increases, higher frame rates and resolutions of the transmitted video become possible, which in turn requires a software codec able to compress pictures of CIF (352 x 288) resolution at up to 30 frames/sec. Running on a 133 MHz Pentium PC, the codec presented is capable of compressing video in CIF format at 21 - 23 frames/sec. This result is comparable to known hardware-based H.261 solutions, but it doesn't require any specific hardware. The methods used to achieve high performance and the program optimization technique for the Pentium microprocessor, along with a performance profile showing the actual contribution of the different encoding/decoding stages to the overall computational process, are presented.
High Temperature Perforating System for Geothermal Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smart, Moises E.
The objective of this project is to develop a perforating system, consisting of all the explosive components and hardware, capable of reliable performance in high-temperature geothermal wells (>200 °C). In this light we will focus on the engineering development of these components, characterization of the explosive raw powder, and development of the internal infrastructure to increase production of the explosive from laboratory scale to industrial scale.
The Unified Floating Point Vector Coprocessor for Reconfigurable Hardware
NASA Astrophysics Data System (ADS)
Kathiara, Jainik
There has been increased interest recently in using embedded cores on FPGAs. Many of the applications that make use of these cores use floating-point operations. Due to the complexity and expense of floating-point hardware, these algorithms are usually converted to fixed-point operations or implemented using floating-point emulation in software. As the technology advances, more and more homogeneous computational resources and fixed-function embedded blocks are added to FPGAs, and hence implementation of floating-point hardware becomes a feasible option. In this research we have implemented a high-performance, autonomous floating-point vector coprocessor (FPVC) that works independently within an embedded processor system. We have presented a unified approach to vector and scalar computation, using a single register file for both scalar operands and vector elements. The hybrid vector/SIMD computational model of the FPVC results in greater overall performance for most applications, along with improved peak performance compared to other approaches. By parameterizing vector length and the number of vector lanes, we can design an application-specific FPVC and take optimal advantage of the FPGA fabric. For this research we have also initiated the design of a software library for various computational kernels, each of which adapts the FPVC's configuration to provide maximal performance. The kernels implemented are from the area of linear algebra and include matrix multiplication and QR and Cholesky decomposition. We have demonstrated the operation of the FPVC on a Xilinx Virtex-5 using the embedded PowerPC.
Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus
2016-01-01
Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922
Large-scale parallel genome assembler over cloud computing environment.
Das, Arghya Kusum; Koppa, Praveen Kumar; Goswami, Sayan; Platania, Richard; Park, Seung-Jong
2017-06-01
The size of high throughput DNA sequencing data has already reached the terabyte scale. To manage this huge volume of data, many downstream sequencing applications started using locality-based computing over different cloud infrastructures to take advantage of elastic (pay as you go) resources at a lower cost. However, the locality-based programming model (e.g. MapReduce) is relatively new. Consequently, developing scalable data-intensive bioinformatics applications using this model and understanding the hardware environment that these applications require for good performance, both require further research. In this paper, we present a de Bruijn graph oriented Parallel Giraph-based Genome Assembler (GiGA), as well as the hardware platform required for its optimal performance. GiGA uses the power of Hadoop (MapReduce) and Giraph (large-scale graph analysis) to achieve high scalability over hundreds of compute nodes by collocating the computation and data. GiGA achieves significantly higher scalability with competitive assembly quality compared to contemporary parallel assemblers (e.g. ABySS and Contrail) over traditional HPC cluster. Moreover, we show that the performance of GiGA is significantly improved by using an SSD-based private cloud infrastructure over traditional HPC cluster. We observe that the performance of GiGA on 256 cores of this SSD-based cloud infrastructure closely matches that of 512 cores of traditional HPC cluster.
Devices and circuits for nanoelectronic implementation of artificial neural networks
NASA Astrophysics Data System (ADS)
Turel, Ozgur
Biological neural networks perform complicated information processing tasks at speeds better than conventional computers based on conventional algorithms. This has inspired researchers to look into the way these networks function, and propose artificial networks that mimic their behavior. Unfortunately, most artificial neural networks, either software or hardware, do not provide either the speed or the complexity of a human brain. Nanoelectronics, with the high density and low power dissipation it provides, may be used in developing more efficient artificial neural networks. This work consists of two major contributions in this direction. First is the proposal of the CMOL concept, hybrid CMOS-molecular hardware [1-8]. CMOL may circumvent most of the problems posed by molecular devices, such as low yield, yet provide high active device density, ~10^12/cm^2. The second contribution is CrossNets, artificial neural networks that are based on CMOL. We showed that CrossNets, with their fault tolerance and exceptional speed (~4 to 6 orders of magnitude faster than biological neural networks), can perform any task any artificial neural network can perform. Moreover, there is hope that if their integration scale is increased to that of the human cerebral cortex (~10^10 neurons and ~10^14 synapses), they may be capable of performing more advanced tasks.
Real-time synchronized multiple-sensor IR/EO scene generation utilizing the SGI Onyx2
NASA Astrophysics Data System (ADS)
Makar, Robert J.; O'Toole, Brian E.
1998-07-01
An approach to utilize the symmetric multiprocessing environment of the Silicon Graphics Inc. (SGI) Onyx2 has been developed to support the generation of IR/EO scenes in real-time. This development, supported by the Naval Air Warfare Center Aircraft Division (NAWC/AD), focuses on high frame rate hardware-in-the-loop testing of multiple sensor avionics systems. In the past, real-time IR/EO scene generators have been developed as custom architectures that were often expensive and difficult to maintain. Previous COTS scene generation systems, designed and optimized for visual simulation, could not be adapted for accurate IR/EO sensor stimulation. The new Onyx2 connection mesh architecture made it possible to develop a more economical system while maintaining the fidelity needed to stimulate actual sensors. An SGI-based Real-time IR/EO Scene Simulator (RISS) system was developed to utilize the Onyx2's fast multiprocessing hardware to perform real-time IR/EO scene radiance calculations. During real-time scene simulation, the multiprocessors are used to update polygon vertex locations and compute radiometrically accurate floating point radiance values. The output of this process can be utilized to drive a variety of scene rendering engines. Recent advancements in COTS graphics systems, such as the Silicon Graphics InfiniteReality, make a total COTS solution possible for some classes of sensors. This paper will discuss the critical technologies that apply to infrared scene generation and hardware-in-the-loop testing using SGI-compatible hardware. Specifically, the application of RISS high-fidelity real-time radiance algorithms on the SGI Onyx2's multiprocessing hardware will be discussed. Also, issues relating to external real-time control of multiple synchronized scene generation channels will be addressed.
Final postflight hardware evaluation report RSRM-28 (STS-53)
NASA Technical Reports Server (NTRS)
Starrett, William David, Jr.
1993-01-01
The final report for the Clearfield disassembly evaluation and a continuation of the KSC postflight assessment for the RSRM-28 (STS-53) RSRM flight set is presented. All observed hardware conditions were documented on PFOR's and are included in Appendices A through C. Appendices D and E contain the measurements and safety factor data for the nozzle and insulation components. This report, along with the KSC Ten-Day Postflight Hardware Evaluation Report (TWR-64215), represents a summary of the RSRM-28 hardware evaluation. The as-flown hardware configuration is documented in TWR-63638. Disassembly evaluation photograph numbers are logged in TWA-1989. The RSRM-28 flight set disassembly evaluations described were performed at the RSRM Refurbishment Facility in Clearfield, Utah. The final factory joint demate occurred on July 15, 1993. Additional time was required to perform the evaluation of the stiffener rings per special issue 4.1.5.2 because of the washout schedule. The release of this report was after completion of all special issues per program management direction. Detailed evaluations were performed in accordance with the Clearfield PEEP, TWR-50051, Revision A. All observations were compared against limits that are also defined in the PEEP. These limits outline the criteria for categorizing the observations as acceptable, reportable, or critical. Hardware conditions that were unexpected and/or determined to be reportable or critical were evaluated by the applicable team and tracked through the PFAR system.
NASA Technical Reports Server (NTRS)
Farley, Douglas L.
2005-01-01
NASA's Aviation Safety and Security Program is pursuing research in on-board Structural Health Management (SHM) technologies for purposes of reducing or eliminating aircraft accidents due to system and component failures. Under this program, NASA Langley Research Center (LaRC) is developing a strain-based structural health-monitoring concept that incorporates a fiber optic-based measuring system for acquiring strain values. This fiber optic-based measuring system provides for the distribution of thousands of strain sensors embedded in a network of fiber optic cables. The resolution of strain value at each discrete sensor point requires a computationally demanding data reduction software process that, when hosted on a conventional processor, is not suitable for near real-time measurement. This report describes the development and integration of an alternative computing environment using dedicated computing hardware for performing the data reduction. Performance comparison between the existing and the hardware-based system is presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gottesfeld, Shimshon; Dekel, Dario R.; Page, Miles
The anion exchange membrane fuel cell (AEMFC) is an attractive alternative to acidic proton exchange membrane fuel cells, which to date have required platinum-based catalysts as well as acid-tolerant stack hardware. The AEMFC could use non-platinum-group metal catalysts and less expensive metal hardware thanks to the high pH of the electrolyte. Over the last decade, substantial progress has been made in improving the performance and durability of the AEMFC through the development of new materials and the optimization of system design and operating conditions. In this perspective article, we describe the current status of AEMFCs as having reached beginning-of-life performance very close to that of PEMFCs when using ultra-low loadings of Pt, while advancing towards operation on non-platinum-group metal catalysts alone. In the latter sections, we identify the remaining technical challenges, which require further research and development, focusing on the materials and operational factors that critically impact AEMFC performance and/or durability. Finally, these perspectives may provide useful insights for the development of the next generation of AEMFCs.
NASA Technical Reports Server (NTRS)
Edmonds, Karina
2008-01-01
This toolkit provides a common interface for displaying graphical user interface (GUI) components in stereo using either specialized stereo display hardware (e.g., liquid crystal shutter or polarized glasses) or anaglyph display (red/blue glasses) on standard workstation displays. An application using this toolkit will work without modification in either environment, allowing stereo software to reach a wider audience without sacrificing high-quality display on dedicated hardware. The toolkit is written in Java for use with the Swing GUI Toolkit and has cross-platform compatibility. It hooks into the graphics system, allowing any standard Swing component to be displayed in stereo. It uses the OpenGL graphics library to control the stereo hardware and to perform the rendering. It also supports anaglyph and special stereo hardware using the same API (application-program interface), and has the ability to simulate color stereo in anaglyph mode by combining the red band of the left image with the green/blue bands of the right image. This is a low-level toolkit that handles only the display of components (including the JadeDisplay image display component). It does not include higher-level functions such as disparity adjustment, a 3D cursor, or overlays, all of which can be built using this toolkit.
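The color-anaglyph behavior described above (red band from the left image, green/blue bands from the right image) amounts to a per-pixel channel merge. The toolkit itself is Java/Swing with OpenGL; the NumPy sketch below is meant only to make that channel combination concrete, and the array layout and function name are illustrative assumptions.

import numpy as np

def color_anaglyph(left_rgb, right_rgb):
    # Red channel from the left view, green/blue channels from the right view,
    # as described for the toolkit's color-anaglyph mode.
    out = right_rgb.copy()
    out[..., 0] = left_rgb[..., 0]
    return out

# toy 2x2 RGB images (uint8)
left = np.full((2, 2, 3), (200, 10, 10), dtype=np.uint8)
right = np.full((2, 2, 3), (10, 150, 220), dtype=np.uint8)
print(color_anaglyph(left, right)[0, 0])   # -> [200 150 220]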
Study of efficient video compression algorithms for space shuttle applications
NASA Technical Reports Server (NTRS)
Poo, Z.
1975-01-01
Results are presented of a study on video data compression techniques applicable to space flight communication. This study is directed towards monochrome (black and white) picture communication with special emphasis on feasibility of hardware implementation. The primary factors for such a communication system in space flight application are: picture quality, system reliability, power consumption, and hardware weight. In terms of hardware implementation, these are directly related to hardware complexity, effectiveness of the hardware algorithm, immunity of the source code to channel noise, and data transmission rate (or transmission bandwidth). A system is recommended, and its hardware requirements are summarized. Simulations of the study were performed on the improved LIM video controller, which is computer-controlled by the META-4 CPU.
An evaluation of Skylab habitability hardware
NASA Technical Reports Server (NTRS)
Stokes, J.
1974-01-01
For effective mission performance, participants in space missions lasting 30-60 days or longer must be provided with hardware to accommodate their personal needs. Such habitability hardware was provided on Skylab. Equipment defined as habitability hardware was that equipment composing the food system, water system, sleep system, waste management system, personal hygiene system, trash management system, and entertainment equipment. Equipment not specifically defined as habitability hardware but which served that function were the Wardroom window, the exercise equipment, and the intercom system, which was occasionally used for private communications. All Skylab habitability hardware generally functioned as intended for the three missions, and most items could be considered as adequate concepts for future flights of similar duration. Specific components were criticized for their shortcomings.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goebel, J
2004-02-27
Without stable hardware, any program will fail. The frustration and expense of supporting bad hardware can drain an organization, delay progress, and frustrate everyone involved. At Stanford Linear Accelerator Center (SLAC), we have created a testing method that helps our group, SLAC Computer Services (SCS), weed out potentially bad hardware and purchase the best hardware at the best possible cost. Commodity hardware changes often, so new evaluations happen periodically each time we purchase systems and minor re-evaluations happen for revised systems for our clusters, about twice a year. This general framework helps SCS perform correct, efficient evaluations. This article outlines SCS's computer testing methods and our system acceptance criteria. We expanded the basic ideas to other evaluations such as storage, and we think the methods outlined in this article have helped us choose hardware that is much more stable and supportable than our previous purchases. We have found that commodity hardware ranges in quality, so a systematic method and tools for hardware evaluation were necessary. This article is based on one instance of a hardware purchase, but the guidelines apply to the general problem of purchasing commodity computer systems for production computational work.
NASA Astrophysics Data System (ADS)
Peterson, Zachary W.
Hybrid motors that employ non-toxic, non-explosive components with a liquid oxidizer and a solid hydrocarbon fuel grain have inherently safe operating characteristics. The inherent safety of hybrid rocket motors offers the potential to greatly reduce overall operating costs. Another key advantage of hybrid rocket motors is the potential for in-flight shutdown, restart, and throttle by controlling the pressure drop between the oxidizer tank and the injector. This research designed, developed, and ground tested a closed-loop throttle controller for a hybrid rocket motor using nitrous oxide and hydroxyl-terminated polybutadiene as propellants. The research simultaneously developed closed-loop throttle algorithms and lab scale motor hardware to evaluate the fidelity of the throttle simulations and algorithms. Initial open-loop motor tests were performed to better classify system parameters and to validate motor performance values. Deep-throttle open-loop tests evaluated limits of stable thrust that can be achieved on the test hardware. Open-loop tests demonstrated the ability to throttle the motor to less than 10% of maximum thrust with little reduction in effective specific impulse and acoustical stability. Following the open-loop development, closed-loop, hardware-in-the-loop tests were performed. The closed-loop controller successfully tracked prescribed step and ramp command profiles with a high degree of fidelity. Steady-state accuracy was greatly improved over uncontrolled thrust.
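The abstract does not state which control law the closed-loop throttle used, only that it tracked step and ramp thrust commands by regulating the oxidizer feed. For orientation, here is a minimal Python sketch of one common choice, a PID loop driving a normalized valve position; the gains, the normalized thrust units, and the PID structure itself are assumptions made for illustration and are not taken from the thesis.

def pid_throttle_step(thrust_cmd, thrust_meas, state, dt,
                      kp=0.8, ki=0.4, kd=0.05):
    # One update of a PID loop commanding oxidizer valve position in [0, 1].
    # Thrust values are assumed normalized to the motor's maximum thrust.
    err = thrust_cmd - thrust_meas
    state["integral"] += err * dt
    deriv = (err - state["prev_err"]) / dt
    state["prev_err"] = err
    valve = kp * err + ki * state["integral"] + kd * deriv
    return min(max(valve, 0.0), 1.0)

state = {"integral": 0.0, "prev_err": 0.0}
cmd = pid_throttle_step(thrust_cmd=0.50, thrust_meas=0.45, state=state, dt=0.01)
print(round(cmd, 3))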
Sun, Yuwen; Cheng, Allen C
2012-07-01
Artificial neural networks (ANNs) are a promising machine learning technique in classifying non-linear electrocardiogram (ECG) signals and recognizing abnormal patterns suggesting risks of cardiovascular diseases (CVDs). In this paper, we propose a new reusable neuron architecture (RNA) enabling a performance-efficient and cost-effective silicon implementation for ANNs. The RNA architecture consists of a single layer of physical RNA neurons, each of which is designed to use minimal hardware resources (e.g., a single 2-input multiplier-accumulator is used to compute the dot product of two vectors). By carefully applying the principle of time sharing, RNA can multiplex this single layer of physical neurons to efficiently execute both feed-forward and back-propagation computations of an ANN while conserving the area and reducing the power dissipation of the silicon. A three-layer 51-30-12 ANN is implemented in RNA to perform the ECG classification for CVD detection. This RNA hardware also allows on-chip automatic training updates. A quantitative design space exploration in area, power dissipation, and execution speed between RNA and three other implementations representative of different reusable hardware strategies is presented and discussed. Compared with an equivalent software implementation in C executed on an embedded microprocessor, the RNA ASIC achieves three orders of magnitude improvement in both the execution speed and the energy efficiency. Copyright © 2012 Elsevier Ltd. All rights reserved.
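To make the time-sharing idea concrete, the sketch below reuses a single multiply-accumulate operation for every (neuron, input) pair of a layer, which is the software analogue of multiplexing one physical neuron layer. The Python form, the ReLU activation, and the toy weights are illustrative assumptions; the actual RNA design is a fixed-point ASIC with on-chip back-propagation.

def mac(acc, a, b):
    # The single 2-input multiplier-accumulator that is time-shared.
    return acc + a * b

def layer_forward(weights, inputs, activation=lambda s: max(0.0, s)):
    # Compute one layer by reusing the same MAC for every (neuron, input)
    # pair, mimicking time multiplexing of one physical neuron layer.
    outputs = []
    for w_row in weights:            # one "virtual" neuron at a time
        acc = 0.0
        for w, x in zip(w_row, inputs):
            acc = mac(acc, w, x)     # same MAC, reused serially
        outputs.append(activation(acc))
    return outputs

print(layer_forward([[0.5, -0.2], [0.1, 0.3]], [1.0, 2.0]))   # -> [0.1, 0.7]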
Latex samples for RAMSES electrophoresis experiment on IML 2
NASA Technical Reports Server (NTRS)
Seaman, Geoffrey V. F.; Knox, Robert J.
1994-01-01
The objectives of these reported studies were to provide ground-based support services for the flight experiment team for the RAMSES experiment to be flown aboard IML-2. The specific areas of support included consultation on the performance of particle-based electrophoresis studies, development of methods for the preparation of suitable samples for the flight hardware, the screening of particles to obtain suitable candidates for the flight experiment, and the electrophoretic characterization of sample particle preparations. The first phases of these studies were performed under this contract, while the follow-on work was performed under grant number NAG8 1081, 'Preparation and Characterization of Latex Samples for RAMSES Experiment on IML 2.' During this first phase of the experiment, the following benchmarks were achieved: Methods were tested for the concentration and resuspension of latex samples in the greater than 0.4 micron diameter range to provide moderately high solids content samples free of the particle aggregation that interfered with the normal functioning of the RAMSES hardware. Various candidate latex preparations were screened and two candidate types of latex were identified for use in the flight experiments, carboxylate-modified latex (CML) and acrylic acid-acrylamide modified latex (AAM). These latexes have relatively hydrophilic surfaces, are not prone to aggregate, and display sufficiently low electrophoretic mobilities in the flight buffer so that they can be used to make mixtures to test the resolving power of the flight hardware.
NASA Technical Reports Server (NTRS)
Kriegler, F. J.
1974-01-01
The MIDAS System is described as a third-generation fast multispectral recognition system able to keep pace with the large quantity and high rates of data acquisition from present and projected sensors. A principal objective of the MIDAS program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turnaround time and significant gains in throughput. The hardware and software are described. The system contains a mini-computer to control the various high-speed processing elements in the data path, and a classifier which implements an all-digital prototype multivariate-Gaussian maximum likelihood decision algorithm operating at 200,000 pixels/sec. Sufficient hardware was developed to perform signature extraction from computer-compatible tapes, compute classifier coefficients, control the classifier operation, and diagnose operation.
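The classifier stage described above is a standard multivariate Gaussian maximum likelihood decision rule. The following Python/NumPy sketch shows signature extraction (per-class mean and covariance) and per-pixel classification; the two-band toy data and the omission of prior probabilities are illustrative simplifications of what MIDAS implemented in dedicated digital hardware at 200,000 pixels/sec.

import numpy as np

def train_signatures(samples_by_class):
    # Estimate per-class mean and covariance ("signature extraction").
    sigs = {}
    for label, X in samples_by_class.items():
        X = np.asarray(X, dtype=float)
        sigs[label] = (X.mean(axis=0), np.cov(X, rowvar=False))
    return sigs

def classify_pixel(x, sigs):
    # Assign the class with the largest Gaussian log-likelihood.
    best, best_ll = None, -np.inf
    for label, (mu, cov) in sigs.items():
        d = x - mu
        ll = -0.5 * (d @ np.linalg.inv(cov) @ d) - 0.5 * np.log(np.linalg.det(cov))
        if ll > best_ll:
            best, best_ll = label, ll
    return best

sigs = train_signatures({
    "water": [[10, 12], [11, 13], [9, 11], [10, 14]],
    "crop":  [[40, 35], [42, 33], [41, 36], [39, 34]],
})
print(classify_pixel(np.array([12, 13]), sigs))   # -> "water"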
Design considerations for a 10-kW integrated hydrogen-oxygen regenerative fuel cell system
NASA Technical Reports Server (NTRS)
Hoberecht, M. A.; Miller, T. B.; Rieker, L. L.; Gonzalez-Sanabria, O. D.
1984-01-01
Integration of an alkaline fuel cell subsystem with an alkaline electrolysis subsystem to form a regenerative fuel cell (RFC) system for low Earth orbit (LEO) applications, characterized by relatively high overall round-trip electrical efficiency, long life, and high reliability, is possible with present state-of-the-art technology. A hypothetical 10-kW system was computer modeled and studied based on data from ongoing contractual efforts in both the alkaline fuel cell and alkaline water electrolysis areas. The alkaline fuel cell technology is under development utilizing advanced cell components and standard Shuttle Orbiter system hardware. The alkaline electrolysis technology uses a static water vapor feed technique, and scaled-up cell hardware has been developed. The computer-aided study of the performance, operating, and design parameters of the hypothetical system is addressed.
77 FR 18970 - Airworthiness Directives; Bell Helicopter Textron Canada Helicopters
Federal Register 2010, 2011, 2012, 2013, 2014
2012-03-29
... Model 407 Maintenance Manual and applied during manufacturing was incorrect and exceeded the torque... hardware (attachment hardware), and perform initial and recurring determinations of...
JT8D-15/17 High Pressure Turbine Root Discharged Blade Performance Improvement. [engine design
NASA Technical Reports Server (NTRS)
Janus, A. S.
1981-01-01
The JT8D high pressure turbine blade and seal were modified, using a more efficient blade cooling system, improved airfoil aerodynamics, more effective control of secondary flows, and improved blade tip sealing. Engine testing was conducted to determine the effect of these improvements on performance. The modified turbine package demonstrated significant thrust specific fuel consumption and exhaust gas temperature improvements in sea level and altitude engine tests. Inspection of the improved blade and seal hardware after testing revealed no unusual wear or degradation.
Training Scalable Restricted Boltzmann Machines Using a Quantum Annealer
NASA Astrophysics Data System (ADS)
Kumar, V.; Bass, G.; Dulny, J., III
2016-12-01
Machine learning and the optimization involved therein are of critical importance for commercial and military applications. Due to the computational complexity of many-variable optimization, the conventional approach is to employ meta-heuristic techniques to find suboptimal solutions. Quantum Annealing (QA) hardware offers a completely novel approach with the potential to obtain significantly better solutions with large speed-ups compared to traditional computing. In this presentation, we describe our development of new machine learning algorithms tailored for QA hardware. We are training restricted Boltzmann machines (RBMs) using QA hardware on large, high-dimensional commercial datasets. Traditional optimization heuristics such as contrastive divergence and other closely related techniques are slow to converge, especially on large datasets. Recent studies have indicated that QA hardware, when used as a sampler, provides better training performance compared to conventional approaches. Most of these studies have been limited to moderately sized datasets due to the hardware restrictions imposed by existing QA devices, which make it difficult to solve real-world problems at scale. In this work, we develop novel strategies to circumvent this issue. We discuss scale-up techniques such as enhanced embedding and partitioned RBMs, which allow large commercial datasets to be learned using QA hardware. We present our initial results obtained by training an RBM as an autoencoder on an image dataset. The results obtained so far indicate that the convergence rates can be improved significantly by increasing RBM network connectivity. These ideas can be readily applied to generalized Boltzmann machines, and we are currently investigating this in an ongoing project.
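For context, the sketch below is a plain contrastive-divergence (CD-1) update for a small RBM, with the model-sample step factored out as a pluggable sampler; in the approach described above, a quantum annealer would supply those model samples instead of the Gibbs sweep. The network sizes, learning rate, and exact update form are illustrative assumptions and do not reflect the enhanced-embedding or partitioning strategies the abstract mentions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sampler(V, W, b, c, rng):
    # One Gibbs sweep producing (visible, hidden) model samples.
    # A QA device would be dropped in here as an alternative sampler.
    p_h = sigmoid(V @ W + c)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)
    v_model = (rng.random(p_v.shape) < p_v).astype(float)
    h_model = sigmoid(v_model @ W + c)
    return v_model, h_model

def cd1_update(V, W, b, c, lr=0.05, sampler=gibbs_sampler, rng=None):
    # One contrastive-divergence update over a minibatch V (rows = samples).
    rng = rng or np.random.default_rng(0)
    h_data = sigmoid(V @ W + c)
    v_model, h_model = sampler(V, W, b, c, rng)
    W += lr * (V.T @ h_data - v_model.T @ h_model) / len(V)
    b += lr * (V - v_model).mean(axis=0)
    c += lr * (h_data - h_model).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(1)
V = rng.integers(0, 2, size=(16, 6)).astype(float)   # toy binary data
W = 0.01 * rng.standard_normal((6, 4))
b, c = np.zeros(6), np.zeros(4)
for _ in range(100):
    W, b, c = cd1_update(V, W, b, c, rng=rng)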
NASA Astrophysics Data System (ADS)
Swift, Jonathan J.; Bottom, Michael; Johnson, John A.; Wright, Jason T.; McCrady, Nate; Wittenmyer, Robert A.; Plavchan, Peter; Riddle, Reed; Muirhead, Philip S.; Herzig, Erich; Myles, Justin; Blake, Cullen H.; Eastman, Jason; Beatty, Thomas G.; Barnes, Stuart I.; Gibson, Steven R.; Lin, Brian; Zhao, Ming; Gardner, Paul; Falco, Emilio; Criswell, Stephen; Nava, Chantanelle; Robinson, Connor; Sliski, David H.; Hedrick, Richard; Ivarsen, Kevin; Hjelstrom, Annie; de Vera, Jon; Szentgyorgyi, Andrew
2015-04-01
The Miniature Exoplanet Radial Velocity Array (MINERVA) is a U.S.-based observational facility dedicated to the discovery and characterization of exoplanets around a nearby sample of bright stars. MINERVA employs a robotic array of four 0.7-m telescopes outfitted for both high-resolution spectroscopy and photometry, and is designed for completely autonomous operation. The primary science program is a dedicated radial velocity survey, and the secondary science objective is to obtain high-precision transit light curves. The modular design of the facility and the flexibility of our hardware allow both science programs to be pursued simultaneously, while the robotic control software provides a robust and efficient means to carry out nightly observations. We describe the design of MINERVA, including major hardware components, software, and science goals. The telescopes and photometry cameras are characterized at our test facility on the Caltech campus in Pasadena, California, and their on-sky performance is validated. The design and simulated performance of the spectrograph is briefly discussed as we await its completion. New observations from our test facility demonstrate sub-mmag photometric precision on one of our radial velocity survey targets, and we present new transit observations and fits of WASP-52b, a known hot Jupiter with an inflated radius and a misaligned orbit. The process of relocating the MINERVA hardware to its final destination at the Fred Lawrence Whipple Observatory in southern Arizona has begun, and science operations are expected to commence in 2015.
NASA Astrophysics Data System (ADS)
Jenkins, Thomas; Smithe, David
2016-10-01
Inefficiencies and detrimental physical effects may arise in conjunction with ICRF heating of tokamak plasmas. Large wall potential drops, associated with sheath formation near plasma-facing antenna hardware, give rise to high-Z impurity sputtering from plasma-facing components and subsequent radiative cooling. Linear and nonlinear wave excitations in the plasma edge/SOL also dissipate injected RF power and reduce overall antenna efficiency. Recent advances in finite-difference time-domain (FDTD) modeling techniques allow the physics of localized sheath potentials, and associated sputtering events, to be modeled concurrently with the physics of antenna near- and far-field behavior and RF power flow. The new methods enable time-domain modeling of plasma-surface interactions and ICRF physics in realistic experimental configurations at unprecedented spatial resolution. We present results/animations from high-performance (10k-100k core) FDTD/PIC simulations spanning half of Alcator C-Mod at mm-scale resolution, exploring impurity production due to localized sputtering (in response to self-consistent sheath potentials at antenna surfaces) and the physics of parasitic slow wave excitation near the antenna hardware and SOL. Supported by US DoE (Award DE-SC0009501) and the ALCC program.
Minho Won; Albalawi, Hassan; Xin Li; Thomas, Donald E
2014-01-01
This paper describes a low-power hardware implementation for movement decoding in a brain-computer interface. Our proposed hardware design is facilitated by two novel ideas: (i) an efficient feature extraction method based on a reduced-resolution discrete cosine transform (DCT), and (ii) a new dual look-up table hardware architecture that performs the discrete cosine transform without explicit multiplication. The proposed hardware implementation has been validated for movement decoding of electrocorticography (ECoG) signals using a Xilinx Zynq-7000 FPGA board. It achieves more than 56× energy reduction over a reference design using band-pass filters for feature extraction.
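As a software reference for the reduced-resolution DCT idea, the sketch below keeps only the first few DCT-II coefficients of a signal window as features. It is a floating-point illustration in Python; the dual look-up table in the paper replaces the cosine-times-sample multiplications with precomputed table entries, which this sketch does not model, and the window length and coefficient count are assumptions.

import math

def dct_features(x, num_coeffs=4):
    # First few DCT-II coefficients of a signal window; keeping only
    # low-order coefficients is the "reduced resolution" idea.
    N = len(x)
    feats = []
    for k in range(num_coeffs):
        s = sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
        feats.append(s)
    return feats

window = [0.1, 0.4, 0.9, 0.3, -0.2, -0.6, -0.1, 0.2]
print(dct_features(window))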
NASA Astrophysics Data System (ADS)
Wang, Hongyu; Zhang, Baomin; Zhao, Xun; Li, Cong; Lu, Cunyue
2018-04-01
Conventional stereo vision algorithms suffer from high levels of hardware resource utilization due to algorithm complexity, or poor levels of accuracy caused by inadequacies in the matching algorithm. To address these issues, we have proposed a stereo range-finding technique that produces an excellent balance between cost, matching accuracy, and real-time performance, for power line inspection using a UAV. This was achieved through the introduction of a special image preprocessing algorithm and a weighted local stereo matching algorithm, as well as the design of a corresponding hardware architecture. Stereo vision systems based on this technique have a lower level of resource usage and also a higher level of matching accuracy following hardware acceleration. To validate the effectiveness of our technique, a stereo vision system based on our improved algorithms was implemented using a Spartan-6 FPGA. In comparative experiments, it was shown that the system using the improved algorithms outperformed the system based on the unimproved algorithms, in terms of resource utilization and matching accuracy. In particular, Block RAM usage was reduced by 19%, and the improved system was also able to output range-finding data in real time.
Active Low Intrusion Hybrid Monitor for Wireless Sensor Networks
Navia, Marlon; Campelo, Jose C.; Bonastre, Alberto; Ors, Rafael; Capella, Juan V.; Serrano, Juan J.
2015-01-01
Several systems have been proposed to monitor wireless sensor networks (WSN). These systems may be active (causing a high degree of intrusion) or passive (low observability inside the nodes). This paper presents the implementation of an active hybrid (hardware and software) monitor with low intrusion. It is based on the addition to the sensor node of a monitor node (hardware part) which, through a standard interface, is able to receive the monitoring information sent by a piece of software executed in the sensor node. The intrusion on time, code, and energy caused in the sensor nodes by the monitor is evaluated as a function of data size and the interface used. Then different interfaces, commonly available in sensor nodes, are evaluated: serial transmission (USART), serial peripheral interface (SPI), and parallel. The proposed hybrid monitor provides highly detailed information, barely disturbed by the measurement tool (interference), about the behavior of the WSN that may be used to evaluate many properties such as performance, dependability, security, etc. Monitor nodes are self-powered and may be removed after the monitoring campaign to be reused in other campaigns and/or WSNs. No other hardware-independent monitoring platforms with such low interference have been found in the literature. PMID:26393604
A hardware fast tracker for the ATLAS trigger
NASA Astrophysics Data System (ADS)
Asbah, Nedaa
2016-09-01
The trigger system of the ATLAS experiment is designed to reduce the event rate from the LHC nominal bunch crossing rate of 40 MHz to about 1 kHz at the design luminosity of 10^34 cm^-2 s^-1. After a successful period of data taking from 2010 to early 2013, the LHC has resumed operation at much higher instantaneous luminosity. This increases the load on the High Level Trigger, the second selection stage, which is based on software algorithms. More sophisticated algorithms will be needed to achieve higher background rejection while maintaining good efficiency for interesting physics signals. The Fast TracKer (FTK) is part of the ATLAS trigger upgrade project. It is a hardware processor that will provide, for every Level-1 accepted event (100 kHz) and within 100 microseconds, full tracking information for tracks with momentum as low as 1 GeV. By providing fast, extensive access to tracking information, with resolution comparable to the offline reconstruction, FTK will help in precise detection of the primary and secondary vertices, ensuring robust selections and improving trigger performance. FTK exploits hardware technologies with massive parallelism, combining Associative Memory ASICs, FPGAs, and high-speed communication links.
Standard high-reliability integrated circuit logic packaging. [for deep space tracking stations
NASA Technical Reports Server (NTRS)
Slaughter, D. W.
1977-01-01
A family of standard, high-reliability hardware used for packaging digital integrated circuits is described. The design transition from early prototypes to production hardware is covered and future plans are discussed. Interconnection techniques are described, as well as connectors and related hardware available at both the microcircuit packaging and main-frame levels. General applications information is also provided.
2017-03-09
iss050e056553 (03/09/2017) --- NASA astronaut Peggy Whitson unloads spaceflight hardware delivered on SpaceX CRS-10 that was built as part of the NASA High School Students United with NASA to Create Hardware (HUNCH) program. Students in the HUNCH program receive valuable experience creating goods for NASA from hardware to the culinary arts, while NASA receives the creativity of the High School students.
Computerized atmospheric trace contaminant control simulation for manned spacecraft
NASA Technical Reports Server (NTRS)
Perry, J. L.
1993-01-01
Buildup of atmospheric trace contaminants in enclosed volumes such as a spacecraft may lead to potentially serious health problems for the crew members. For this reason, active control methods must be implemented to minimize the concentration of atmospheric contaminants to levels that are considered safe for prolonged, continuous exposure. Designing hardware to accomplish this has traditionally required extensive testing to characterize and select appropriate control technologies. Data collected since the Apollo project can now be used in a computerized performance simulation to predict the performance and life of contamination control hardware to allow for initial technology screening, performance prediction, and operations and contingency studies to determine the most suitable hardware approach before specific design and testing activities begin. The program, written in FORTRAN 77, provides contaminant removal rate, total mass removed, and per pass efficiency for each control device for discrete time intervals. In addition, projected cabin concentration is provided. Input and output data are manipulated using commercial spreadsheet and data graphing software. These results can then be used in analyzing hardware design parameters such as sizing and flow rate, overall process performance and program economics. Test performance may also be predicted to aid test design.
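A discrete-time mass balance of the kind such a simulation steps through can be written compactly; the Python sketch below updates a well-mixed cabin concentration under a single removal device with a fixed per-pass efficiency and reports the mass removed per step. The units, the single-device well-mixed-cabin model, and the parameter values are illustrative assumptions, not those of the FORTRAN 77 program described above.

def cabin_concentration_step(conc, gen_rate, flow, efficiency, volume, dt):
    # One time step of a well-mixed cabin mass balance with a single
    # removal device of fixed per-pass efficiency.
    # conc [mg/m^3], gen_rate [mg/hr], flow [m^3/hr], volume [m^3], dt [hr].
    removed = flow * efficiency * conc * dt          # mass removed this step
    conc = conc + (gen_rate * dt - removed) / volume
    return max(conc, 0.0), removed

conc, total_removed = 0.0, 0.0
for _ in range(240):   # 24 hours in 0.1-hr steps
    conc, removed = cabin_concentration_step(conc, gen_rate=5.0, flow=15.0,
                                             efficiency=0.9, volume=100.0, dt=0.1)
    total_removed += removed
print(round(conc, 3), round(total_removed, 2))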
Bandwidth Efficient Wireless Digital Modem Developed
NASA Technical Reports Server (NTRS)
Kifle, Muli
1999-01-01
NASA Lewis Research Center has developed a digital approach for broadcasting high-fidelity audio (nearly compact disk (CD) quality sound) in the commercial frequency-modulated (FM) broadcast band. This digital approach provides a means of achieving high data transmission rates with low hardware complexity, including low mass, size, and power consumption. Lewis has completed the design and prototype development of a bandwidth-efficient digital modem (modulator and demodulator) that uses a spectrally efficient modulation scheme: 16-ary rectangular quadrature amplitude modulation, or 16-ary QAM. The digital implementation is based strictly on inexpensive, commercial off-the-shelf digital signal processing (DSP) hardware to perform up and down conversions and pulse shaping. The digital modem transmits data at rates up to 76 kilobits per second (kbps), which is almost 3 times faster than standard 28.8-kbps telephone modems. In addition, the modem offers improved power and spectral performance, flexible operation, and low-cost implementation.
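For readers unfamiliar with the modulation, a 16-ary rectangular QAM mapper simply chooses one of four amplitude levels for each of the in-phase and quadrature rails from pairs of bits. The Python sketch below uses a common Gray-coded mapping with unit-average-power scaling; the Lewis modem's exact bit-to-symbol mapping and pulse shaping are not given in the abstract, so these details are assumptions.

import math

LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}   # Gray-coded levels
SCALE = 1.0 / math.sqrt(10.0)                               # unit average power

def map_16qam(bits):
    # Map a bit sequence (length divisible by 4) to complex 16-QAM symbols:
    # two bits select the I level, two bits select the Q level.
    symbols = []
    for i in range(0, len(bits), 4):
        b = bits[i:i + 4]
        I = LEVELS[(b[0], b[1])]
        Q = LEVELS[(b[2], b[3])]
        symbols.append(complex(I, Q) * SCALE)
    return symbols

print(map_16qam([1, 0, 0, 1, 0, 0, 1, 1]))   # two symbols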
Optimized Two-Party Video Chat with Restored Eye Contact Using Graphics Hardware
NASA Astrophysics Data System (ADS)
Dumont, Maarten; Rogmans, Sammy; Maesen, Steven; Bekaert, Philippe
We present a practical system prototype to convincingly restore eye contact between two video chat participants, with minimal constraints. The proposed six-fold camera setup is easily integrated into the monitor frame, and is used to interpolate an image as if its virtual camera captured the image through a transparent screen. The peer user has a large freedom of movement, resulting in system specifications that enable genuine practical usage. Our software framework thereby harnesses the powerful computational resources inside graphics hardware, and maximizes arithmetic intensity to achieve better than real-time performance, up to 42 frames per second for 800 × 600 resolution images. Furthermore, an optimal set of fine-tuned parameters is presented that optimizes the end-to-end performance of the application to achieve high subjective visual quality, while still allowing for further algorithmic advancement without losing its real-time capabilities.
Matrix-vector multiplication using digital partitioning for more accurate optical computing
NASA Technical Reports Server (NTRS)
Gary, C. K.
1992-01-01
Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable to or greater than those of electronic computers, as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient algorithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
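A small numerical sketch helps show how digital partitioning trades one high-precision product for several low-precision passes: each input value is split into base-b digits, each digit plane is multiplied by the matrix (the step an analog optical processor would perform), and the partial results are recombined with powers of the base. The base, digit count, and restriction to non-negative integers below are illustrative choices, not parameters from the paper.

import numpy as np

def partition_digits(x, base=16, n_digits=2):
    # Split non-negative integers into n_digits base-`base` digits (LSD first).
    digits = []
    for _ in range(n_digits):
        digits.append(x % base)
        x = x // base
    return digits                      # one array per digit plane

def partitioned_matvec(A, x, base=16, n_digits=2):
    # Recombine low-precision digit products; each A @ digit product stands in
    # for one pass through the analog optical processor.
    x_digits = partition_digits(np.asarray(x), base, n_digits)
    result = np.zeros(A.shape[0], dtype=np.int64)
    for d, xd in enumerate(x_digits):
        result += (base ** d) * (A @ xd)
    return result

A = np.array([[3, 1], [2, 5]])
x = np.array([200, 75])
print(partitioned_matvec(A, x), A @ x)   # both -> [675 775]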
On-orbit experience with the HEAO attitude control subsystem
NASA Technical Reports Server (NTRS)
Hoffman, D. P.; Berkery, E. A.
1978-01-01
The first satellite (HEAO-1) in the High Energy Astronomy Observatory Program series was launched successfully on Aug. 12, 1977. To date it has completed over nine months of orbital operation in a science data gathering mode. During this period all attitude control modes have been exercised and all primary mission objectives have been achieved. This paper highlights the characteristics of the attitude control subsystem design and compares the predicted performance with the actual flight operations experience. Environmental disturbance modeling, component hardware/software characteristics, and overall attitude control performance are reviewed and are found to compare very well with the prelaunch analytical predictions. Brief comments are also included regarding the operations aspects of the attitude control subsystem. The experience in this regard demonstrates the effectiveness of the design flexibility afforded by the presence of a general purpose digital processor in the subsystem flight hardware implementation.
KITTEN Lightweight Kernel 0.1 Beta
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pedretti, Kevin; Levenhagen, Michael; Kelly, Suzanne
2007-12-12
The Kitten Lightweight Kernel is a simplified OS (operating system) kernel that is intended to manage a compute node's hardware resources. It provides a set of mechanisms to user-level applications for utilizing hardware resources (e.g., allocating memory, creating processes, accessing the network). Kitten is much simpler than general-purpose OS kernels, such as Linux or Windows, but includes all of the essential functionality needed to support HPC (high-performance computing) MPI, PGAS, and OpenMP applications. Kitten provides unique capabilities such as physically contiguous application memory, transparent large page support, and noise-free tick-less operation, which enable HPC applications to obtain greater efficiency and scalability than with general-purpose OS kernels.
Software handlers for process interfaces
NASA Technical Reports Server (NTRS)
Bercaw, R. W.
1976-01-01
The principles involved in the development of software handlers for custom interfacing problems are discussed. Handlers for the CAMAC standard are examined in detail. The types of transactions that must be supported have been established by standards groups, eliminating conflicting requirements arising out of different design philosophies and applications. Implementation of the standard handlers has been facilitated by standardization of hardware. The necessary local processors can be placed in the handler when it is written or at run time by means of input/output directives, or they can be built into a high-performance input/output processor. The full benefits of these process interfaces will only be realized when software requirements are incorporated uniformly into the hardware.
NASA Technical Reports Server (NTRS)
Tawel, Raoul (Inventor)
1994-01-01
A method for the rapid learning of nonlinear mappings and topological transformations using a dynamically reconfigurable artificial neural network is presented. This fully recurrent Adaptive Neuron Model (ANM) network was applied to the highly degenerate inverse kinematics problem in robotics, and its performance is benchmarked. Once trained, the resulting neuromorphic architecture was implemented in custom analog neural network hardware and the parameters capturing the functional transformation downloaded onto the system. This neuroprocessor, capable of 10^9 ops/sec, was interfaced directly to a three-degree-of-freedom Heathkit robotic manipulator. Calculation of the hardware feed-forward pass for this mapping was benchmarked at approximately 10 microseconds.
Diamond turning machine controller implementation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garrard, K.P.; Taylor, L.W.; Knight, B.F.
The standard controller for a Pneumo ASG 2500 Diamond Turning Machine, an Allen Bradley 8200, has been replaced with a custom high-performance design. This controller consists of four major components. Axis position feedback information is provided by a Zygo Axiom 2/20 laser interferometer with 0.1 micro-inch resolution. Hardware interface logic couples the computer's digital and analog I/O channels to the diamond turning machine's analog motor controllers, the laser interferometer, and other machine status and control information. It also provides front panel switches for operator override of the computer controller and implements the emergency stop sequence. The remaining two components, the control computer hardware and software, are discussed in detail below.
The Case of Nuclear Propulsion
NASA Technical Reports Server (NTRS)
Koroteev, Anatoly S.; Ponomarev-Stepnoi, Nicolai N.; Smetannikov, Vladimir P.; Gafarov, Albert A.; Houts, Mike; VanDyke, Melissa; Godfroy, Tom; Martin, James; Bragg-Sitton, Shannon; Dickens, Ricky
2003-01-01
Fission technology can enable rapid, affordable access to any point in the solar system. If fission propulsion systems are to be developed to their full potential, however, near-term customers must be identified and initial fission systems successfully developed, launched, and utilized. Successful utilization will simultaneously develop the infrastructure and experience necessary for developing even higher power and performance systems. To be successful, development programs must devise strategies for rapidly converting paper reactor concepts into actual flight hardware. One approach to accomplishing this is to design highly testable systems and to structure the program to contain frequent, significant hardware milestones. This paper discusses ongoing efforts in Russia and the United States aimed at enabling near-term utilization of space fission systems.
NASA Technical Reports Server (NTRS)
Smith, T. B., III; Lala, J. H.
1984-01-01
The FTMP architecture is a high-reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another, and hardware fault detection and fault masking are provided transparently to the software. Operating system design and user software design are thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of the fault-handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine, and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
A digitally implemented preambleless demodulator for maritime and mobile data communications
NASA Astrophysics Data System (ADS)
Chalmers, Harvey; Shenoy, Ajit; Verahrami, Farhad B.
The hardware design and software algorithms for a low-bit-rate, low-cost, all-digital preambleless demodulator are described. The demodulator operates under severe high-noise conditions, fast Doppler frequency shifts, large frequency offsets, and multipath fading. Sophisticated algorithms, including a fast Fourier transform (FFT)-based burst acquisition algorithm, a cycle-slip resistant carrier phase tracker, an innovative Doppler tracker, and a fast acquisition symbol synchronizer, were developed and extensively simulated for reliable burst reception. The compact digital signal processor (DSP)-based demodulator hardware uses a unique personal computer test interface for downloading test data files. The demodulator test results demonstrate a near-ideal performance within 0.2 dB of theory.
Optical Diagnosis of Gas Turbine Combustors Being Conducted
NASA Technical Reports Server (NTRS)
Hicks, Yolanda R.; Locke, Randy J.; Anderson, Robert C.; DeGroot, Wilhelmus A.
2001-01-01
Researchers at the NASA Glenn Research Center, in collaboration with industry, are reducing gas turbine engine emissions by studying visually the air-fuel interactions and combustion processes in combustors. This is especially critical for next generation engines that, in order to be more fuel-efficient, operate at higher temperatures and pressures than the current fleet engines. Optically based experiments were conducted in support of the Ultra-Efficient Engine Technology program in Glenn's unique, world-class, advanced subsonic combustion rig (ASCR) facility. The ASCR can supply air and jet fuel at the flow rates, temperatures, and pressures that simulate the conditions expected in the combustors of high-performance, civilian aircraft engines. In addition, this facility is large enough to support true sectors ("pie" slices of a full annular combustor). Sectors enable one to test true shapes rather than rectangular approximations of the actual hardware. Therefore, there is no compromise to actual engine geometry. A schematic drawing of the sector test stand is shown. The test hardware is mounted just upstream of the instrumentation section. The test stand can accommodate hardware up to 0.76-m diameter by 1.2-m long; thus sectors or small full annular combustors can be examined in this facility. Planar (two-dimensional) imaging using laser-induced fluorescence and Mie scattering, chemiluminescence, and video imagery were obtained for a variety of engine cycle conditions. The hardware tested was a double annular sector (two adjacent fuel injectors aligned radially) representing approximately 15° of a full annular combustor. An example of the two-dimensional data obtained for this configuration is also shown. The fluorescence data show the location of fuel and hydroxyl radical (OH) along the centerline of the fuel injectors. The chemiluminescence data show C2 within the total observable volume. The top row of this figure shows images obtained at an engine low-power condition, and the bottom row shows data from a higher power operating point. The data show distinctly the differences in flame structure between low-power and high-power engine conditions, in both location and amount of species produced (OH, C2) or consumed (fuel). The unique capability of the facility coupled with its optical accessibility helps to eliminate the need for high-pressure performance extrapolations. Tests such as described here have been used successfully to assess the performance of fuel-injection concepts and to modify those designs, if needed.
Efficient Bit-to-Symbol Likelihood Mappings
NASA Technical Reports Server (NTRS)
Moision, Bruce E.; Nakashima, Michael A.
2010-01-01
This innovation is an efficient algorithm designed to perform bit-to-symbol and symbol-to-bit likelihood mappings that represent a significant portion of the complexity of an error-correction code decoder for high-order constellations. Recent implementation of the algorithm in hardware has yielded an 8- percent reduction in overall area relative to the prior design.
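The abstract does not spell out the efficient algorithm itself, but the mappings it accelerates are the standard ones between per-bit log-likelihood ratios and per-symbol log-likelihoods. The Python sketch below shows the brute-force forms (exact bit-to-symbol, max-log symbol-to-bit) that such a design would restructure or approximate; the LLR sign convention and bit ordering are assumptions.

import math

def bits_to_symbol_likelihoods(bit_llrs):
    # Symbol log-likelihoods for a 2^m-ary symbol from m independent bit LLRs.
    # LLR convention: llr = log P(bit=0) - log P(bit=1).
    m = len(bit_llrs)
    sym_ll = []
    for s in range(2 ** m):
        ll = 0.0
        for i, llr in enumerate(bit_llrs):
            bit = (s >> (m - 1 - i)) & 1
            ll += -math.log1p(math.exp(-llr)) if bit == 0 else -math.log1p(math.exp(llr))
        sym_ll.append(ll)
    return sym_ll

def symbols_to_bit_llr(sym_ll, bit_index):
    # Bit LLR from symbol log-likelihoods via the max-log approximation.
    m = int(math.log2(len(sym_ll)))
    ll0 = max(ll for s, ll in enumerate(sym_ll) if not (s >> (m - 1 - bit_index)) & 1)
    ll1 = max(ll for s, ll in enumerate(sym_ll) if (s >> (m - 1 - bit_index)) & 1)
    return ll0 - ll1

sym_ll = bits_to_symbol_likelihoods([2.0, -1.0])
print([round(symbols_to_bit_llr(sym_ll, i), 3) for i in range(2)])   # -> [2.0, -1.0]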
Design and Implementation of High Performance Content-Addressable Memories.
1985-12-01
Content addressability and two basic implementations of content addressing are presented, and the need for and applications of hardware CAMs are discussed to motivate the topic. [Figure 2.15: Maximum search using an all-parallel CAM, proceeding from the left-most position (the most significant bit).]
Excellence: Achieving High Performance in the Army Personnel System.
1985-05-09
... State-of-the-art microcomputers could revolutionize the way personnel management is accomplished. Not only is the hardware available to automate... changes fundamentally are altering our society. We are in a transition from an industrial to an information-based society. We are relocating from the...
Cascaded VLSI neural network architecture for on-line learning
NASA Technical Reports Server (NTRS)
Thakoor, Anilkumar P. (Inventor); Duong, Tuan A. (Inventor); Daud, Taher (Inventor)
1992-01-01
High-speed, analog, fully-parallel, and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A computation intensive feature classification application was demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as an application specific coprocessor for solving real world problems at extremely high data rates.
Cascaded VLSI neural network architecture for on-line learning
NASA Technical Reports Server (NTRS)
Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)
1995-01-01
High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific coprocessors for solving real-world problems at extremely high data rates.
NASA Technical Reports Server (NTRS)
Pride, J. D.
1986-01-01
The testing conducted on LaRC-developed hardware for the controlled impact demonstration transport aircraft is discussed. To properly develop flight qualified crash systems, two environments were considered: the aircraft flight environment with the focus on vibration and temperature effects, and the crash environment with the long pulse shock effects. Also with the large quantity of fuel in the wing tanks the possibility of fire was considered to be a threat to data retrieval and thus fire tests were included in the development test process. The aircraft test successfully demonstrated the performance of the LaRC developed heat shields. Good telemetered data (S-band) was received during the impact and slide-out phase, and even after the aircraft came to rest. The two onboard DAS tape recorders were protected from the intense fire and high quality tape data was recovered. The complete photographic system performed as planned throughout the 40.0 sec of film supply. The four photo power distribution pallets remained in good condition and all ten onboard 16 mm high speed (400 frames/sec) cameras produced good film data.
Viking 75 project: Viking lander system primary mission performance report
NASA Technical Reports Server (NTRS)
Cooley, C. G.
1977-01-01
Viking Lander hardware performance during launch, interplanetary cruise, Mars orbit insertion, preseparation, separation through landing, and the primary landed mission is described, with primary emphasis on Lander engineering and science hardware operations during the as-flown mission, covering Lander system performance and anomalies during the various mission phases. The extended mission and predicted Lander performance are discussed, along with a summary of Viking goals and mission plans and a description of the Lander and its subsystem definitions.
Hardware Implementation of a MIMO Decoder Using Matrix Factorization Based Channel Estimation
NASA Astrophysics Data System (ADS)
Islam, Mohammad Tariqul; Numan, Mostafa Wasiuddin; Misran, Norbahiah; Ali, Mohd Alauddin Mohd; Singh, Mandeep
2011-05-01
This paper presents an efficient hardware realization of a multiple-input multiple-output (MIMO) wireless communication decoder that utilizes the available resources by adopting the technique of parallelism. The hardware is designed and implemented on a Xilinx Virtex-4 XC4VLX60 field-programmable gate array (FPGA) device in a modular approach, which simplifies and eases hardware updates and facilitates testing of the various modules independently. The decoder involves an effective channel estimation module that employs matrix factorization on least squares (LS) estimation to reduce a full-rank matrix into a simpler form in order to eliminate matrix inversion. This results in performance improvement and complexity reduction of the MIMO system. Performance evaluation of the proposed method is validated through MATLAB simulations, which indicate a 2 dB improvement in SNR compared to LS estimation. Moreover, a complexity comparison is performed in terms of mathematical operations, which shows that the proposed approach appreciably outperforms LS estimation at a lower complexity and represents a good channel estimation solution.
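One common way to avoid an explicit matrix inverse in least-squares channel estimation is to factor the pilot matrix and back-substitute. The abstract states that a matrix factorization is applied to the LS estimate but not which one, so the QR-based Python sketch below is only one plausible reading; the pilot dimensions, noise level, and variable names are illustrative.

import numpy as np

def ls_channel_estimate_qr(X, Y):
    # Least-squares estimate of H in Y = X @ H + noise, without forming
    # inv(X^H X): factor X = Q R and solve the triangular system R H = Q^H Y.
    Q, R = np.linalg.qr(X)
    return np.linalg.solve(R, Q.conj().T @ Y)

rng = np.random.default_rng(0)
H_true = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
X = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))   # known pilots
Y = X @ H_true + 0.01 * rng.standard_normal((8, 2))                  # received pilots
print(np.linalg.norm(ls_channel_estimate_qr(X, Y) - H_true))         # small error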
Grayscale image segmentation for real-time traffic sign recognition: the hardware point of view
NASA Astrophysics Data System (ADS)
Cao, Tam P.; Deng, Guang; Elton, Darrell
2009-02-01
In this paper, we study several grayscale-based image segmentation methods for real-time road sign recognition applications on an FPGA hardware platform. The performance of different image segmentation algorithms in different lighting conditions is initially compared using PC simulation. Based on these results and analysis, suitable algorithms are implemented and tested on a real-time FPGA speed sign detection system. Experimental results show that the system using segmented images uses significantly fewer hardware resources on an FPGA while maintaining comparable system performance. The system is capable of processing 60 live video frames per second.
NASA Astrophysics Data System (ADS)
Nguyen, Khoa Dang; Ha, Cheolkeun
2018-04-01
Hardware-in-the-loop simulation (HILS) is well known as an effective approach in the design of unmanned aerial vehicle (UAV) systems, enabling engineers to test the control algorithm on a hardware board with a UAV model in software. The performance of HILS is determined by the performance of the control algorithm, the developed model, and the signal transfer between the hardware and software. The result of HILS is degraded if any signal cannot be transferred to the correct destination. Therefore, this paper aims to develop middleware software to secure communications in a HILS system for testing the operation of a quad-rotor UAV. In our HILS, the Gazebo software is used to generate a nonlinear six-degrees-of-freedom (6DOF) model, sensor model, and 3D visualization for the quad-rotor UAV. Meanwhile, the flight control algorithm is designed and implemented on the Pixhawk hardware. New middleware software, referred to as the control application software (CAS), is proposed to ensure the connection and data transfer between Gazebo and Pixhawk using a multithread structure in Qt Creator. The CAS provides a graphical user interface (GUI), allowing the user to monitor the status of packet transfer and to issue flight control commands and real-time tuning parameters for the quad-rotor UAV. Numerical implementations have been performed to prove the effectiveness of the middleware software CAS suggested in this paper.
Parametric dense stereovision implementation on a system-on chip (SoC).
Gardel, Alfredo; Montejo, Pablo; García, Jorge; Bravo, Ignacio; Lázaro, José L
2012-01-01
This paper proposes a novel hardware implementation of a dense recovery of stereovision 3D measurements. Traditionally, 3D stereo systems have imposed a maximum number of stereo correspondences, introducing a large restriction on artificial vision algorithms. The proposed system-on-chip (SoC) provides great performance and efficiency, with a scalable architecture available for many different situations, addressing real-time processing of stereo image flow. Using double-buffering techniques properly combined with pipelined processing, the use of reconfigurable hardware achieves a parametrisable SoC that lets the designer choose its dimensions and features. The proposed architecture does not need any external memory because the processing is done as the image flow arrives. Our SoC provides 3D data directly without the storage of whole stereo images. Our goal is to obtain high processing speed while maintaining the accuracy of 3D data using minimum resources. Configurable parameters may be controlled by later/parallel stages of the vision algorithm executed on an embedded processor. With an FPGA clock of 100 MHz, image flows of up to 50 frames per second (fps) of dense stereo maps with more than 30,000 depth points can be obtained from 2 Mpix images, with a minimum initial latency. The implementation of computer vision algorithms on reconfigurable hardware, particularly low-level processing, opens up the prospect of its use in autonomous systems, where it can act as a coprocessor to reconstruct 3D images with high-density information in real time.
Data to hardware binding with physical unclonable functions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hamlet, Jason
The various technologies presented herein relate to binding data (e.g., software) to hardware, wherein the hardware is to utilize the data. The generated binding can be utilized to detect whether at least one of the hardware or the data has been modified between an initial moment (enrollment) and a later moment (authentication). During enrollment, an enrollment value is generated that includes a signature of the data, a first response from a PUF located on the hardware, and a code word. During authentication, a second response from the PUF is utilized to authenticate any of the content in the enrollment value, and based upon the authentication, a determination can be made regarding whether the hardware and/or the data have been modified. If modification is detected, a mitigating operation can be performed, e.g., the hardware is prevented from utilizing the data. If no modification is detected, the data can be utilized.
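A toy software analogue of the enrollment/authentication flow can clarify the intent, though it is not the patented scheme: below, an HMAC keyed by the PUF response plays the role of the signature-plus-code-word enrollment value, and a fresh response is required to re-derive the tag at authentication. Real PUF responses are noisy and need helper data and error correction, which this Python sketch deliberately omits; all function and field names are invented for illustration.

import hashlib, hmac, secrets

def enroll(data: bytes, puf_response: bytes) -> dict:
    # Produce an enrollment record binding `data` to the hardware's PUF.
    return {
        "data_digest": hashlib.sha256(data).hexdigest(),
        "binding_tag": hmac.new(puf_response, data, hashlib.sha256).hexdigest(),
    }

def authenticate(data: bytes, puf_response: bytes, record: dict) -> bool:
    # Re-derive the tag with a fresh PUF response; any change to the data
    # or to the hardware (a different PUF) fails the check.
    ok_data = hashlib.sha256(data).hexdigest() == record["data_digest"]
    tag = hmac.new(puf_response, data, hashlib.sha256).hexdigest()
    return ok_data and hmac.compare_digest(tag, record["binding_tag"])

puf = secrets.token_bytes(32)            # stand-in for a stable PUF response
rec = enroll(b"firmware image v1", puf)
print(authenticate(b"firmware image v1", puf, rec))        # True
print(authenticate(b"tampered firmware", puf, rec))        # False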
Digitally balanced detection for optical tomography.
Hafiz, Rehan; Ozanyan, Krikor B
2007-10-01
Analog balanced photodetection has found extensive use for sensing a weak absorption signal buried in laser intensity noise. This paper proposes schemes for compact, affordable, and flexible digital implementation of the already established analog balanced detection, as part of a multichannel digital tomography system. Variants of digitally balanced detection (DBD) schemes, suitable for weak signals on a largely varying background or weakly varying envelopes of high-frequency carrier waves, are introduced analytically and elaborated in terms of algorithmic and hardware flow. The DBD algorithms are implemented on low-cost general-purpose reconfigurable hardware (a field-programmable gate array), utilizing less than half of its resources. The performance of the DBD schemes compares favorably with that of their analog counterpart: a common-mode rejection ratio of 50 dB was observed over a bandwidth of 300 kHz, limited mainly by the host digital hardware. The close relationship between the DBD outputs and those of known analog balancing circuits is discussed in principle and shown experimentally in the example case of propane gas detection.
An Efficient Hardware Circuit for Spike Sorting Based on Competitive Learning Networks.
Chen, Huan-Yuan; Chen, Chih-Chang; Hwang, Wen-Jyi
2017-09-28
This study aims to present an effective VLSI circuit for multi-channel spike sorting. The circuit supports the spike detection, feature extraction and classification operations. The detection circuit is implemented in accordance with the nonlinear energy operator algorithm. Both the peak detection and area computation operations are adopted for the realization of the hardware architecture for feature extraction. The resulting feature vectors are classified by a circuit for competitive learning (CL) neural networks. The CL circuit supports both online training and classification. In the proposed architecture, all the channels share the same detection, feature extraction, learning and classification circuits for a low area cost hardware implementation. The clock-gating technique is also employed for reducing the power dissipation. To evaluate the performance of the architecture, an application-specific integrated circuit (ASIC) implementation is presented. Experimental results demonstrate that the proposed circuit exhibits the advantages of a low chip area, a low power dissipation and a high classification success rate for spike sorting.
An Efficient Hardware Circuit for Spike Sorting Based on Competitive Learning Networks
Chen, Huan-Yuan; Chen, Chih-Chang
2017-01-01
This study aims to present an effective VLSI circuit for multi-channel spike sorting. The circuit supports the spike detection, feature extraction and classification operations. The detection circuit is implemented in accordance with the nonlinear energy operator algorithm. Both the peak detection and area computation operations are adopted for the realization of the hardware architecture for feature extraction. The resulting feature vectors are classified by a circuit for competitive learning (CL) neural networks. The CL circuit supports both online training and classification. In the proposed architecture, all the channels share the same detection, feature extraction, learning and classification circuits for a low area cost hardware implementation. The clock-gating technique is also employed for reducing the power dissipation. To evaluate the performance of the architecture, an application-specific integrated circuit (ASIC) implementation is presented. Experimental results demonstrate that the proposed circuit exhibits the advantages of a low chip area, a low power dissipation and a high classification success rate for spike sorting. PMID:28956859
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-08-21
Recent advancements in technology scaling have shown a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation proves insufficient to study the tradeoffs in such complex systems due to slow execution time, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, an on-chip network generation infrastructure which aims to provide a parameterizable and powerful on-chip network generator for evaluating future high performance computing architectures based on SoC technology. OpenSoC Fabric leverages a new hardware DSL, Chisel, which contains powerful abstractions provided by its base language, Scala, and generates both software (C++) and hardware (Verilog) models from a single code base. The OpenSoC Fabric infrastructure is modeled after existing state-of-the-art simulators, offers large and powerful collections of configuration options, and follows object-oriented design and functional programming to make functionality extension as easy as possible.
Tri-state delta modulation system for Space Shuttle digital TV downlink
NASA Technical Reports Server (NTRS)
Udalov, S.; Huth, G. K.; Roberts, D.; Batson, B. H.
1981-01-01
Future requirements for Shuttle Orbiter downlink communication may include transmission of digital video which, in addition to black and white, may also be either field-sequential or NTSC color format. The use of digitized video could provide for picture privacy at the expense of additional onboard hardware, together with an increased bandwidth due to the digitization process. A general objective for the Space Shuttle application is to develop a digitization technique that is compatible with data rates in the 20-30 Mbps range but still provides good quality pictures. This paper describes a tri-state delta modulation/demodulation (TSDM) technique which is a good compromise between implementation complexity and performance. The unique feature of TSDM is that it provides for efficient run-length encoding of constant-intensity segments of a TV picture. Axiomatix has developed a hardware implementation of a high-speed TSDM transmitter and receiver for black-and-white TV and field-sequential color. The hardware complexity of this TSDM implementation is summarized in the paper.
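A minimal Python sketch of the tri-state idea: each sample is encoded as a positive step, a negative step, or a "hold" symbol, and runs of hold symbols (constant-intensity picture segments) are run-length encoded. The step size and threshold are illustrative values, not the Axiomatix hardware parameters.

    def tsdm_encode(samples, step=4, threshold=2):
        # tri-state delta modulation: +1 step up, -1 step down, 0 hold
        estimate, symbols = 0, []
        for s in samples:
            if s - estimate > threshold:
                symbols.append(+1)
                estimate += step
            elif estimate - s > threshold:
                symbols.append(-1)
                estimate -= step
            else:
                symbols.append(0)              # constant-intensity segment
        return symbols

    def run_length_encode(symbols):
        # hold runs compress well, which is where TSDM gains its efficiency
        encoded, i = [], 0
        while i < len(symbols):
            if symbols[i] == 0:
                j = i
                while j < len(symbols) and symbols[j] == 0:
                    j += 1
                encoded.append(("hold", j - i))
                i = j
            else:
                encoded.append(("step", symbols[i]))
                i += 1
        return encoded

    line = [10] * 20 + [60] * 40               # one flat-then-brighter scan line
    print(run_length_encode(tsdm_encode(line)))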
Long-term CF6 engine performance deterioration: Evaluation of engine S/N 451-380
NASA Technical Reports Server (NTRS)
Kramer, W. H.; Smith, J. J.
1978-01-01
The performance testing and analytical teardown of CF6-6D engine serial number 451-380, which was recently removed from a DC-10 aircraft, is summarized. The investigative test program was conducted inbound prior to normal overhaul/refurbishment. The performance testing included an inbound test, a test following cleaning of the low pressure turbine airfoils, and a final test after leading edge rework and cleaning the stage one fan blades. The analytical teardown consisted of detailed disassembly inspection measurements and airfoil surface finish checks of the as-received deteriorated hardware. Aspects discussed include the analysis of the test cell performance data, a complete analytical teardown report with a detailed description of all observed hardware distress, and an analytical assessment of the performance loss (deterioration) relating measured hardware conditions to losses in both specific fuel consumption and exhaust gas temperature.
Long-term CF6 engine performance deterioration: Evaluation of engine S/N 451-479
NASA Technical Reports Server (NTRS)
Kramer, W. H.; Smith, J. J.
1978-01-01
The performance testing and analytical teardown of a CF6-6D engine is summarized. This engine had completed its initial installation on a DC-10 aircraft. The investigative test program was conducted inbound prior to normal overhaul/refurbishment. The performance testing included an inbound test, a test following cleaning of the low pressure turbine airfoils, and a final test after leading edge rework and cleaning the stage one fan blades. The analytical teardown consisted of detailed disassembly inspection measurements and airfoil surface finish checks of the as-received deteriorated hardware. Included in this report is a detailed analysis of the test cell performance data, a complete analytical teardown report with a detailed description of all observed hardware distress, and an analytical assessment of the performance loss (deterioration) relating measured hardware conditions to losses in both SFC (specific fuel consumption) and EGT (exhaust gas temperature).
Design and test of a high power electromechanical actuator for thrust vector control
NASA Technical Reports Server (NTRS)
Cowan, J. R.; Myers, W. N.
1992-01-01
NASA-Marshall is involved in the development of electromechanical actuators (EMA) for thrust-vector control (TVC) system testing and implementation in spacecraft control/gimballing systems, with a view to the replacement of hydraulic hardware. TVC system control is furnished by solid state controllers and power supplies; a pair of resolvers supply position feedback to the controller for precise positioning. Performance comparisons between EMA and hydraulic TVC systems are performed.
The Use of High Performance Computing (HPC) to Strengthen the Development of Army Systems
2011-11-01
accurately predicting the supersonic Magnus effect about spinning cones, ogive-cylinders, and boat-tailed afterbodies. This work led to the successful...successful computer model of the proposed product or system, one can then build prototypes on the computer and study the effects on the performance of...needed. The NRC report discusses the requirements for effective use of such computing power. One needs “models, algorithms, software, hardware
Design and test of a high power electromechanical actuator for thrust vector control
NASA Astrophysics Data System (ADS)
Cowan, J. R.; Myers, W. N.
1992-07-01
NASA-Marshall is involved in the development of electromechanical actuators (EMA) for thrust-vector control (TVC) system testing and implementation in spacecraft control/gimballing systems, with a view to the replacement of hydraulic hardware. TVC system control is furnished by solid state controllers and power supplies; a pair of resolvers supply position feedback to the controller for precise positioning. Performance comparisons between EMA and hydraulic TVC systems are performed.
Zero gravity tissue-culture laboratory
NASA Technical Reports Server (NTRS)
Cook, J. E.; Montgomery, P. O., Jr.; Paul, J. S.
1972-01-01
Hardware was developed for performing experiments to detect the effects that zero gravity may have on living human cells. The hardware is composed of a timelapse camera that photographs the activity of cell specimens and an experiment module in which a variety of living-cell experiments can be performed using interchangeable modules. The experiment is scheduled for the first manned Skylab mission.
Similarity constraints in testing of cooled engine parts
NASA Technical Reports Server (NTRS)
Colladay, R. S.; Stepka, F. S.
1974-01-01
A study is made of the effect of testing cooled parts of current and advanced gas turbine engines at the reduced temperature and pressure conditions which maintain similarity with the engine environment. Some of the problems facing the experimentalist in evaluating heat transfer and aerodynamic performance when hardware is tested at conditions other than the actual engine environment are considered. Low temperature and pressure test environments can simulate the performance of actual size prototype engine hardware within the tolerance of experimental accuracy if appropriate similarity conditions are satisfied. Failure to adhere to these similarity constraints, because of test facility limitations or other reasons, can result in a number of serious errors in projecting the performance of test hardware to engine conditions.
Long-Wavelength Beam Steerer Based on a Micro-Electromechanical Mirror
Kos, Anthony B; Gerecht, Eyal
2013-01-01
Commercially available mirrors for scanning long-wavelength beams are too large for high-speed imaging. There is a need for a smaller, more agile pointing apparatus to provide images in seconds, not minutes or hours. A fast long-wavelength beam steerer uses a commercial micro-electro-mechanical system (MEMS) mirror controlled by a high-performance digital signal processor (DSP). The DSP allows high-speed raster scanning of the incident radiation, which is focused to a small waist onto the 9 mm^2, gold-coated MEMS mirror surface, while simultaneously acquiring an undistorted, high spatial-resolution image of an object. The beam steerer hardware, software and performance are described. The system can also serve as a miniaturized, high-performance long-wavelength beam chopper for lock-in detection. PMID:26401426
Profiling an application for power consumption during execution on a compute node
Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E
2013-09-17
Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
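A toy Python sketch of the idea in the claim: combine a per-operation hardware power consumption profile with the mix of operations an application performs to report a power consumption profile for that application. All numbers are hypothetical.

    # hypothetical hardware power profile: watts drawn while the compute node
    # performs each class of processing operation
    hardware_profile_watts = {"compute": 95.0, "memory": 60.0, "network": 40.0, "idle": 25.0}

    # hypothetical application profile: fraction of execution time spent
    # in each class of operation on the compute node
    app_time_fraction = {"compute": 0.55, "memory": 0.25, "network": 0.10, "idle": 0.10}

    def power_consumption_profile(hw_watts, app_fractions):
        # weight the hardware profile by the application's operation mix
        per_op = {op: hw_watts[op] * frac for op, frac in app_fractions.items()}
        return per_op, sum(per_op.values())

    per_op, average_watts = power_consumption_profile(hardware_profile_watts, app_time_fraction)
    print(per_op)
    print("estimated average draw:", round(average_watts, 1), "W")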
A fast, programmable hardware architecture for spaceborne SAR processing
NASA Technical Reports Server (NTRS)
Bennett, J. R.; Cumming, I. G.; Lim, J.; Wedding, R. M.
1983-01-01
The launch of spaceborne SARs during the 1980's is discussed. The satellite SARs require high quality and high throughput ground processors. Compression ratios in range and azimuth of greater than 500 and 150 respectively lead to frequency domain processing and data computation rates in excess of 2000 million real operations per second for C-band SARs under consideration. Various hardware architectures are examined, two promising candidates are selected, and a fast, programmable hardware architecture for spaceborne SAR processing is recommended. Modularity and programmability are introduced as desirable attributes for the purpose of HTSP hardware selection.
Towards Portable Large-Scale Image Processing with High-Performance Computing.
Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A
2018-05-03
High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure has housed and processed nearly half a million medical image volumes with the Vanderbilt Advanced Computing Center for Research and Education (ACCRE), which is the HPC facility at Vanderbilt University. The initial deployment was done natively (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which led to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with varying hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installation. To address such issues, herein, we describe recent innovations using containerization techniques with XNAT/DAX which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability from the system level to the application level, (2) flexible and dynamic software development and expansion, and (3) scalable spider deployment compatible with HPC clusters and local workstations.
The use of imprecise processing to improve accuracy in weather & climate prediction
NASA Astrophysics Data System (ADS)
Düben, Peter D.; McNamara, Hugh; Palmer, T. N.
2014-08-01
The use of stochastic processing hardware and low precision arithmetic in atmospheric models is investigated. Stochastic processors allow hardware-induced faults in calculations, sacrificing bit-reproducibility and precision in exchange for improvements in performance and potentially accuracy of forecasts, due to a reduction in power consumption that could allow higher resolution. A similar trade-off is achieved using low precision arithmetic, with improvements in computation and communication speed and savings in storage and memory requirements. As high-performance computing becomes more massively parallel and power intensive, these two approaches may be important stepping stones in the pursuit of global cloud-resolving atmospheric modelling. The impact of both hardware induced faults and low precision arithmetic is tested using the Lorenz '96 model and the dynamical core of a global atmosphere model. In the Lorenz '96 model there is a natural scale separation; the spectral discretisation used in the dynamical core also allows large and small scale dynamics to be treated separately within the code. Such scale separation allows the impact of lower-accuracy arithmetic to be restricted to components close to the truncation scales and hence close to the necessarily inexact parametrised representations of unresolved processes. By contrast, the larger scales are calculated using high precision deterministic arithmetic. Hardware faults from stochastic processors are emulated using a bit-flip model with different fault rates. Our simulations show that both approaches to inexact calculations do not substantially affect the large scale behaviour, provided they are restricted to act only on smaller scales. By contrast, results from the Lorenz '96 simulations are superior when small scales are calculated on an emulated stochastic processor than when those small scales are parametrised. This suggests that inexact calculations at the small scale could reduce computation and power costs without adversely affecting the quality of the simulations. This would allow higher resolution models to be run at the same computational cost.
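The bit-flip emulation of stochastic-processor faults can be sketched in a few lines of Python; the fault rate and the restriction of flips to low-order mantissa bits are illustrative choices, not the exact emulator used in the study.

    import random
    import struct

    def flip_low_order_bit(x, low_bits=26):
        # flip one randomly chosen bit among the low-order mantissa bits of a
        # float64, leaving sign, exponent and high-order mantissa intact
        raw = struct.unpack("<Q", struct.pack("<d", x))[0]
        raw ^= 1 << random.randrange(low_bits)
        return struct.unpack("<d", struct.pack("<Q", raw))[0]

    def faulty(x, fault_rate=1e-3, low_bits=26):
        # apply a hardware-induced fault with the given probability
        return flip_low_order_bit(x, low_bits) if random.random() < fault_rate else x

    # example: accumulate small-scale tendencies through the fault model
    total = 0.0
    for _ in range(100000):
        total = faulty(total + 0.001)
    print(total)   # close to 100.0, lightly perturbed by emulated faults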
Hardware enabled performance counters with support for operating system context switching
Salapura, Valentina; Wisniewski, Robert W.
2015-06-30
A device for supporting hardware enabled performance counters with support for context switching includes a plurality of performance counters operable to collect information associated with one or more computer system related activities, a first register operable to store a memory address, a second register operable to store a mode indication, and a state machine operable to read the second register and cause the plurality of performance counters to copy the information to the memory area indicated by the memory address based on the mode indication.
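A behavioral Python sketch (not the patented circuit) of the described device: a bank of performance counters, an address register, a mode register, and a state machine that copies the counters to the addressed memory area when the mode indication asks for it on an operating-system context switch.

    class PerformanceCounterUnit:
        SAVE_ON_SWITCH = 1

        def __init__(self, n_counters, memory):
            self.counters = [0] * n_counters   # collected activity information
            self.address_register = 0          # first register: memory address
            self.mode_register = 0             # second register: mode indication
            self.memory = memory               # stand-in for system memory

        def record(self, counter_index, amount=1):
            self.counters[counter_index] += amount

        def on_context_switch(self):
            # state machine: read the mode register and, if required, copy the
            # counter values to the memory area named by the address register
            if self.mode_register == self.SAVE_ON_SWITCH:
                self.memory[self.address_register] = list(self.counters)

    memory = {}
    unit = PerformanceCounterUnit(4, memory)
    unit.mode_register = PerformanceCounterUnit.SAVE_ON_SWITCH
    unit.address_register = 0x1000
    for _ in range(10):
        unit.record(0)
    unit.on_context_switch()
    print(memory)    # {4096: [10, 0, 0, 0]}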
Using SRAM Based FPGAs for Power-Aware High Performance Wireless Sensor Networks
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today’s applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved either by more powerful microcontrollers, at the cost of higher power consumption, or, in general, by any solution capable of accelerating task execution. At this point, the use of hardware-based, and in particular FPGA, solutions might appear as a candidate technology, since, although power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance, high capacity, state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, and a careful power-aware management system, to show that energy savings can be achieved for certain higher-end applications. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to prove that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements. PMID:22736971
Using SRAM based FPGAs for power-aware high performance wireless sensor networks.
Valverde, Juan; Otero, Andres; Lopez, Miguel; Portilla, Jorge; de la Torre, Eduardo; Riesgo, Teresa
2012-01-01
While for years traditional wireless sensor nodes have been based on ultra-low power microcontrollers with sufficient but limited computing power, the complexity and number of tasks of today's applications are constantly increasing. Increasing the node duty cycle is not feasible in all cases, so in many cases more computing power is required. This extra computing power may be achieved either by more powerful microcontrollers, at the cost of higher power consumption, or, in general, by any solution capable of accelerating task execution. At this point, the use of hardware-based, and in particular FPGA, solutions might appear as a candidate technology, since, although power use is higher compared with lower power devices, execution time is reduced, so energy could be reduced overall. In order to demonstrate this, an innovative WSN node architecture is proposed. This architecture is based on a high performance, high capacity, state-of-the-art FPGA, which combines the advantages of the intrinsic acceleration provided by the parallelism of hardware devices, the use of partial reconfiguration capabilities, and a careful power-aware management system, to show that energy savings can be achieved for certain higher-end applications. Finally, comprehensive tests have been done to validate the platform in terms of performance and power consumption, to prove that better energy efficiency compared to processor based solutions can be achieved, for instance, when encryption is imposed by the application requirements.
System for detecting operating errors in a variable valve timing engine using pressure sensors
Wiles, Matthew A.; Marriot, Craig D
2013-07-02
A method and control module includes a pressure sensor data comparison module that compares measured pressure volume signal segments to ideal pressure volume segments. A valve actuation hardware remedy module performs a hardware remedy in response to comparing the measured pressure volume signal segments to the ideal pressure volume segments when a valve actuation hardware failure is detected.
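A short Python sketch of the comparison step: measured pressure-volume signal segments are compared against ideal segments, and a large deviation flags a valve actuation hardware failure so that a remedy can be triggered. The tolerance and segment data are illustrative, not from the patent.

    import numpy as np

    def valve_actuation_check(measured_segment, ideal_segment, tolerance=0.05):
        # normalized deviation between measured and ideal pressure-volume segments
        measured = np.asarray(measured_segment, dtype=float)
        ideal = np.asarray(ideal_segment, dtype=float)
        deviation = np.linalg.norm(measured - ideal) / np.linalg.norm(ideal)
        return "hardware_remedy" if deviation > tolerance else "ok"

    ideal = np.sin(np.linspace(0, np.pi, 50)) * 30 + 100       # illustrative segment, kPa
    print(valve_actuation_check(ideal * 1.01, ideal))          # "ok"
    print(valve_actuation_check(ideal * 0.80, ideal))          # "hardware_remedy"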
Approximation of Engine Casing Temperature Constraints for Casing Mounted Electronics
NASA Technical Reports Server (NTRS)
Kratz, Jonathan L.; Culley, Dennis E.; Chapman, Jeffryes W.
2017-01-01
The performance of propulsion engine systems is sensitive to weight and volume considerations. This can severely constrain the configuration and complexity of the control system hardware. Distributed Engine Control technology is a response to these concerns by providing more flexibility in designing the control system, and by extension, more functionality leading to higher performing engine systems. Consequently, there can be a weight benefit to mounting modular electronic hardware on the engine core casing in a high temperature environment. This paper attempts to quantify the in-flight temperature constraints for engine casing mounted electronics. In addition, an attempt is made at studying heat soak back effects. The Commercial Modular Aero Propulsion System Simulation 40k (C-MAPSS40k) software is leveraged with real flight data as the inputs to the simulation. A two-dimensional (2-D) heat transfer model is integrated with the engine simulation to approximate the temperature along the length of the engine casing. This modification to the existing C-MAPSS40k software will provide tools and methodologies to develop a better understanding of the requirements for the embedded electronics hardware in future engine systems. Results of the simulations are presented, and their implications for temperature constraints on engine casing mounted electronics are discussed.
Approximation of Engine Casing Temperature Constraints for Casing Mounted Electronics
NASA Technical Reports Server (NTRS)
Kratz, Jonathan; Culley, Dennis; Chapman, Jeffryes
2016-01-01
The performance of propulsion engine systems is sensitive to weight and volume considerations. This can severely constrain the configuration and complexity of the control system hardware. Distributed Engine Control technology is a response to these concerns by providing more flexibility in designing the control system, and by extension, more functionality leading to higher performing engine systems. Consequently, there can be a weight benefit to mounting modular electronic hardware on the engine core casing in a high temperature environment. This paper attempts to quantify the in-flight temperature constraints for engine casing mounted electronics. In addition, an attempt is made at studying heat soak back effects. The Commercial Modular Aero Propulsion System Simulation 40k (C-MAPSS40k) software is leveraged with real flight data as the inputs to the simulation. A two-dimensional (2-D) heat transfer model is integrated with the engine simulation to approximate the temperature along the length of the engine casing. This modification to the existing C-MAPSS40k software will provide tools and methodologies to develop a better understanding of the requirements for the embedded electronics hardware in future engine systems. Results of the simulations are presented, and their implications for temperature constraints on engine casing mounted electronics are discussed.
Autonomous target tracking of UAVs based on low-power neural network hardware
NASA Astrophysics Data System (ADS)
Yang, Wei; Jin, Zhanpeng; Thiem, Clare; Wysocki, Bryant; Shen, Dan; Chen, Genshe
2014-05-01
Detecting and identifying targets in unmanned aerial vehicle (UAV) images and videos have been challenging problems due to various types of image distortion. Moreover, the significantly high processing overhead of existing image/video processing techniques and the limited computing resources available on UAVs force most of the processing tasks to be performed by the ground control station (GCS) in an off-line manner. In order to achieve fast and autonomous target identification on UAVs, it is thus imperative to investigate novel processing paradigms that can fulfill the real-time processing requirements, while fitting the size, weight, and power (SWaP) constrained environment. In this paper, we present a new autonomous target identification approach on UAVs, leveraging the emerging neuromorphic hardware which is capable of massively parallel pattern recognition processing and demands only a limited level of power consumption. A proof-of-concept prototype was developed based on a micro-UAV platform (Parrot AR Drone) and the CogniMem neural network chip, for processing the video data acquired from a UAV camera on the fly. The aim of this study was to demonstrate the feasibility and potential of incorporating emerging neuromorphic hardware into next-generation UAVs, and its superior performance and power advantages for real-time, autonomous target tracking.
Developing an Integration Infrastructure for Distributed Engine Control Technologies
NASA Technical Reports Server (NTRS)
Culley, Dennis; Zinnecker, Alicia; Aretskin-Hariton, Eliot; Kratz, Jonathan
2014-01-01
Turbine engine control technology is poised to make the first revolutionary leap forward since the advent of full authority digital engine control in the mid-1980s. This change aims squarely at overcoming the physical constraints that have historically limited control system hardware on aero-engines to a federated architecture. Distributed control architecture allows complex analog interfaces existing between system elements and the control unit to be replaced by standardized digital interfaces. Embedded processing, enabled by high temperature electronics, provides for digitization of signals at the source and network communications resulting in a modular system at the hardware level. While this scheme simplifies the physical integration of the system, its complexity appears in other ways. In fact, integration now becomes a shared responsibility among suppliers and system integrators. While these are the most obvious changes, there are additional concerns about performance, reliability, and failure modes due to distributed architecture that warrant detailed study. This paper describes the development of a new facility intended to address the many challenges of the underlying technologies of distributed control. The facility is capable of performing both simulation and hardware studies ranging from component to system level complexity. Its modular and hierarchical structure allows the user to focus their interaction on specific areas of interest.
RotCFD Analysis of the AH-56 Cheyenne Hub Drag
NASA Technical Reports Server (NTRS)
Solis, Eduardo; Bass, Tal A.; Keith, Matthew D.; Oppenheim, Rebecca T.; Runyon, Bryan T.; Veras-Alba, Belen
2016-01-01
In 2016, the U.S. Army Aviation Development Directorate (ADD) conducted tests in the U.S. Army 7- by 10- Foot Wind Tunnel at NASA Ames Research Center of a nonrotating 2/5th-scale AH-56 rotor hub. The objective of the tests was to determine how removing the mechanical control gyro affected the drag. Data for the lift, drag, and pitching moment were recorded for the 4-bladed rotor hub in various hardware configurations, azimuth angles, and angles of attack. Numerical simulations of a selection of the configurations and orientations were then performed, and the results were compared with the test data. To generate the simulation results, the hardware configurations were modeled using Creo and Rhinoceros 5, three-dimensional surface modeling computer-aided design (CAD) programs. The CAD model was imported into Rotorcraft Computational Fluid Dynamics (RotCFD), a computational fluid dynamics (CFD) tool used for analyzing rotor flow fields. RotCFD simulation results were compared with the experimental results of three hardware configurations at two azimuth angles, two angles of attack, and with and without wind tunnel walls. The results help validate RotCFD as a tool for analyzing low-drag rotor hub designs for advanced high-speed rotorcraft concepts. Future work will involve simulating additional hub geometries to reduce drag or tailor to other desired performance levels.
Verification Challenges of Dynamic Testing of Space Flight Hardware
NASA Technical Reports Server (NTRS)
Winnitoy, Susan
2010-01-01
The Six Degree-of-Freedom Dynamic Test System (SDTS) is a test facility at the National Aeronautics and Space Administration (NASA) Johnson Space Center in Houston, Texas for performing dynamic verification of space structures and hardware. Some examples of past and current tests include the verification of on-orbit robotic inspection systems, space vehicle assembly procedures and docking/berthing systems. The facility is able to integrate a dynamic simulation of on-orbit spacecraft mating or demating using flight-like mechanical interface hardware. A force moment sensor is utilized for input to the simulation during the contact phase, thus simulating the contact dynamics. While the verification of flight hardware presents many unique challenges, one particular area of interest is with respect to the use of external measurement systems to ensure accurate feedback of dynamic contact. There are many commercial off-the-shelf (COTS) measurement systems available on the market, and the test facility measurement systems have evolved over time to include two separate COTS systems. The first system incorporates infra-red sensing cameras, while the second system employs a laser interferometer to determine position and orientation data. The specific technical challenges with the measurement systems in a large dynamic environment include changing thermal and humidity levels, operational area and measurement volume, dynamic tracking, and data synchronization. The facility is located in an expansive high-bay area that is occasionally exposed to outside temperature when large retractable doors at each end of the building are opened. The laser interferometer system, in particular, is vulnerable to the environmental changes in the building. The operational area of the test facility itself is sizeable, ranging from seven meters wide and five meters deep to as much as seven meters high. Both facility measurement systems have desirable measurement volumes and the accuracies vary within the respective volumes. In addition, because this is a dynamic facility with a moving test bed, direct line-of-sight may not be available at all times between the measurement sensors and the tracking targets. Finally, the feedback data from the active test bed along with the two external measurement systems must be synchronized to allow for data correlation. To ensure the desired accuracy and resolution of these systems, calibration of the systems must be performed regularly. New innovations in sensor technology itself are periodically incorporated into the facility's overall measurement scheme. In addressing the challenges of the measurement systems, the facility is able to provide essential position and orientation data to verify the dynamic performance of space flight hardware.
Closed-Loop Neuromorphic Benchmarks
Stewart, Terrence C.; DeWolf, Travis; Kleinhans, Ashley; Eliasmith, Chris
2015-01-01
Evaluating the effectiveness and performance of neuromorphic hardware is difficult. It is even more difficult when the task of interest is a closed-loop task; that is, a task where the output from the neuromorphic hardware affects some environment, which then in turn affects the hardware's future input. However, closed-loop situations are one of the primary potential uses of neuromorphic hardware. To address this, we present a methodology for generating closed-loop benchmarks that makes use of a hybrid of real physical embodiment and a type of “minimal” simulation. Minimal simulation has been shown to lead to robust real-world performance, while still maintaining the practical advantages of simulation, such as making it easy for the same benchmark to be used by many researchers. This method is flexible enough to allow researchers to explicitly modify the benchmarks to identify specific task domains where particular hardware excels. To demonstrate the method, we present a set of novel benchmarks that focus on motor control for an arbitrary system with unknown external forces. Using these benchmarks, we show that an error-driven learning rule can consistently improve motor control performance across a randomly generated family of closed-loop simulations, even when there are up to 15 interacting joints to be controlled. PMID:26696820
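A minimal closed-loop Python sketch in the spirit of the benchmark: a single simulated joint is driven toward a target against an unknown constant external force, and an error-driven learning term adapts online to cancel it. The gains, time step, and learning rate are arbitrary illustrative values, not those of the published benchmarks.

    # one-joint closed-loop control with an error-driven adaptive term
    dt, mass, unknown_force = 0.01, 1.0, 3.0
    kp, kd, learn_rate = 50.0, 10.0, 5.0
    position, velocity, learned = 0.0, 0.0, 0.0
    target = 1.0

    for step in range(3000):
        error = target - position
        command = kp * error - kd * velocity + learned   # PD control + learned term
        learned += learn_rate * error * dt               # error-driven learning rule
        acceleration = (command - unknown_force) / mass  # environment reacts to output
        velocity += acceleration * dt
        position += velocity * dt

    # position approaches the target; the learned term approaches the unknown force
    print(round(position, 3), round(learned, 3))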
NASA Technical Reports Server (NTRS)
Kavi, K. M.
1984-01-01
There have been a number of simulation packages developed for the purpose of designing, testing and validating computer systems, digital systems and software systems. Complex analytical tools based on Markov and semi-Markov processes have been designed to estimate the reliability and performance of simulated systems. Petri nets have received wide acceptance for modeling complex and highly parallel computers. In this research, data flow models for computer systems are investigated. Data flow models can be used to simulate both software and hardware in a uniform manner. Data flow simulation techniques provide the computer systems designer with a CAD environment which enables highly parallel complex systems to be defined, evaluated at all levels and finally implemented in either hardware or software. Inherent in the data flow concept is the hierarchical handling of complex systems. In this paper we describe how data flow can be used to model computer systems.
NASA Astrophysics Data System (ADS)
Weijers, Jan-Willem; Derudder, Veerle; Janssens, Sven; Petré, Frederik; Bourdoux, André
2006-12-01
To assess the performance of forthcoming 4th generation wireless local area networks, the algorithmic functionality is usually modelled using a high-level mathematical software package, for instance, Matlab. In order to validate the modelling assumptions against the real physical world, the high-level functional model needs to be translated into a prototype. A systematic system design methodology proves very valuable, since it avoids, or, at least reduces, numerous design iterations. In this paper, we propose a novel Matlab-to-hardware design flow, which allows the algorithmic functionality to be mapped onto the target prototyping platform in a systematic and reproducible way. The proposed design flow is partly manual and partly tool assisted. It is shown that the proposed design flow allows the same testbench to be used throughout the whole design flow and avoids time-consuming and error-prone intermediate translation steps.
NASA Technical Reports Server (NTRS)
Kriegler, F. J.; Christenson, D.; Gordon, M.; Kistler, R.; Lampert, S.; Marshall, R.; Mclaughlin, R.
1974-01-01
The MIDAS System is a third-generation, fast, multispectral recognition system able to keep pace with the large quantity and high rates of data acquisition from present and projected sensors. A principal objective of the MIDAS Program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turn-around time and significant gains in throughput. The hardware and software generated in Phase I of the overall program are described. The system contains a mini-computer to control the various high-speed processing elements in the data path and a classifier which implements an all-digital prototype multivariate-Gaussian maximum likelihood decision algorithm operating at 2 x 10^5 pixels/sec. Sufficient hardware was developed to perform signature extraction from computer-compatible tapes, compute classifier coefficients, control the classifier operation, and diagnose operation. The MIDAS construction and wiring diagrams are given.
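The decision rule implemented by the classifier hardware is the standard multivariate-Gaussian maximum likelihood rule; the NumPy sketch below shows the software equivalent on synthetic two-band data (the real system applies it to multispectral pixels at hardware speed, and the class names here are made up).

    import numpy as np

    def train_signatures(training_sets):
        # per-class mean, inverse covariance and log-determinant term
        params = {}
        for label, X in training_sets.items():
            mean = X.mean(axis=0)
            cov = np.cov(X, rowvar=False)
            params[label] = (mean, np.linalg.inv(cov), -0.5 * np.log(np.linalg.det(cov)))
        return params

    def classify_pixel(pixel, params):
        # maximum likelihood: largest log-likelihood (up to a shared constant)
        best_label, best_score = None, -np.inf
        for label, (mean, inv_cov, log_term) in params.items():
            d = pixel - mean
            score = log_term - 0.5 * d @ inv_cov @ d
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    rng = np.random.default_rng(0)
    training = {"water": rng.normal([10, 40], 2, (200, 2)),
                "crop": rng.normal([60, 20], 3, (200, 2))}
    params = train_signatures(training)
    print(classify_pixel(np.array([12, 38]), params))   # "water"
    print(classify_pixel(np.array([58, 22]), params))   # "crop"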
NASA Technical Reports Server (NTRS)
Kriegler, F. J.; Christenson, D.; Gordon, M.; Kistler, R.; Lampert, S.; Marshall, R.; Mclaughlin, R.
1974-01-01
The MIDAS System is a third-generation, fast, multispectral recognition system able to keep pace with the large quantity and high rates of data acquisition from present and projected sensors. A principal objective of the MIDAS Program is to provide a system well interfaced with the human operator and thus to obtain large overall reductions in turn-around time and significant gains in throughput. The hardware and software generated in Phase I of the overall program are described. The system contains a mini-computer to control the various high-speed processing elements in the data path and a classifier which implements an all-digital prototype multivariate-Gaussian maximum likelihood decision algorithm operating at 2 x 10^5 pixels/sec. Sufficient hardware was developed to perform signature extraction from computer-compatible tapes, compute classifier coefficients, control the classifier operation, and diagnose operation. Diagnostic programs used to test MIDAS' operations are presented.
Highly Reconfigurable Beamformer Stimulus Generator
NASA Astrophysics Data System (ADS)
Vaviļina, E.; Gaigals, G.
2018-02-01
The present paper proposes a highly reconfigurable beamformer stimulus generator for a radar antenna array, which includes three main blocks: antenna array settings, object (signal source) settings, and a beamforming simulator. From the configuration of the antenna array and the object settings, different stimuli can be generated as input signals for a beamformer. The stimulus generator is developed under a broader concept with two fully independent paths, where one is the stimulus generator and the other is the hardware beamformer. Both paths can complement each other at intermediate as well as final steps to check and improve system performance. In this way the technology development process is supported by making each of the future hardware steps more substantive. Stimulus generator configuration capabilities and test results are presented, demonstrating the application of the stimulus generator for FPGA-based beamforming unit development and tuning as an alternative to an actual antenna system.
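A NumPy sketch of the stimulus-generation idea for a uniform linear array: per-element complex baseband samples are synthesized for plane-wave sources at configurable angles, with configurable element spacing and noise level. The parameters and source waveform are illustrative; the actual generator feeds an FPGA beamformer.

    import numpy as np

    def array_stimulus(n_elements, spacing_wavelengths, source_angles_deg,
                       n_samples=256, snr_db=20.0):
        # complex baseband snapshots (n_elements x n_samples) for plane waves
        # arriving at the given angles on a uniform linear array
        rng = np.random.default_rng(0)
        t = np.arange(n_samples)
        x = np.zeros((n_elements, n_samples), dtype=complex)
        for angle in source_angles_deg:
            phase = 2j * np.pi * spacing_wavelengths * np.arange(n_elements) \
                    * np.sin(np.deg2rad(angle))
            steering = np.exp(phase)                     # per-element phase shifts
            waveform = np.exp(2j * np.pi * 0.05 * t)     # illustrative narrowband source
            x += np.outer(steering, waveform)
        noise = rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
        return x + noise * 10 ** (-snr_db / 20.0)

    stimulus = array_stimulus(n_elements=8, spacing_wavelengths=0.5,
                              source_angles_deg=[-20.0, 35.0])
    print(stimulus.shape)   # (8, 256): one input stream per beamformer channel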
Open-Source 3-D Platform for Low-Cost Scientific Instrument Ecosystem.
Zhang, C; Wijnen, B; Pearce, J M
2016-08-01
The combination of open-source software and hardware provides technically feasible methods to create low-cost, highly customized scientific research equipment. Open-source 3-D printers have proven useful for fabricating scientific tools. Here the capabilities of an open-source 3-D printer are expanded to become a highly flexible scientific platform. An automated low-cost 3-D motion control platform is presented that has the capacity to perform scientific applications, including (1) 3-D printing of scientific hardware; (2) laboratory auto-stirring, measuring, and probing; (3) automated fluid handling; and (4) shaking and mixing. The open-source 3-D platform not only facilitates routine research while radically reducing the cost, but also inspires the creation of a diverse array of custom instruments that can be shared and replicated digitally throughout the world to drive down the cost of research and education further. © 2016 Society for Laboratory Automation and Screening.
Relating design and environmental variables to reliability
NASA Astrophysics Data System (ADS)
Kolarik, William J.; Landers, Thomas L.
The combination of a space application and a nuclear power source demands high-reliability hardware. The possibilities of failure, either an inability to provide power or a catastrophic accident, must be minimized. Nuclear power experience on the ground has led to highly sophisticated probabilistic risk assessment procedures, most of which require quantitative information to adequately assess such risks. In the area of hardware risk analysis, reliability information plays a key role. One of the lessons learned from the Three Mile Island experience is that thorough analyses of critical components are essential. Nuclear-grade equipment shows some reliability advantages over commercial equipment; however, no statistically significant difference has been found. A recent study pertaining to spacecraft electronics reliability examined some 2500 malfunctions on more than 300 aircraft. The study classified the equipment failures into seven general categories. Design deficiencies and lack of environmental protection accounted for about half of all failures. Within each class, limited reliability modeling was performed using a Weibull failure model.
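For reference, a Weibull failure model of the kind mentioned above gives the reliability (survival) function directly; the short sketch below computes it, with the shape parameter distinguishing infant-mortality, random, and wear-out failure regimes. The parameter values are illustrative only.

    import numpy as np

    def weibull_reliability(t, shape_beta, scale_eta):
        # R(t) = exp(-(t / eta) ** beta); beta < 1 infant mortality,
        # beta ~ 1 random failures, beta > 1 wear-out
        return np.exp(-(np.asarray(t, dtype=float) / scale_eta) ** shape_beta)

    hours = np.array([100.0, 1000.0, 5000.0])
    print(weibull_reliability(hours, shape_beta=1.5, scale_eta=10000.0))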
Multi-user Droplet Combustion Apparatus (MDCA) Hardware Replacement
2013-10-02
ISS037-E-004956 (2 Oct. 2013) --- NASA astronaut Karen Nyberg, Expedition 37 flight engineer, performs the Multi-user Droplet Combustion Apparatus (MDCA) hardware replacement in the Harmony node of the International Space Station.
Multi-user Droplet Combustion Apparatus (MDCA) Hardware Replacement
2013-10-02
ISS037-E-004959 (2 Oct. 2013) --- NASA astronaut Karen Nyberg, Expedition 37 flight engineer, performs the Multi-user Droplet Combustion Apparatus (MDCA) hardware replacement in the Harmony node of the International Space Station.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
On the use of programmable hardware and reduced numerical precision in earth-system modeling.
Düben, Peter D; Russell, Francis P; Niu, Xinyu; Luk, Wayne; Palmer, T N
2015-09-01
Programmable hardware, in particular Field Programmable Gate Arrays (FPGAs), promises a significant increase in computational performance for simulations in geophysical fluid dynamics compared with CPUs of similar power consumption. FPGAs allow adjusting the representation of floating-point numbers to specific application needs. We analyze the performance-precision trade-off on FPGA hardware for the two-scale Lorenz '95 model. We scale the size of this toy model to that of a high-performance computing application in order to make meaningful performance tests. We identify the minimal level of precision at which changes in model results are not significant compared with a maximal precision version of the model and find that this level is very similar for cases where the model is integrated for very short or long intervals. It is therefore a useful approach to investigate model errors due to rounding errors for very short simulations (e.g., 50 time steps) to obtain a range for the level of precision that can be used in expensive long-term simulations. We also show that an approach to reduce precision with increasing forecast time, when model errors are already accumulated, is very promising. We show that a speed-up of 1.9 times is possible in comparison to FPGA simulations in single precision if precision is reduced with no strong change in model error. The single-precision FPGA setup shows a speed-up of 2.8 times in comparison to our model implementation on two 6-core CPUs for large model setups.
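The reduced-precision experiments can be emulated in software by truncating the float64 mantissa; a minimal Python sketch is given below. The number of retained bits is a free parameter, and this is only an emulation of what the FPGA does natively with custom number formats.

    import struct

    def reduce_precision(x, mantissa_bits):
        # keep only the top `mantissa_bits` of the 52-bit float64 mantissa,
        # emulating a reduced-precision number representation
        raw = struct.unpack("<Q", struct.pack("<d", x))[0]
        drop = 52 - mantissa_bits
        raw &= ~((1 << drop) - 1)
        return struct.unpack("<d", struct.pack("<Q", raw))[0]

    x = 2.0 / 3.0
    for bits in (52, 23, 10):
        print(bits, reduce_precision(x, bits))   # growing rounding error as bits shrink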
Interfacing a high performance disk array file server to a Gigabit LAN
NASA Technical Reports Server (NTRS)
Seshan, Srinivasan; Katz, Randy H.
1993-01-01
Our previous prototype, RAID-1, identified several bottlenecks in typical file server architectures. The most important bottleneck was the lack of a high-bandwidth path between disk, memory, and the network. Workstation servers, such as the Sun-4/280, have very slow access to peripherals on busses far from the CPU. For the RAID-2 system, we addressed this problem by designing a crossbar interconnect, Xbus board, that provides a 40MB/s path between disk, memory, and the network interfaces. However, this interconnect does not provide the system CPU with low latency access to control the various interfaces. To provide a high data rate to clients on the network, we were forced to carefully and efficiently design the network software. A block diagram of the system hardware architecture is given. In the following subsections, we describe pieces of the RAID-2 file server hardware that had a significant impact on the design of the network interface.
VME rollback hardware for time warp multiprocessor systems
NASA Technical Reports Server (NTRS)
Robb, Michael J.; Buzzell, Calvin A.
1992-01-01
The purpose of the research effort is to develop and demonstrate innovative hardware to implement specific rollback and timing functions required for efficient queue management and precision timekeeping in multiprocessor discrete event simulations. The previously completed phase 1 effort demonstrated the technical feasibility of building hardware modules which eliminate the state saving overhead of the Time Warp paradigm used in distributed simulations on multiprocessor systems. The current phase 2 effort will build multiple pre-production rollback hardware modules integrated with a network of Sun workstations, and the integrated system will be tested by executing a Time Warp simulation. The rollback hardware will be designed to interface with the greatest number of multiprocessor systems possible. The authors believe that the rollback hardware will provide for significant speedup of large scale discrete event simulation problems and allow multiprocessors using Time Warp to dramatically increase performance.
Probabilistic performance-based design for high performance control systems
NASA Astrophysics Data System (ADS)
Micheli, Laura; Cao, Liang; Gong, Yongqiang; Cancelli, Alessandro; Laflamme, Simon; Alipour, Alice
2017-04-01
High performance control systems (HPCS) are advanced damping systems capable of high damping performance over a wide frequency bandwidth, ideal for mitigation of multi-hazards. They include active, semi-active, and hybrid damping systems. However, HPCS are more expensive than typical passive mitigation systems, rely on power and hardware (e.g., sensors, actuators) to operate, and require maintenance. In this paper, a life cycle cost analysis (LCA) approach is proposed to estimate the economic benefit of these systems over the entire life of the structure. The novelty resides in the life cycle cost analysis in the performance based design (PBD) tailored to multi-level wind hazards. This yields a probabilistic performance-based design approach for HPCS. Numerical simulations are conducted on a building located in Boston, MA. LCAs are conducted for passive control systems and HPCS, and the concept of controller robustness is demonstrated. Results highlight the promise of the proposed performance-based design procedure.
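A toy Python sketch of the life-cycle cost comparison idea: the up-front and maintenance costs of a control system are weighed against the discounted expected hazard losses over the structure's life. The probabilities, costs, and discount rate are made-up illustrative numbers, not values from the paper.

    def life_cycle_cost(initial_cost, annual_upkeep, hazard_scenarios, years, discount=0.03):
        # hazard_scenarios: list of (annual_probability, expected_loss_if_it_occurs)
        expected_annual_loss = sum(p * loss for p, loss in hazard_scenarios)
        cost = initial_cost
        for year in range(1, years + 1):
            cost += (annual_upkeep + expected_annual_loss) / (1.0 + discount) ** year
        return cost

    # illustrative comparison: passive damper vs. high performance control system,
    # with the HPCS assumed to halve the expected wind-hazard losses
    wind_hazards_passive = [(0.05, 2.0e6), (0.01, 8.0e6)]
    wind_hazards_hpcs = [(0.05, 1.0e6), (0.01, 4.0e6)]
    print("passive:", round(life_cycle_cost(5.0e5, 1.0e4, wind_hazards_passive, 50)))
    print("HPCS:   ", round(life_cycle_cost(2.0e6, 5.0e4, wind_hazards_hpcs, 50)))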
First incremental buy for Increment 2 of the Space Transportation System (STS)
NASA Technical Reports Server (NTRS)
1989-01-01
Thiokol manufactured and delivered 9 flight motors to KSC on schedule. All test flights were successful. All spent SRMs were recovered. Design, development, manufacture, and delivery of required transportation, handling, and checkout equipment to MSFC and to KSC were completed on schedule. All items of data required by DPD 400 were prepared and delivered as directed. In the system requirements and analysis area, the point of departure from Buy 1 to the operational phase was developed in significant detail with a complete set of transition documentation available. The documentation prepared during the Buy 1 program was maintained and updated where required. The following flight support activities should be continued through other production programs: as-built materials usage tracking on all flight hardware; mass properties reporting for all flight hardware until sample size is large enough to verify that the weight limit requirements were met; ballistic predictions and postflight performance assessments for all production flights; and recovered SRM hardware inspection and anomaly identification. In the safety, reliability, and quality assurance area, activities accomplished were assurance oriented in nature and specifically formulated to prevent problems and hardware failures. The flight program to date has adequately demonstrated the success of this assurance approach. The attention focused on details of design, analysis, manufacture, and inspection to assure the production of high-quality hardware has resulted in the absence of flight failures. The few anomalies which did occur were evaluated, design or manufacturing changes incorporated, and corrective actions taken to preclude recurrence.
Image processing system design for microcantilever-based optical readout infrared arrays
NASA Astrophysics Data System (ADS)
Tong, Qiang; Dong, Liquan; Zhao, Yuejin; Gong, Cheng; Liu, Xiaohua; Yu, Xiaomei; Yang, Lei; Liu, Weiyu
2012-12-01
Compared with traditional infrared imaging technology, the new type of optical-readout uncooled infrared imaging technology based on MEMS has many advantages, such as low cost, small size, and simple fabrication. In addition, theory demonstrates the technology's high thermal detection sensitivity, so it has very broad application prospects in the field of high performance infrared detection. The paper mainly focuses on an image capturing and processing system for this new type of optical-readout uncooled infrared imaging technology based on MEMS. The image capturing and processing system consists of software and hardware. We build our image processing core hardware platform around TI's high performance DSP chip, the TMS320DM642, and then design our image capturing board around the MT9P031, Micron's high frame rate, low power consumption CMOS chip. Finally, we use Intel's LXT971A network transceiver to design the network output board. The software system is built on the real-time operating system DSP/BIOS. We design our video capture driver program based on TI's class-mini driver, and the network output program based on the NDK kit, for image capturing, processing, and transmission. Experiments show that the system has the advantages of high capture resolution and fast processing speed. The network transmission speed is up to 100 Mbps.
A design methodology for portable software on parallel computers
NASA Technical Reports Server (NTRS)
Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.
1993-01-01
This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks are specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured the performance of a portion of this subsystem on the Intel iPSC/2 parallel computer. These results are provided in section four. Our future work is summarized in section five, our acknowledgements are stated in section six, and references for published papers associated with NAG-1-995 are provided in section seven.
Automation of checkout for the shuttle operations era
NASA Technical Reports Server (NTRS)
Anderson, J. A.; Hendrickson, K. O.
1985-01-01
Space Shuttle checkout is different from that of its Apollo predecessor. The complexity of the hardware, the shortened turnaround time, and the software that performs ground checkout are outlined. New techniques and standards for software development, and the management structure to control it, are generated and implemented. The utilization of computer systems for vehicle testing is highlighted.
High-Performance Computing Opportunities and Challenges for Army R&D
2006-01-01
P(k) for E. coli ... Hardware ... vaccines and immune enhancements for expeditionary warfare and homeland security (including gene vaccines, edible vaccines, and radioprotective...of a bacterium, such as the ubiquitous E. coli. However, at least the intention exists to extend this research to more complicated "eukaryotic"...
Relation of Parallel Discrete Event Simulation algorithms with physical models
NASA Astrophysics Data System (ADS)
Shchur, L. N.; Shchur, L. V.
2015-09-01
We extend the concept of local simulation times in parallel discrete event simulation (PDES) in order to take into account the architecture of current hardware and software in high-performance computing. We briefly review previous research on the mapping of PDES onto physical problems, and emphasise how physical results may help to predict the behaviour of parallel algorithms.
The California All-sky Meteor Surveillance (CAMS) System
NASA Astrophysics Data System (ADS)
Gural, P. S.
2011-01-01
A unique next generation multi-camera, multi-site video meteor system is being developed and deployed in California to provide high accuracy orbits of simultaneously captured meteors. Included herein is a description of the goals, concept of operations, hardware, and software development progress. An appendix contains a meteor camera performance trade study made for video systems circa 2010.
Lee, Unseok; Chang, Sungyul; Putra, Gian Anantrio; Kim, Hyoungseok; Kim, Dong Hwan
2018-01-01
A high-throughput plant phenotyping system automatically observes and grows many plant samples. Many plant sample images are acquired by the system to determine the characteristics of the plants (populations). Stable image acquisition and processing is very important to accurately determine the characteristics. However, hardware for acquiring plant images rapidly and stably, while minimizing plant stress, is lacking. Moreover, most software cannot adequately handle large-scale plant imaging. To address these problems, we developed a new, automated, high-throughput plant phenotyping system using simple and robust hardware, and an automated plant-imaging-analysis pipeline consisting of machine-learning-based plant segmentation. Our hardware acquires images reliably and quickly and minimizes plant stress. Furthermore, the images are processed automatically. In particular, large-scale plant-image datasets can be segmented precisely using a classifier developed using a superpixel-based machine-learning algorithm (Random Forest), and variations in plant parameters (such as area) over time can be assessed using the segmented images. We performed comparative evaluations to identify an appropriate learning algorithm for our proposed system, and tested three robust learning algorithms. We developed not only an automatic analysis pipeline but also a convenient means of plant-growth analysis that provides a learning data interface and visualization of plant growth trends. Thus, our system allows end-users such as plant biologists to analyze plant growth via large-scale plant image data easily.
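As a rough illustration of the segmentation step described above, the sketch below combines SLIC superpixels with a Random Forest classifier. The libraries (scikit-image >= 0.17 and scikit-learn), the mean-colour features, and the training data are assumptions, not the authors' exact pipeline.

```python
# A minimal sketch of superpixel-based plant segmentation with a Random Forest.
# Feature choice and training-label source are illustrative assumptions.
import numpy as np
from skimage.segmentation import slic                 # assumes scikit-image >= 0.17
from sklearn.ensemble import RandomForestClassifier

def superpixel_features(image, labels):
    """Mean RGB colour per superpixel (one feature row per superpixel)."""
    n = labels.max() + 1
    feats = np.zeros((n, image.shape[2]))
    for i in range(n):
        feats[i] = image[labels == i].mean(axis=0)
    return feats

def segment_plant(image, clf, n_segments=400):
    """Assign each superpixel to 'plant' (1) or 'background' (0) and return a mask."""
    labels = slic(image, n_segments=n_segments, compactness=10, start_label=0)
    feats = superpixel_features(image, labels)
    pred = clf.predict(feats)          # one class per superpixel
    return pred[labels]                # per-pixel mask

# Hypothetical training data: superpixel features from annotated images.
# clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# mask = segment_plant(plant_image, clf)
```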
Composite Structural Materials
NASA Technical Reports Server (NTRS)
Ansell, G. S.; Loewy, R. G.; Wiberly, S. E.
1984-01-01
The development and application of filamentary composite materials is considered. Such interest is based on the possibility of using relatively brittle materials with high modulus, high strength, but low density in composites with good durability and high tolerance to damage. Fiber reinforced composite materials of this kind offer substantially improved performance and potentially lower costs for aerospace hardware. Much progress has been made since the initial developments in the mid 1960's. However, applications to the primary structure of operational vehicles, mainly aircraft, have so far been limited.
Container System Hardware Status Report
1986-01-01
[Fragmentary excerpt: includes the procurement of eight SL-7 class high-speed containerships and their subsequent conversion to a cargo configuration specifically designed for ... wide, 53.5-in high, 242-in long, and weighs 4,000 lbs. The MILVAN chassis were competitively procured from industry utilizing a performance military ... accept load transfer from a cargo ship and equipped with a ramp for Roll-On/Roll-Off (RO/RO) discharge systems. The LAMP-H will replace the LARC-LX.]
Project UNITY: Cross Domain Visualization Collaboration
2015-10-18
[Fragmentary excerpt: one location is the Space Operations Coordination Center (UK-SPOCC) in High Wycombe, UK. Identical AFRL-developed ErgoWorkstations (see Figure 2) were installed in both locations. The AFRL ErgoWorkstation is made up of a high performance Windows-based PC with three displays, two of them 30" Dell Cinema displays ... the system can be seen in Figure 1. The intent of using identical hardware is to minimize complexity, to simplify debugging, and to provide an opportunity ...]
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominate the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even more complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.
Asynchronous Replica Exchange Software for Grid and Heterogeneous Computing.
Gallicchio, Emilio; Xia, Junchao; Flynn, William F; Zhang, Baofeng; Samlalsingh, Sade; Mentes, Ahmet; Levy, Ronald M
2015-11-01
Parallel replica exchange sampling is an extended ensemble technique often used to accelerate the exploration of the conformational ensemble of atomistic molecular simulations of chemical systems. Inter-process communication and coordination requirements have historically discouraged the deployment of replica exchange on distributed and heterogeneous resources. Here we describe the architecture of a software package (named ASyncRE) for performing asynchronous replica exchange molecular simulations on volunteered computing grids and heterogeneous high performance clusters. The asynchronous replica exchange algorithm on which the software is based avoids centralized synchronization steps and the need for direct communication between remote processes. It allows molecular dynamics threads to progress at different rates and enables parameter exchanges among arbitrary sets of replicas independently from other replicas. ASyncRE is written in Python following a modular design conducive to extensions to various replica exchange schemes and molecular dynamics engines. Applications of the software for the modeling of association equilibria of supramolecular and macromolecular complexes on BOINC campus computational grids and on the CPU/MIC heterogeneous hardware of the XSEDE Stampede supercomputer are illustrated. They show the ability of ASyncRE to utilize large grids of desktop computers running the Windows, MacOS, and/or Linux operating systems as well as collections of high performance heterogeneous hardware devices.
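As a hedged illustration of the exchange step described above (replicas swapping parameters within an arbitrary waiting subset, with no global synchronization), the following Python sketch applies the standard replica-exchange Metropolis criterion. The replica data structure, energy function, and temperature ladder are hypothetical; this is not the ASyncRE API.

```python
# Replicas run independently; whenever some of them are waiting, parameters are
# swapped within that subset only. Each replica is a dict like
# {'x': configuration, 'tid': temperature index, 'state': 'running' or 'waiting'}.
import math
import random

def metropolis_swap(replicas, energy, beta):
    """Attempt one swap of temperature indices between two waiting replicas."""
    i, j = random.sample(range(len(replicas)), 2)
    ri, rj = replicas[i], replicas[j]
    # Standard replica-exchange acceptance: exp[(beta_i - beta_j)(E_i - E_j)]
    delta = (beta[ri['tid']] - beta[rj['tid']]) * (energy(rj['x']) - energy(ri['x']))
    if delta <= 0 or random.random() < math.exp(-delta):
        ri['tid'], rj['tid'] = rj['tid'], ri['tid']   # exchange parameters only

# waiting = [r for r in all_replicas if r['state'] == 'waiting']
# if len(waiting) >= 2:
#     metropolis_swap(waiting, energy_fn, beta_ladder)
```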
NASA Technical Reports Server (NTRS)
Goldman, Jeffrey H.; Tetreault, R.; Fischbach, D.; Walker, D.
1994-01-01
A heat pump is a device which elevates the temperature of a heat flow by means of an energy input. By doing this, the heat pump can cause heat to transfer faster from a warm region to a cool region, or it can cause heat to flow from a cool region to a warmer region. The second case is the one which finds vast commercial applications such as air conditioning, heating, and refrigeration. Aerospace applications of heat pumps include both cases. The NASA Johnson Space Center is currently developing a Life Support Systems Integration Facility (LSSIF, previously SIRF) to provide system-level integration, operational test experience, and performance data that will enable NASA to develop flight-certified hardware for future planetary missions. A high lift heat pump is a significant part of the TCS hardware development associated with the LSSIF. The high lift heat pump program discussed here is being performed in three phases. In Phase 1, the objective is to develop heat pump concepts for a lunar base, a lunar lander, and for a ground development unit for the SIRF. In Phase 2, the design of the SIRF ground test unit is being performed, including identification and evaluation of safety and reliability issues. In Phase 3, the SIRF unit will be manufactured, tested, and delivered to the NASA Johnson Space Center.
Aerodynamics and Control of Quadrotors
NASA Astrophysics Data System (ADS)
Bangura, Moses
Quadrotors are aerial vehicles with a four motor-rotor assembly for generating lift and controllability. Their light weight, ease of design and simple dynamics have increased their use in aerial robotics research. There are many quadrotors that are commercially available or under development. Commercial off-the-shelf quadrotors usually lack the ability to be reprogrammed and are unsuitable for use as research platforms. The open-source code developed in this thesis differs from other open-source systems by focusing on the key performance roadblocks in implementing high performance experimental quadrotor platforms for research: motor-rotor control for thrust regulation, velocity and attitude estimation, and control for position regulation and trajectory tracking. In all three of these fundamental subsystems, code sub-modules for implementation on commonly available hardware are provided. In addition, the thesis provides guidance on scoping and commissioning open-source hardware components to build a custom quadrotor. A key contribution of the thesis is then a design methodology for the development of experimental quadrotor platforms from open-source or commercial off-the-shelf software and hardware components that have active community support. Quadrotors built following the methodology allow the user access to the operation of the subsystems and, in particular, the user can tune the gains of the observers and controllers in order to push the overall system to its performance limits. This enables the quadrotor framework to be used for a variety of applications such as heavy lifting and high performance aggressive manoeuvres by both the hobby and academic communities. To address the question of thrust control, momentum and blade element theories are used to develop aerodynamic models for rotor blades specific to quadrotors. With the aerodynamic models, a novel thrust estimation and control scheme that improves on existing RPM (revolutions per minute) control of rotors is proposed. The approach taken uses the measured electrical power into the rotors, compensating for electrical losses, to estimate changing aerodynamic conditions around a rotor as well as the aerodynamic thrust force. The resulting control algorithms are implemented in real-time on the embedded electronic speed controller (ESC) hardware. Using the estimates of the aerodynamic conditions around the rotor at this level improves the dynamic response to gusts, as the low-level thrust control is the fastest dynamic level on the vehicle. The aerodynamic estimation scheme enables the vehicle to react almost instantaneously to aerodynamic changes in the environment without affecting the overall dynamic performance of the vehicle. (Abstract shortened by ProQuest.).
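As a hedged illustration of estimating thrust from measured electrical power, the sketch below uses the ideal momentum-theory hover relation P = T^(3/2) / sqrt(2*rho*A) after subtracting resistive losses. The figure of merit, motor resistance, and rotor radius are hypothetical values, and the thesis's actual estimation scheme is more complete than this.

```python
# Simplified hover-thrust estimate from electrical measurements using ideal
# momentum theory: P_aero = T**1.5 / sqrt(2 * rho * A)  =>  T = (P_aero*sqrt(2*rho*A))**(2/3).
import math

def thrust_from_power(v_batt, i_motor, r_motor, rho=1.225, rotor_radius=0.12,
                      figure_of_merit=0.7):
    """Estimate rotor thrust (N) in static hover from voltage/current measurements."""
    p_elec = v_batt * i_motor                                     # electrical input power (W)
    p_aero = figure_of_merit * (p_elec - i_motor**2 * r_motor)    # subtract copper losses
    disk_area = math.pi * rotor_radius**2
    return (max(p_aero, 0.0) * math.sqrt(2 * rho * disk_area)) ** (2.0 / 3.0)

# Example with hypothetical numbers: 11.1 V, 4 A, 0.1 ohm winding resistance
# print(thrust_from_power(11.1, 4.0, 0.1))
```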
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing
NASA Astrophysics Data System (ADS)
Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen
2018-04-01
The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. At present, the required processing power is unlikely to be achieved simply by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Array, and Graphics Processor cores under power and performance constraints. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). These DNN models are typically trained using GPUs with gigabytes of external memory and rely heavily on 32-bit floating point operations. As a result, DNNs do not run efficiently on hardware appropriate for low power or mobile applications. To address this limitation, we propose a framework for compressing DNN models for ATR suited to deployment on resource constrained hardware. The proposed compression framework utilizes promising DNN compression techniques, including pruning and weight quantization, while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
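To make the two compression steps concrete, the NumPy sketch below applies magnitude pruning followed by uniform weight quantization. The sparsity target and bit width are illustrative choices, not the values used in the proposed framework.

```python
# Magnitude pruning + symmetric uniform quantization of a weight tensor.
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero the smallest-magnitude weights so that `sparsity` of them are removed."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize_uniform(w, n_bits=8):
    """Quantize weights to n_bits signed integers, then dequantize for inspection."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

# w = np.random.randn(256, 128)                  # a hypothetical layer's weights
# w_small = quantize_uniform(prune_by_magnitude(w))
```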
Mostafa, Hesham; Pedroni, Bruno; Sheik, Sadique; Cauwenberghs, Gert
2017-01-01
Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. Further reduction in addition operations owing to the sparsity in the forward neural and backpropagating error signal paths contributes to a highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1 Gb DDR2 DRAM, showing small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard backpropagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks. PMID:28932180
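To illustrate why binary states and ternary truncated errors eliminate multiplications, the NumPy sketch below shows a toy single-layer update. The thresholds, sizes, and learning rate are illustrative, and this is not the paper's FPGA pipeline.

```python
# With {0,1} activations and {-1,0,+1} errors, every weight update is +-lr or 0,
# so hardware needs only additions/subtractions, no multipliers.
import numpy as np

def binarize(x):                 # hard-threshold binary activation
    return (x > 0).astype(np.int8)

def ternarize(err, t=0.5):       # truncated-error backpropagation
    return np.sign(err) * (np.abs(err) > t)

def update_layer(w, a_in_bin, err_out_ter, lr=0.01):
    """Outer-product weight update from binary inputs and ternary output errors."""
    return w - lr * np.outer(err_out_ter, a_in_bin)

# a = binarize(np.random.randn(64))          # binary activations from previous layer
# e = ternarize(np.random.randn(10))         # ternarized error at this layer's output
# W = update_layer(np.random.randn(10, 64), a, e)
```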
NASA Technical Reports Server (NTRS)
Brown, K. L.; Bertsch, P. J.
1986-01-01
Results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Generation (EPG)/Fuel Cell Powerplant (FCP) hardware. The EPG/FCP hardware is required for performing functions of electrical power generation and product water distribution in the Orbiter. Specifically, the EPG/FCP hardware consists of the following divisions: (1) Power Section Assembly (PSA); (2) Reactant Control Subsystem (RCS); (3) Thermal Control Subsystem (TCS); and (4) Water Removal Subsystem (WRS). The IOA analysis process utilized available EPG/FCP hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Compressive Sensing Image Sensors-Hardware Implementation
Dadkhah, Mohammadreza; Deen, M. Jamal; Shirani, Shahram
2013-01-01
The compressive sensing (CS) paradigm uses simultaneous sensing and compression to provide an efficient image acquisition technique. The main advantages of the CS method include high resolution imaging using low resolution sensor arrays and faster image acquisition. Since the imaging philosophy in CS imagers is different from conventional imaging systems, new physical structures have been developed for cameras that use the CS technique. In this paper, a review of different hardware implementations of CS encoding in optical and electrical domains is presented. Considering the recent advances in CMOS (complementary metal–oxide–semiconductor) technologies and the feasibility of performing on-chip signal processing, important practical issues in the implementation of CS in CMOS sensors are emphasized. In addition, the CS coding for video capture is discussed. PMID:23584123
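As a small illustration of the compressive sensing model underlying these imagers, the sketch below forms measurements y = Phi x with a random +/-1 matrix and recovers a sparse signal with orthogonal matching pursuit. The measurement matrix, sparsity level, and recovery method are generic stand-ins, not any particular sensor's design from the review.

```python
# Compressive sensing: m << n measurements of an n-pixel, k-sparse signal,
# followed by a standard greedy recovery (orthogonal matching pursuit).
import numpy as np

def omp(Phi, y, k):
    """Recover a k-sparse signal x from y = Phi @ x by orthogonal matching pursuit."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))     # column best correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x

# n, m, k = 256, 64, 5                                   # pixels, measurements, sparsity
# Phi = np.random.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
# x_true = np.zeros(n); x_true[np.random.choice(n, k, replace=False)] = np.random.randn(k)
# x_hat = omp(Phi, Phi @ x_true, k)
```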
Monitoring and detection platform to prevent anomalous situations in home care.
Villarrubia, Gabriel; Bajo, Javier; De Paz, Juan F; Corchado, Juan M
2014-06-05
Monitoring and tracking people at home usually requires high-cost hardware installations, which implies they are not affordable in many situations. This paper proposes a monitoring and tracking system for people with medical problems. A virtual organization of agents based on the PANGEA platform, which allows the easy integration of different devices, was created for this study. In this case, a virtual organization was implemented to track and monitor patients carrying a Holter monitor. The system includes the hardware and software required to perform ECG measurements and monitoring through accelerometers and WiFi networks. Furthermore, the use of interactive television can mediate interaction with the user. The system makes it possible to merge the information and facilitates efficient patient tracking at low cost.
FPGA-based real-time phase measuring profilometry algorithm design and implementation
NASA Astrophysics Data System (ADS)
Zhan, Guomin; Tang, Hongwei; Zhong, Kai; Li, Zhongwei; Shi, Yusheng
2016-11-01
Phase measuring profilometry (PMP) has been widely used in many fields, such as Computer Aided Verification (CAV) and Flexible Manufacturing Systems (FMS). High frame-rate (HFR), real-time, vision-based feedback control will be a common demand in the near future. However, the instruction time delay in the computer caused by numerous repetitive operations greatly limits the efficiency of data processing. An FPGA has the advantages of a pipeline architecture and parallel execution, and it is well suited to handling the PMP algorithm. In this paper, we design a fully pipelined hardware architecture for PMP. The functions of the hardware architecture include rectification, phase calculation, phase shifting, and stereo matching. Experiments verified the performance of this method, and the factors that may influence the computation accuracy were analyzed.
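For readers unfamiliar with the phase-calculation step, the sketch below assumes standard four-step phase shifting (shifts of 0, pi/2, pi, 3*pi/2). The formula is the textbook relation; the rectification, unwrapping, and stereo-matching stages of the pipeline are omitted, and the acquisition helper named in the comment is hypothetical.

```python
# Wrapped-phase computation from four phase-shifted fringe images.
import numpy as np

def wrapped_phase(i1, i2, i3, i4):
    """Wrapped phase in (-pi, pi] from four fringe images shifted by pi/2 each."""
    return np.arctan2(i4 - i2, i1 - i3)

# images = [camera_frame(k) for k in range(4)]   # hypothetical acquisition helper
# phi = wrapped_phase(*images)                   # still needs phase unwrapping
```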
Method for star identification using neural networks
NASA Astrophysics Data System (ADS)
Lindsey, Clark S.; Lindblad, Thomas; Eide, Age J.
1997-04-01
Identification of star constellations with an onboard star tracker provides the highest precision of all attitude determination techniques for spacecraft. A method for identification of star constellations inspired by neural network (NNW) techniques is presented. It compares feature vectors derived from histograms of distances to multiple stars around the unknown star. The NNW method appears most robust with respect to position noise and would require a smaller database than conventional methods, especially for small fields of view. The neural network method is quite slow when performed on a sequential (serial) processor, but would provide very high speed if implemented in special hardware. Such hardware solutions could also yield lower weight and power consumption, both important features for small satellites.
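A hedged sketch of the feature construction described above: histogram the distances from the unknown star to its neighbours and match against catalogue patterns. The bin count, normalization, and nearest-neighbour matching are illustrative stand-ins for the paper's neural network.

```python
# Distance-histogram feature vector for a star and a simple catalogue match.
import numpy as np

def distance_histogram(star, neighbours, fov_radius, n_bins=32):
    """Normalized histogram of distances from `star` to surrounding stars."""
    d = np.linalg.norm(neighbours - star, axis=1)
    hist, _ = np.histogram(d, bins=n_bins, range=(0.0, fov_radius))
    return hist / max(hist.sum(), 1)

def best_match(feature, catalogue_features):
    """Nearest catalogue pattern by Euclidean distance (stand-in for the NNW)."""
    return int(np.argmin(np.linalg.norm(catalogue_features - feature, axis=1)))
```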
Smart Cameras for Remote Science Survey
NASA Technical Reports Server (NTRS)
Thompson, David R.; Abbey, William; Allwood, Abigail; Bekker, Dmitriy; Bornstein, Benjamin; Cabrol, Nathalie A.; Castano, Rebecca; Estlin, Tara; Fuchs, Thomas; Wagstaff, Kiri L.
2012-01-01
Communication with remote exploration spacecraft is often intermittent and bandwidth is highly constrained. Future missions could use onboard science data understanding to prioritize downlink of critical features [1], draft summary maps of visited terrain [2], or identify targets of opportunity for followup measurements [3]. We describe a generic approach to classify geologic surfaces for autonomous science operations, suitable for parallelized implementations in FPGA hardware. We map these surfaces with texture channels - distinctive numerical signatures that differentiate properties such as roughness, pavement coatings, regolith characteristics, sedimentary fabrics and differential outcrop weathering. This work describes our basic image analysis approach and reports an initial performance evaluation using surface images from the Mars Exploration Rovers. Future work will incorporate these methods into camera hardware for real-time processing.
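As a loose illustration of texture channels, the sketch below computes a few per-pixel texture statistics that could feed a terrain classifier. The specific features (local mean, local standard deviation, gradient magnitude) are assumptions, not the signatures used on the Mars Exploration Rover images.

```python
# Simple per-pixel texture statistics over a local window.
import numpy as np
from scipy.ndimage import uniform_filter

def texture_channels(gray, win=9):
    """Stack of local mean, local standard deviation, and gradient magnitude."""
    mean = uniform_filter(gray, win)
    var = uniform_filter(gray**2, win) - mean**2
    gy, gx = np.gradient(gray)
    return np.dstack([mean, np.sqrt(np.clip(var, 0, None)), np.hypot(gx, gy)])

# channels = texture_channels(image.astype(float))
# per-pixel feature vectors: channels.reshape(-1, channels.shape[-1])
```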
Hardware Evolution of Analog Speed Controllers for a DC Motor
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; Ferguson, Michael I.
2003-01-01
Evolvable hardware provides the capability to evolve analog circuits to produce amplifier and filter functions. Conventional analog controller designs employ these same functions. Analog controllers for the control of the shaft speed of a DC motor are evolved on an evolvable hardware platform utilizing a Field Programmable Transistor Array (FPTA). The performance of these evolved controllers is compared to that of a conventional proportional-integral (PI) controller.
An iterative approach to region growing using associative memories
NASA Technical Reports Server (NTRS)
Snyder, W. E.; Cowart, A.
1983-01-01
Region growing is often given as a classical example of the recursive control structures used in image processing, structures that are awkward to implement in hardware when the intent is to segment an image at raster-scan rates. It is addressed here in light of the postulate that any computation which can be performed recursively can be performed easily and efficiently by iteration coupled with association. Attention is given to an algorithm and hardware structure able to perform region labeling iteratively at scan rates. Every pixel is individually labeled with an identifier which signifies the region to which it belongs. Difficulties that would otherwise require recursion are handled by maintaining an equivalence table in hardware, transparent to the computer which reads the labeled pixels. A simulation of the associative memory has demonstrated its effectiveness.
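A software analogue of the iterative labeling scheme may help: the sketch below labels a binary image in one raster pass and resolves label conflicts through an equivalence table (a union-find), so no recursion is needed. Binary input and 4-connectivity are simplifications of the hardware described.

```python
# Raster-scan region labeling with an equivalence table instead of recursion.
import numpy as np

def label_regions(img):
    labels = np.zeros(img.shape, dtype=int)
    parent = [0]                                  # equivalence table (union-find)
    find = lambda x: x if parent[x] == x else find(parent[x])
    next_label = 1
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            if not img[r, c]:
                continue
            up = labels[r - 1, c] if r else 0
            left = labels[r, c - 1] if c else 0
            if up == 0 and left == 0:             # new region starts here
                parent.append(next_label)
                labels[r, c] = next_label
                next_label += 1
            else:
                labels[r, c] = min(x for x in (up, left) if x)
                if up and left and find(up) != find(left):   # record equivalence
                    parent[max(find(up), find(left))] = min(find(up), find(left))
    for r in range(img.shape[0]):                 # second pass resolves equivalences
        for c in range(img.shape[1]):
            if labels[r, c]:
                labels[r, c] = find(labels[r, c])
    return labels

# lbl = label_regions(np.array([[1, 1, 0], [0, 1, 0], [1, 0, 1]]))
```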
Independent Orbiter Assessment (IOA): Analysis of the landing/deceleration subsystem
NASA Technical Reports Server (NTRS)
Compton, J. M.; Beaird, H. G.; Weissinger, W. D.
1987-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Landing/Deceleration Subsystem hardware. The Landing/Deceleration Subsystem is utilized to allow the Orbiter to perform a safe landing, allowing for landing-gear deploy activities, steering and braking control throughout the landing rollout to wheel-stop, and to allow for ground-handling capability during the ground-processing phase of the flight cycle. Specifically, the Landing/Deceleration hardware consists of the following components: Nose Landing Gear (NLG); Main Landing Gear (MLG); Brake and Antiskid (B and AS) Electrical Power Distribution and Controls (EPD and C); Nose Wheel Steering (NWS); and Hydraulics Actuators. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. Due to the lack of redundancy in the Landing/Deceleration Subsystems there is a high number of critical items.
Malleable architecture generator for FPGA computing
NASA Astrophysics Data System (ADS)
Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang
1996-10-01
The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology- specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.
High-fidelity real-time maritime scene rendering
NASA Astrophysics Data System (ADS)
Shyu, Hawjye; Taczak, Thomas M.; Cox, Kevin; Gover, Robert; Maraviglia, Carlos; Cahill, Colin
2011-06-01
The ability to simulate authentic engagements using real-world hardware is an increasingly important tool. For rendering maritime environments, scene generators must be capable of rendering radiometrically accurate scenes with correct temporal and spatial characteristics. When the simulation is used as input to real-world hardware or human observers, the scene generator must operate in real-time. This paper introduces a novel, real-time scene generation capability for rendering radiometrically accurate scenes of backgrounds and targets in maritime environments. The new model is an optimized and parallelized version of the US Navy CRUISE_Missiles rendering engine. It was designed to accept environmental descriptions and engagement geometry data from external sources, render a scene, transform the radiometric scene using the electro-optical response functions of a sensor under test, and output the resulting signal to real-world hardware. This paper reviews components of the scene rendering algorithm, and details the modifications required to run this code in real-time. A description of the simulation architecture and interfaces to external hardware and models is presented. Performance assessments of the frame rate and radiometric accuracy of the new code are summarized. This work was completed in FY10 under Office of Secretary of Defense (OSD) Central Test and Evaluation Investment Program (CTEIP) funding and will undergo a validation process in FY11.
Protein Crystal Growth With the Aid of Microfluidics
NASA Technical Reports Server (NTRS)
vanderWoerd, Mark
2003-01-01
Protein crystallography is one of three well-known methods to obtain the structure of proteins. A major rate limiting step in protein crystallography is protein crystal nucleation and growth, which is still largely a process conducted by trial-and-error methods. Many attempts have been made to improve protein crystal growth by performing growth in microgravity. Although the use of microgravity appears to improve crystal quality in some attempts, this method has been inefficient for several reasons: we lack a fundamental understanding of macromolecular crystal growth in general and of the influence of microgravity in particular; crystal growth conditions in microgravity must initially be based on conditions established on the ground; and the hardware does not allow for experimental iteration without reloading samples on the ground. To partially accommodate the disadvantages of the current hardware, we have used microfluidic technology (Lab-on-a-Chip devices) to design the concept of a more efficient crystallization device, suitable for use on the International Space Station and in high-throughput applications on the ground. The concept and properties of microfluidics, the application design process, and the advances in protein crystal growth hardware will be discussed in this presentation. Some examples of proteins crystallized in the new hardware will be discussed, including the differences between conventional crystallization and crystallization in microfluidics.
Postflight hardware evaluation 360T026 (RSRM-26, STS-47)
NASA Technical Reports Server (NTRS)
Nielson, Greg
1993-01-01
The final report for the Clearfield disassembly evaluation and a continuation of the KSC postflight assessment for the 360T026 (STS-47) Redesigned Solid Rocket Motor (RSRM) flight set is provided. All observed hardware conditions were documented on PFOR's and are included in Appendices A, B, and C. Appendices D and E contain the measurements and safety factor data for the nozzle and insulation components. This report, along with the KSC Ten-Day Postflight Hardware Evaluation Report (TWR-64203), represents a summary of the 360T026 hardware evaluation. The as-flown hardware configuration is documented in TWR-60472. Disassembly evaluation photograph numbers are logged in TWA-1987. The 360T026 flight set disassembly evaluations described were performed at the RSRM Refurbishment Facility in Clearfield, Utah. The final factory joint demate occurred on 12 April 1993. Detailed evaluations were performed in accordance with the Clearfield Postflight Engineering Evaluation Plan (PEEP), TWR-50051, Revision A. All observations were compared against limits that are also defined in the PEEP. These limits outline the criteria for categorizing the observations as acceptable, reportable, or critical. Hardware conditions that were unexpected and/or determined to be reportable or critical were evaluated by the applicable CPT and tracked through the PFAR system.
Final postflight hardware evaluation report RSRM-32 (STS-57)
NASA Technical Reports Server (NTRS)
Nielson, Greg
1993-01-01
This document is the final report for the postflight assessment of the RSRM-32 (STS-57) flight set. This report presents the disassembly evaluations performed at the Thiokol facilities in Utah and is a continuation of the evaluations performed at KSC (TWR-64239). The PEEP for this assessment is outlined in TWR-50051, Revision B. The PEEP defines the requirements for evaluating RSRM hardware. Special hardware issues pertaining to this flight set requiring additional or modified assessment are outlined in TWR-64237. All observed hardware conditions were documented on PFOR's which are included in Appendix A. Observations were compared against limits defined in the PEEP. Any observation that was categorized as reportable or had no defined limits was documented on a preliminary PFAR by the assessment engineers. Preliminary PFAR's were reviewed by the Thiokol SPAT Executive Board to determine if elevation to PFAR's was required.
NASA Technical Reports Server (NTRS)
Steele, John W.; Rector, Tony; Gazda, Daniel; Lewis, John
2011-01-01
An EMU water processing kit (Airlock Coolant Loop Recovery -- A/L CLR) was developed as a corrective action to Extravehicular Mobility Unit (EMU) coolant flow disruptions experienced on the International Space Station (ISS) in May of 2004 and thereafter. A conservative duty cycle and set of use parameters for A/L CLR use and component life were initially developed and implemented based on prior analysis results and analytical modeling. Several initiatives were undertaken to optimize the duty cycle and use parameters of the hardware. Examination of post-flight samples and EMU Coolant Loop hardware provided invaluable information on the performance of the A/L CLR and has allowed for an optimization of the process. The intent of this paper is to detail the evolution of the A/L CLR hardware, efforts to optimize the duty cycle and use parameters, and the final recommendations for implementation in the post-Shuttle retirement era.
Chemical calculations on Cray computers
NASA Technical Reports Server (NTRS)
Taylor, Peter R.; Bauschlicher, Charles W., Jr.; Schwenke, David W.
1989-01-01
The influence of recent developments in supercomputing on computational chemistry is discussed with particular reference to Cray computers and their pipelined vector/limited parallel architectures. After reviewing Cray hardware and software, the performance of different elementary program structures is examined, and effective methods for improving program performance are outlined. The computational strategies appropriate for obtaining optimum performance in applications to quantum chemistry and dynamics are discussed. Finally, some discussion is given of new developments and future hardware and software improvements.
Los Alamos radiation transport code system on desktop computing platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Briesmeister, J.F.; Brinkley, F.W.; Clark, B.A.
The Los Alamos Radiation Transport Code System (LARTCS) consists of state-of-the-art Monte Carlo and discrete ordinates transport codes and data libraries. These codes were originally developed many years ago and have undergone continual improvement. With a large initial effort and continued vigilance, the codes are easily portable from one type of hardware to another. The performance of scientific workstations (SWS) has evolved to the point that such platforms can be used routinely to perform sophisticated radiation transport calculations. As personal computer (PC) performance approaches that of the SWS, the hardware options for desk-top radiation transport calculations expand considerably. The current status of the radiation transport codes within the LARTCS is described: MCNP, SABRINA, LAHET, ONEDANT, TWODANT, TWOHEX, and ONELD. Specifically, the authors discuss hardware systems on which the codes run and present code performance comparisons for various machines.
Representation and matching of knowledge to design digital systems
NASA Technical Reports Server (NTRS)
Jones, J. U.; Shiva, S. G.
1988-01-01
A knowledge-based expert system is described that provides an approach to solve a problem requiring an expert with considerable domain expertise and facts about available digital hardware building blocks. To design digital hardware systems from their high level VHDL (Very High Speed Integrated Circuit Hardware Description Language) representation to their finished form, a special data representation is required. This data representation as well as the functioning of the overall system is described.
High-speed multiple sequence alignment on a reconfigurable platform.
Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf
2006-01-01
Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences with popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases, biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
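To make the distance computation concrete, the sketch below builds a pairwise distance matrix with a dynamic-programming edit distance. Plain edit distance and the all-pairs loop stand in for the FPGA's systolic scoring scheme; they are not the paper's implementation.

```python
# Pairwise sequence distances by dynamic programming (rolling one-row table).
import numpy as np

def edit_distance(a, b):
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            # new d[j] = min(delete, insert, substitute/match)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return int(d[-1])

def distance_matrix(seqs):
    n = len(seqs)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = edit_distance(seqs[i], seqs[j])
    return dist

# print(distance_matrix(["ACGT", "AGGT", "ACGA"]))
```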
High temperature solar receiver
NASA Technical Reports Server (NTRS)
1981-01-01
The development of a high temperature solar thermal receiver is described. A prototype receiver and associated test support (auxiliary) hardware were fabricated. Shakedown and initial performance tests of the prototype receiver were performed. Maximum outlet temperatures of 1600 F were achieved at 100% solar (70-75 kW) input power with 900 F inlet temperatures, and subsequent testing concluded with a 2550 F outlet run. The window retaining assembly was modified to improve its tolerance for thermal distortion of the flanges. It is shown that cost effective receiver designs can be implemented within the framework of present materials technology.
Hofmann, Hannes G; Keck, Benjamin; Rohkohl, Christopher; Hornegger, Joachim
2011-01-01
Interventional reconstruction of 3-D volumetric data from C-arm CT projections is a computationally demanding task. Hardware optimization is not an option but mandatory for interventional image processing and, in particular, for image reconstruction due to the high demands on performance. Several groups have published fast analytical 3-D reconstruction on highly parallel hardware such as GPUs to mitigate this issue. The authors show that the performance of modern CPU-based systems is in the same order as current GPUs for static 3-D reconstruction and outperforms them for a recent motion compensated (3-D+time) image reconstruction algorithm. This work investigates two algorithms: Static 3-D reconstruction as well as a recent motion compensated algorithm. The evaluation was performed using a standardized reconstruction benchmark, RABBITCT, to get comparable results and two additional clinical data sets. The authors demonstrate for a parametric B-spline motion estimation scheme that the derivative computation, which requires many write operations to memory, performs poorly on the GPU and can highly benefit from modern CPU architectures with large caches. Moreover, on a 32-core Intel Xeon server system, the authors achieve linear scaling with the number of cores used and reconstruction times almost in the same range as current GPUs. Algorithmic innovations in the field of motion compensated image reconstruction may lead to a shift back to CPUs in the future. For analytical 3-D reconstruction, the authors show that the gap between GPUs and CPUs became smaller. It can be performed in less than 20 s (on-the-fly) using a 32-core server.
Profiling an application for power consumption during execution on a plurality of compute nodes
Archer, Charles J.; Blocksome, Michael A.; Peters, Amanda E.; Ratterman, Joseph D.; Smith, Brian E.
2012-08-21
Methods, apparatus, and products are disclosed for profiling an application for power consumption during execution on a compute node that include: receiving an application for execution on a compute node; identifying a hardware power consumption profile for the compute node, the hardware power consumption profile specifying power consumption for compute node hardware during performance of various processing operations; determining a power consumption profile for the application in dependence upon the application and the hardware power consumption profile for the compute node; and reporting the power consumption profile for the application.
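A hedged sketch of the idea in this abstract follows: combine a hardware power-consumption profile (power drawn during each class of processing operation) with the application's operation mix to estimate its power profile. The operation classes, rates, and wattages below are hypothetical.

```python
# Estimate per-operation-class time and energy for an application on one compute node.
def application_power_profile(op_counts, hw_profile_watts, op_rate_hz):
    """Return {op: {'seconds', 'joules'}} from operation counts and hardware data."""
    profile = {}
    for op, count in op_counts.items():
        seconds = count / op_rate_hz[op]
        profile[op] = {"seconds": seconds,
                       "joules": seconds * hw_profile_watts[op]}
    return profile

# hw = {"flop": 60.0, "mem": 45.0, "net": 30.0}          # watts while doing each op class
# rate = {"flop": 2e9, "mem": 5e8, "net": 1e8}           # operations per second
# print(application_power_profile({"flop": 1e10, "mem": 2e9, "net": 1e8}, hw, rate))
```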
The International Space Station human life sciences experiment implementation process
NASA Technical Reports Server (NTRS)
Miller, L. J.; Haven, C. P.; McCollum, S. G.; Lee, A. M.; Kamman, M. R.; Baumann, D. K.; Anderson, M. E.; Buderer, M. C.
2001-01-01
The selection, definition, and development phases of a Life Sciences flight research experiment have been consistent throughout the past decade. The implementation process, however, has changed significantly within the past two years. This change is driven primarily by the shift from highly integrated, dedicated research missions on platforms with well defined processes to self contained experiments with stand alone operations on platforms which are being concurrently designed. For experiments manifested on the International Space Station (ISS) and/or on short duration missions, the more modular, streamlined, and independent the individual experiment is, the more likely it is to be successfully implemented before the ISS assembly is completed. During the assembly phase of the ISS, science operations are lower in priority than the construction of the station. After the station has been completed, it is expected that more resources will be available to perform research. The complexity of implementing investigations increases with the logistics needed to perform the experiment. Examples of logistics issues include: hardware unique to the experiment; large up and down mass and volume needs; access to crew and hardware during the ascent or descent phases; maintenance of hardware and supplies with a limited shelf life; baseline data collection schedules with lengthy sessions or sessions close to the launch or landing; onboard stowage availability, particularly cold stowage; and extensive training where highly proficient skills must be maintained. As the ISS processes become better defined, experiment implementation will meet new challenges due to distributed management, on-orbit resource sharing, and adjustments to crew availability pre- and post-increment. (c) 2001 Elsevier Science Ltd. All rights reserved.
Optical communication for space missions
NASA Technical Reports Server (NTRS)
Firtmaurice, M.
1991-01-01
Activities performed at NASA/GSFC (Goddard Space Flight Center) related to direct detection optical communications for space applications are discussed. The following subject areas are covered: (1) requirements for optical communication systems (data rates and channel quality; spatial acquisition; fine tracking and pointing; and transmit point-ahead correction); (2) component testing and development (laser diodes performance characterization and life testing; and laser diode power combining); (3) system development and simulations (The GSFC pointing, acquisition and tracking system; hardware description; preliminary performance analysis; and high data rate transmitter/receiver systems); and (4) proposed flight demonstration of optical communications.
High Productivity DRIE solutions for 3D-SiP and MEMS Volume Manufacturing
NASA Astrophysics Data System (ADS)
Puech, M.; Thevenoud, JM; Launay, N.; Arnal, N.; Godinat, P.; Andrieu, B.; Gruffat, JM
2006-04-01
Emerging 3D-SiP technologies and high volume MEMS applications require high productivity mass production DRIE systems. The Alcatel DRIE product range has recently been optimised to reach the highest process and hardware production performance. A study based on sub-micron high aspect ratio structures encountered in the most stringent 3D-SiP applications has been carried out. The optimization of the Bosch process parameters has resulted in ultra high silicon etch rates, with unrivalled uniformity and repeatability, leading to excellent process performance. In parallel, the most recent hardware and proprietary design optimizations, including vacuum pumping lines, process chamber, wafer chucks, pressure control system, and gas delivery, are discussed. These improvements have been monitored in a mass production environment for a mobile phone application. Field data analysis shows a significant reduction of cost of ownership thanks to increased throughput and much lower running costs. These benefits are now available for all 3D-SiP and high volume MEMS applications. The typical etched patterns include tapered trenches for CMOS imagers, through-silicon via holes for die stacking, well controlled profile angles for 3D high precision inertial sensors, and large exposed area features for inkjet printer heads and silicon microphones.
Test program, helium II orbital resupply coupling
NASA Technical Reports Server (NTRS)
Hyatt, William S.
1991-01-01
The full scope of this program was to have included development tests, design and production of custom test equipment and acceptance and qualification testing of prototype and protoflight coupling hardware. This program was performed by Ball Aerospace Systems Division, Boulder, Colorado until its premature termination in May 1991. Development tests were performed on cryogenic face seals and flow control devices at superfluid helium (He II) conditions. Special equipment was developed to allow quantified leak detection at large leak rates up to 8.4 x 10(exp -4) SCCS. Two major fixtures were developed and characterized: The Cryogenic Test Fixture (CTF) and the Thermal Mismatch Fixture (Glovebox). The CTF allows the coupling hardware to be filled with liquid nitrogen (LN2), liquid helium (LHe) or sub-cooled liquid helium when hardware flow control valves are either open or closed. Heat leak measurements, internal and external helium leakage measurements, cryogenic proof pressure tests and external load applications are performed in this fixture. Special reusable MLI closures were developed to provide repeatable installations in the CTF. The Thermal Mismatch Fixture allows all design configurations of coupling hardware to be engaged and disengaged while measuring applied forces and torques. Any two hardware components may be individually thermally preconditioned within the range of 117 deg K to 350 deg K prior to engage/disengage cycling. This verifies dimensional compatibility and operation when thermally mismatched. A clean, dry GN2 atmosphere is maintained in the fixture at all times. The first shipset of hardware was received, inspected and cycled at room temperature just prior to program termination.
Measuring the Pain Area: An Intra- and Inter-Rater Reliability Study Using Image Analysis Software.
Dos Reis, Felipe Jose Jandre; de Barros E Silva, Veronica; de Lucena, Raphaela Nunes; Mendes Cardoso, Bruno Alexandre; Nogueira, Leandro Calazans
2016-01-01
Pain drawings have frequently been used for clinical information and research. The aim of this study was to investigate intra- and inter-rater reliability of area measurements performed on pain drawings. Our secondary objective was to verify the reliability when using computers with different screen sizes, both with and without mouse hardware. Pain drawings were completed by patients with chronic neck pain or neck-shoulder-arm pain. Four independent examiners participated in the study. Examiners A and B used the same computer with a 16-inch screen and wired mouse hardware. Examiner C used a notebook with a 16-inch screen and no mouse hardware, and Examiner D used a computer with an 11.6-inch screen and a wireless mouse. Image measurements were obtained using GIMP and NIH ImageJ computer programs. The length of all the images was measured using GIMP software to a set scale in ImageJ. Thus, each marked area was encircled and the total surface area (cm²) was calculated for each pain drawing measurement. A total of 117 areas were identified and 52 pain drawings were analyzed. The intrarater reliability between all examiners was high (ICC = 0.989). The inter-rater reliability was also high. No significant differences were observed when using different screen sizes or when using or not using the mouse hardware. This suggests that the precision of these measurements is acceptable for the use of this method as a measurement tool in clinical practice and research. © 2014 World Institute of Pain.
Anion exchange membrane fuel cells: Current status and remaining challenges
NASA Astrophysics Data System (ADS)
Gottesfeld, Shimshon; Dekel, Dario R.; Page, Miles; Bae, Chulsung; Yan, Yushan; Zelenay, Piotr; Kim, Yu Seung
2018-01-01
The anion exchange membrane fuel cell (AEMFC) is an attractive alternative to acidic proton exchange membrane fuel cells, which to date have required platinum-based catalysts, as well as acid-tolerant stack hardware. The AEMFC could use non-platinum-group metal catalysts and less expensive metal hardware thanks to the high pH of the electrolyte. Over the last decade, substantial progress has been made in improving the performance and durability of the AEMFC through the development of new materials and the optimization of system design and operation conditions. In this perspective article, we describe the current status of AEMFCs as having reached beginning of life performance very close to that of PEMFCs when using ultra-low loadings of Pt, while advancing towards operation on non-platinum-group metal catalysts alone. In the latter sections, we identify the remaining technical challenges, which require further research and development, focusing on the materials and operational factors that critically impact AEMFC performance and/or durability. These perspectives may provide useful insights for the development of next-generation of AEMFCs.
Anion exchange membrane fuel cells: Current status and remaining challenges
Gottesfeld, Shimshon; Dekel, Dario R.; Page, Miles; ...
2017-09-01
The anion exchange membrane fuel cell (AEMFC) is an attractive alternative to acidic proton exchange membrane fuel cells, which to date have required platinum-based catalysts, as well as acid-tolerant stack hardware. The AEMFC could use non-platinum-group metal catalysts and less expensive metal hardware thanks to the high pH of the electrolyte. Over the last decade, substantial progress has been made in improving the performance and durability of the AEMFC through the development of new materials and the optimization of system design and operation conditions. In this perspective article, we describe the current status of AEMFCs as having reached beginning of life performance very close to that of PEMFCs when using ultra-low loadings of Pt, while advancing towards operation on non-platinum-group metal catalysts alone. In the latter sections, we identify the remaining technical challenges, which require further research and development, focusing on the materials and operational factors that critically impact AEMFC performance and/or durability. Finally, these perspectives may provide useful insights for the development of next-generation of AEMFCs.
Flow visualization of CFD using graphics workstations
NASA Technical Reports Server (NTRS)
Lasinski, Thomas; Buning, Pieter; Choi, Diana; Rogers, Stuart; Bancroft, Gordon
1987-01-01
High performance graphics workstations are used to visualize the fluid flow dynamics obtained from supercomputer solutions of computational fluid dynamic programs. The visualizations can be done independently on the workstation or while the workstation is connected to the supercomputer in a distributed computing mode. In the distributed mode, the supercomputer interactively performs the computationally intensive graphics rendering tasks while the workstation performs the viewing tasks. A major advantage of the workstations is that the viewers can interactively change their viewing position while watching the dynamics of the flow fields. An overview of the computer hardware and software required to create these displays is presented. For complex scenes the workstation cannot create the displays fast enough for good motion analysis. For these cases, the animation sequences are recorded on video tape or 16 mm film a frame at a time and played back at the desired speed. The additional software and hardware required to create these video tapes or 16 mm movies are also described. Photographs illustrating current visualization techniques are discussed. Examples of the use of the workstations for flow visualization through animation are available on video tape.
NASA Technical Reports Server (NTRS)
Keymeulen, D.; Klimeck, G.; Zebulum, R.; Stoica, A.; Jin, Y.; Lazaro, C.
2000-01-01
This paper describes the EHW development system, a tool that performs the evolutionary synthesis of electronic circuits, using the SPICE simulator and the Field Programmable Transistor Array hardware (FPTA) developed at JPL.
Affordable Emerging Computer Hardware for Neuromorphic Computing Applications
2011-09-01
[Fragmentary excerpt: report documentation page fields (title: Affordable Emerging Computer Hardware for Neuromorphic Computing Applications), plus text snippets noting a speedup over software [3, 4]; Table 1 compares computing performance, communication performance, and power consumption; a processing time of probably 5 frames per second, corresponding to 5 saccades; and, under Results and Discussion, the use of IBM Cell-BE technology (Sony PlayStation ...).]
The Art of Space Flight Exercise Hardware: Design and Implementation
NASA Technical Reports Server (NTRS)
Beyene, Nahom M.
2004-01-01
The design of space flight exercise hardware depends on experience with crew health maintenance in a microgravity environment, history in development of flight-quality exercise hardware, and a foundation for certifying proper project management and design methodology. Developed over the past 40 years, the expertise in designing exercise countermeasures hardware at the Johnson Space Center stems from these three aspects of design. The medical community has steadily pursued an understanding of physiological changes in humans in a weightless environment and methods of counteracting negative effects on the cardiovascular and musculoskeletal system. The effects of weightlessness extend to the pulmonary and neurovestibular system as well with conditions ranging from motion sickness to loss of bone density. Results have shown losses in water weight and muscle mass in antigravity muscle groups. With the support of university-based research groups and partner space agencies, NASA has identified exercise to be the primary countermeasure for long-duration space flight. The history of exercise hardware began during the Apollo Era and leads directly to the present hardware on the International Space Station. Under the classifications of aerobic and resistive exercise, there is a clear line of development from the early devices to the countermeasures hardware used today. In support of all engineering projects, the engineering directorate has created a structured framework for project management. Engineers have identified standards and "best practices" to promote efficient and elegant design of space exercise hardware. The quality of space exercise hardware depends on how well hardware requirements are justified by exercise performance guidelines and crew health indicators. When considering the microgravity environment of the device, designers must consider performance of hardware separately from the combined human-in-hardware system. Astronauts are the caretakers of the hardware while it is deployed and conduct all sanitization, calibration, and maintenance for the devices. Thus, hardware designs must account for these issues with a goal of minimizing crew time on orbit required to complete these tasks. In the future, humans will venture to Mars and exercise countermeasures will play a critical role in allowing us to continue in our spirit of exploration. NASA will benefit from further experimentation on Earth, through the International Space Station, and with advanced biomechanical models to quantify how each device counteracts specific symptoms of weightlessness. With the continued support of international space agencies and the academic research community, we will usher the next frontier in human space exploration.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding
NASA Astrophysics Data System (ADS)
Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae
2017-12-01
High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively finds a good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement in hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimation of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
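To make the search-point sharing concrete, the sketch below is a minimal illustration of one phase of such a search, not the authors' implementation: the helper callables generate_points and block_sad, the pu_blocks mapping, and the use of block-level SADs as the cost metric are all assumptions. Duplicate candidate points across PU partitions are evaluated only once, and the per-block costs are reused by every partition.

def phase_search(pus, pu_blocks, generate_points, block_sad):
    """Hypothetical helpers: pu_blocks[pu] lists the 4x4 blocks covered by that
    PU partition; block_sad(block, point) returns the SAD of one block at one
    candidate motion vector; generate_points(pu) yields (dx, dy) candidates."""
    # 1. Gather the search points requested by every PU for this phase.
    requests = {pu: set(generate_points(pu)) for pu in pus}
    # 2. Remove redundancy: the union is the set of points actually evaluated.
    unique_points = set().union(*requests.values())
    all_blocks = set().union(*pu_blocks.values())

    # 3. Block-level costs are computed once per unique point and shared by all PUs.
    sads = {pt: {blk: block_sad(blk, pt) for blk in all_blocks}
            for pt in unique_points}

    # 4. Each PU's cost at a point is the sum of its blocks' costs; pick the minimum.
    return {pu: min(requests[pu],
                    key=lambda pt: sum(sads[pt][blk] for blk in pu_blocks[pu]))
            for pu in pus}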
YARR - A PCIe based Readout Concept for Current and Future ATLAS Pixel Modules
NASA Astrophysics Data System (ADS)
Heim, Timon
2017-10-01
The Yet Another Rapid Readout (YARR) system is a DAQ system designed for the readout of current-generation ATLAS Pixel FE-I4 and next-generation chips. It utilises a commercial off-the-shelf PCIe FPGA card as a reconfigurable I/O interface, which acts as a simple gateway to pipe all data from the Pixel modules via the high-speed PCIe connection into the host system's memory. Relying on modern CPU architectures, which enable parallelised processing in threads, and on the commercial high-speed interfaces found in everyday computers, it is possible to perform all processing in software on the host CPU. Although FPGAs are very powerful at parallel signal processing, their firmware is hard to maintain and constrained by their connected hardware. Software, on the other hand, is very portable and is upgraded frequently, with new features coming at no cost. A DAQ concept which does not rely on the underlying hardware for acceleration also eases the transition from prototyping in the laboratory to the full-scale implementation in the experiment. The overall concept and data flow will be outlined, as well as the challenges and possible bottlenecks which can be encountered when moving the processing from hardware to software.
High Frequency Adaptive Instability Suppression Controls in a Liquid-Fueled Combustor
NASA Technical Reports Server (NTRS)
Kopasakis, George
2003-01-01
This effort extends an earlier adaptive control algorithm for the suppression of thermo-acoustic instabilities in a liquid-fueled combustor into the high-frequency regime (>500 Hz). The earlier work covered the development of a control algorithm for the suppression of a low-frequency (280 Hz) combustion instability based on simulations, with no hardware testing involved. The work described here includes changes to the simulation and controller design necessary to control the high-frequency instability, augmentations to the control algorithm to improve its performance, and finally hardware testing and results with an experimental combustor rig developed for the high-frequency case. The Adaptive Sliding Phasor Averaged Control (ASPAC) algorithm modulates the fuel flow in the combustor with a control phase that continuously slides back and forth within the phase region that reduces the amplitude of the instability. The results demonstrate the power of the method: it can identify and suppress the instability even when the instability amplitude is buried in the noise of the combustor pressure. The successful testing of the ASPAC approach helped complete an important NASA milestone to demonstrate advanced technologies for low-emission combustors.
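The sliding-phase idea can be illustrated with a toy loop such as the one below; measure_amplitude and apply_phase stand in for the combustor instrumentation and fuel-flow actuator and are assumptions, not part of the published ASPAC implementation.

import math

def sliding_phase_control(measure_amplitude, apply_phase, steps=200,
                          phase=0.0, step=math.radians(5)):
    """Toy sliding-phase controller: keep moving the fuel-modulation phase in
    the direction that lowers the measured instability amplitude, and reverse
    direction when the amplitude starts to grow again."""
    direction = 1.0
    apply_phase(phase)
    previous = measure_amplitude()
    for _ in range(steps):
        phase = (phase + direction * step) % (2 * math.pi)
        apply_phase(phase)
        current = measure_amplitude()
        if current > previous:        # amplitude grew: slide back the other way
            direction = -direction
        previous = current
    return phase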
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam-fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware-implementable parallel algorithms, low-power and high-speed VLSI designs, and building-block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high-throughput map-data classification using feedforward neural networks, a terrain-based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners) and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation-intensive tasks of resource allocation (weapon-target assignment) and terrain-based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple (many-to-many) assignments not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
Human performance interfaces in air traffic control.
Chang, Yu-Hern; Yeh, Chung-Hsing
2010-01-01
This paper examines how human performance factors in air traffic control (ATC) affect each other through their mutual interactions. The paper extends the conceptual SHEL model of ergonomics to describe the ATC system as human performance interfaces in which the air traffic controllers interact with other human performance factors including other controllers, software, hardware, environment, and organisation. New research hypotheses about the relationships between human performance interfaces of the system are developed and tested on data collected from air traffic controllers, using structural equation modelling. The research result suggests that organisation influences play a more significant role than individual differences or peer influences on how the controllers interact with the software, hardware, and environment of the ATC system. There are mutual influences between the controller-software, controller-hardware, controller-environment, and controller-organisation interfaces of the ATC system, with the exception of the controller-controller interface. Research findings of this study provide practical insights in managing human performance interfaces of the ATC system in the face of internal or external change, particularly in understanding its possible consequences in relation to the interactions between human performance factors.
Multimission helicopter information display technology
NASA Astrophysics Data System (ADS)
Terry, William S.
1995-06-01
A new Operator display subsystem is being incorporated as part of the next-generation United States Navy (USN) helicopter avionics system to be integrated into the Multi-Mission Helicopter (MMH), which will replace both the SH-60B and the SH-60F in 2001. This subsystem exploits state-of-the-art technology for the display hardware, the display driver hardware, information presentation methodologies, and software architecture. The baseline technologies have evolved during the development period, and the solution has been modified to include current elements, including high-resolution AMLCD color displays that are sunlight readable, highly reliable, and significantly lighter than CRT technology, as well as Reduced Instruction Set Computer (RISC) based high-performance display generators that have only recently become feasible to implement in a military aircraft. This paper describes the overall subsystem architecture, some detail on the individual elements along with supporting rationale, and the manner in which the display subsystem provides the necessary tools to significantly enhance the performance of the weapon system through the vital Operator-System Interface. Also addressed is a summary of the evolution of design leading to the current approach to MMH Operator displays and display processing, as well as the growth path that the MMH display subsystem will most likely follow as additional technology evolution occurs.
Accelerated Application Development: The ORNL Titan Experience
Joubert, Wayne; Archibald, Richard K.; Berrill, Mark A.; ...
2015-05-09
The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, which began planning in 2009 and was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this paper we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.
Accelerated application development: The ORNL Titan experience
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joubert, Wayne; Archibald, Rick; Berrill, Mark
2015-08-01
The use of computational accelerators such as NVIDIA GPUs and Intel Xeon Phi processors is now widespread in the high performance computing community, with many applications delivering impressive performance gains. However, programming these systems for high performance, performance portability and software maintainability has been a challenge. In this paper we discuss experiences porting applications to the Titan system. Titan, which began planning in 2009 and was deployed for general use in 2013, was the first multi-petaflop system based on accelerator hardware. To ready applications for accelerated computing, a preparedness effort was undertaken prior to delivery of Titan. In this paper we report experiences and lessons learned from this process and describe how users are currently making use of computational accelerators on Titan.
Hardware support for software controlled fast reconfiguration of performance counters
Salapura, Valentina; Wisniewski, Robert W.
2013-06-18
Hardware support for software controlled reconfiguration of performance counters may include a plurality of performance counters collecting one or more counts of one or more selected activities. A storage element stores a data value representing a time interval, and a timer element reads the data value, detects expiration of the time interval based on the data value, and generates a signal. A plurality of configuration registers stores a set of performance counter configurations. A state machine receives the signal and selects a configuration register from the plurality of configuration registers for reconfiguring the one or more performance counters.
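A minimal software model of the mechanism described above, purely illustrative of the abstract rather than the actual circuit, might look like this:

class CounterReconfigurator:
    """Software model (a sketch, not the patented hardware) of the described
    mechanism: a timer holds an interval, and each time it expires a state
    machine picks the next entry from a bank of configuration registers and
    applies it to the performance counters."""

    def __init__(self, configurations, interval_cycles):
        self.configs = list(configurations)   # "configuration registers"
        self.interval = interval_cycles       # "storage element" value
        self.elapsed = 0                      # timer state
        self.index = 0                        # state-machine state
        self.active = self.configs[0]

    def tick(self, cycles=1):
        self.elapsed += cycles
        if self.elapsed >= self.interval:     # timer expiry -> signal
            self.elapsed = 0
            self.index = (self.index + 1) % len(self.configs)
            self.active = self.configs[self.index]   # reconfigure counters
        return self.active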
Hardware support for software controlled fast reconfiguration of performance counters
Salapura, Valentina; Wisniewski, Robert W
2013-09-24
Hardware support for software controlled reconfiguration of performance counters may include a plurality of performance counters collecting one or more counts of one or more selected activities. A storage element stores a data value representing a time interval, and a timer element reads the data value, detects expiration of the time interval based on the data value, and generates a signal. A plurality of configuration registers stores a set of performance counter configurations. A state machine receives the signal and selects a configuration register from the plurality of configuration registers for reconfiguring the one or more performance counters.
1974-12-01
[Only fragments of this report are recoverable: it concerns turbofan engine performance; an AiResearch Model TFE731-2 turbofan engine was modified to incorporate production-type variable-geometry hardware; reliability was shown for the variable-geometry components; and the TFE731, modified to include variable geometry, proved to be an inexpensive ... The remaining text is figure-list residue (variable-cycle engine TFE731 exhaust-nozzle performance; analytical model comparisons, aerodynamic).]
Noise tolerant illumination optimization applied to display devices
NASA Astrophysics Data System (ADS)
Cassarly, William J.; Irving, Bruce
2005-02-01
Display devices have historically been designed through an iterative process using numerous hardware prototypes. This process is effective but the number of iterations is limited by the time and cost to make the prototypes. In recent years, virtual prototyping using illumination software modeling tools has replaced many of the hardware prototypes. Typically, the designer specifies the design parameters, builds the software model, predicts the performance using a Monte Carlo simulation, and uses the performance results to repeat this process until an acceptable design is obtained. What is highly desired, and now possible, is to use illumination optimization to automate the design process. Illumination optimization provides the ability to explore a wider range of design options while also providing improved performance. Since Monte Carlo simulations are often used to calculate the system performance but those predictions have statistical uncertainty, the use of noise tolerant optimization algorithms is important. The use of noise tolerant illumination optimization is demonstrated by considering display device designs that extract light using 2D paint patterns as well as 3D textured surfaces. A hybrid optimization approach that combines a mesh feedback optimization with a classical optimizer is demonstrated. Displays with LED sources and cold cathode fluorescent lamps are considered.
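One simple way to make an optimizer tolerant of Monte Carlo noise is to average repeated evaluations of the merit function before accepting a move, as in the sketch below; the merit callable is a hypothetical stand-in for a ray-trace simulation, and the random-perturbation search shown is a generic illustration rather than the hybrid mesh-feedback scheme used in the paper.

import random

def noise_tolerant_search(merit, x0, sigma=0.05, evals_per_point=8, iters=100):
    """Minimize a noisy merit function by averaging several evaluations per
    candidate point and accepting a move only if the averaged merit improves."""
    def averaged(x):
        return sum(merit(x) for _ in range(evals_per_point)) / evals_per_point

    best_x, best_f = list(x0), averaged(x0)
    for _ in range(iters):
        cand = [xi + random.gauss(0.0, sigma) for xi in best_x]
        f = averaged(cand)
        if f < best_f:                      # accept only averaged improvements
            best_x, best_f = cand, f
    return best_x, best_f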
Use of Heritage Hardware on Orion MPCV Exploration Flight Test One
NASA Technical Reports Server (NTRS)
Rains, George Edward; Cross, Cynthia D.
2012-01-01
Due to an aggressive schedule for the first space flight of an unmanned Orion capsule, currently known as Exploration Flight Test One (EFT1), combined with severe programmatic funding constraints, an effort was made within the Orion Program to identify heritage hardware, i.e., already existing, flight-certified components from previous manned space programs, which might be available for use on EFT1. With the end of the Space Shuttle Program, no current means exists to launch Multi-Purpose Logistics Modules (MPLMs) to the International Space Station (ISS), and so the inventory of many flight-certified Shuttle and MPLM components is available for other purposes. Two of these items are the MPLM cabin Positive Pressure Relief Assembly (PPRA) and the Shuttle Ground Support Equipment Heat Exchanger (GSE HX). In preparation for the utilization of these components by the Orion Program, analyses and testing of the hardware were performed. The PPRA had to be analyzed to determine its susceptibility to pyrotechnic shock, and vibration testing had to be performed, since those environments are predicted to be more severe during an Orion mission than those the hardware was originally designed to accommodate. The GSE HX had to be tested for performance with the Orion thermal working fluids, which are different from those used by the Space Shuttle. This paper summarizes the activities required in order to utilize heritage hardware for EFT1.
Use of Heritage Hardware on MPCV Exploration Flight Test One
NASA Technical Reports Server (NTRS)
Rains, George Edward; Cross, Cynthia D.
2011-01-01
Due to an aggressive schedule for the first orbital test flight of an unmanned Orion capsule, known as Exploration Flight Test One (EFT1), combined with severe programmatic funding constraints, an effort was made to identify heritage hardware, i.e., already existing, flight-certified components from previous manned space programs, which might be available for use on EFT1. With the end of the Space Shuttle Program, no current means exists to launch Multi-Purpose Logistics Modules (MPLMs) to the International Space Station (ISS), and so the inventory of many flight-certified Shuttle and MPLM components is available for other purposes. Two of these items are the Shuttle Ground Support Equipment Heat Exchanger (GSE Hx) and the MPLM cabin Positive Pressure Relief Assembly (PPRA). In preparation for the utilization of these components by the Orion Program, analyses and testing of the hardware were performed. The PPRA had to be analyzed to determine its susceptibility to pyrotechnic shock, and vibration testing had to be performed, since those environments are predicted to be significantly more severe during an Orion mission than those the hardware was originally designed to accommodate. The GSE Hx had to be tested for performance with the Orion thermal working fluids, which are different from those used by the Space Shuttle. This paper summarizes the certification of the use of heritage hardware for EFT1.
Real-time demonstration hardware for enhanced DPCM video compression algorithm
NASA Technical Reports Server (NTRS)
Bizon, Thomas P.; Whyte, Wayne A., Jr.; Marcopoli, Vincent R.
1992-01-01
The lack of available wideband digital links as well as the complexity of implementation of bandwidth-efficient digital video CODECs (encoder/decoder) has worked to keep the cost of digital television transmission too high to compete with analog methods. Terrestrial and satellite video service providers, however, are now recognizing the potential gains that digital video compression offers and are proposing to incorporate compression systems to increase the number of available program channels. NASA is similarly recognizing the benefits of and trend toward digital video compression techniques for transmission of high quality video from space and therefore has developed a digital television bandwidth compression algorithm to process standard National Television Systems Committee (NTSC) composite color television signals. The algorithm is based on differential pulse code modulation (DPCM), but additionally utilizes a non-adaptive predictor, non-uniform quantizer and multilevel Huffman coder to reduce the data rate substantially below that achievable with straight DPCM. The non-adaptive predictor and multilevel Huffman coder combine to set this technique apart from other DPCM encoding algorithms. All processing is done on an intra-field basis to prevent motion degradation and minimize hardware complexity. Computer simulations have shown the algorithm will produce broadcast quality reconstructed video at an average transmission rate of 1.8 bits/pixel. Hardware implementation of the DPCM circuit, non-adaptive predictor and non-uniform quantizer has been completed, providing real-time demonstration of the image quality at full video rates. Video sampling/reconstruction circuits have also been constructed to accomplish the analog video processing necessary for the real-time demonstration. Performance results for the completed hardware compare favorably with simulation results. Hardware implementation of the multilevel Huffman encoder/decoder is currently under development along with implementation of a buffer control algorithm to accommodate the variable data rate output of the multilevel Huffman encoder. A video CODEC of this type could be used to compress NTSC color television signals where high quality reconstruction is desirable (e.g., Space Station video transmission, transmission direct-to-the-home via direct broadcast satellite systems or cable television distribution to system headends and direct-to-the-home).
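The following sketch illustrates the basic DPCM structure described above, with a fixed (non-adaptive) previous-pixel predictor and a non-uniform quantizer. The quantizer levels are illustrative, not the paper's design, and the multilevel Huffman stage is omitted.

# Minimal DPCM encode/decode sketch for one scan line of 8-bit pixels.
LEVELS = [-48, -24, -12, -6, -2, 0, 2, 6, 12, 24, 48]   # coarser for large errors

def quantize(e):
    return min(LEVELS, key=lambda q: abs(q - e))

def dpcm_encode(line):
    recon_prev = line[0]                 # first sample sent as-is (an assumption)
    codes = [line[0]]
    for x in line[1:]:
        e = x - recon_prev               # prediction error vs. reconstructed pixel
        q = quantize(e)
        codes.append(q)
        recon_prev = max(0, min(255, recon_prev + q))   # decoder-matched reconstruction
    return codes

def dpcm_decode(codes):
    out = [codes[0]]
    for q in codes[1:]:
        out.append(max(0, min(255, out[-1] + q)))
    return out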
Requirements analysis for a hardware, discrete-event, simulation engine accelerator
NASA Astrophysics Data System (ADS)
Taylor, Paul J., Jr.
1991-12-01
An analysis of a general Discrete Event Simulation (DES), executing on the distributed architecture of an eight-node Intel iPSC/2 hypercube, was performed. The most time-consuming portions of the general DES algorithm were determined to be the functions associated with message passing of required simulation data between processing nodes of the hypercube architecture. A behavioral description, using the IEEE standard VHSIC Hardware Description Language (VHDL), of a general DES hardware accelerator is presented. The behavioral description specifies the operational requirements for a DES coprocessor to augment the hypercube's execution of DES simulations. The DES coprocessor design implements the functions necessary to perform distributed discrete event simulations using a conservative time synchronization protocol.
NASA Technical Reports Server (NTRS)
Hogan, Robert P.; Dalton, Bonnie P.
1991-01-01
This paper discusses the performance of the Research Animal Holding Facility (RAHF) and General Purpose Work Station (GPWS), plus other associated hardware, during the recent flight of Spacelab Life Sciences 1 (SLS-1). The RAHF was developed to provide proper housing (food, water, temperature control, lighting, and waste management) for up to 24 rodents during flights on the Spacelab. The GPWS was designed to contain particulates and toxic chemicals generated during plant and animal handling and dissection/fixation activities during space flights. A history of the hardware development, as well as the redesign activities undertaken prior to the actual flight, is discussed.
A real time sorting algorithm to time sort any deterministic time disordered data stream
NASA Astrophysics Data System (ADS)
Saini, J.; Mandal, S.; Chakrabarti, A.; Chattopadhyay, S.
2017-12-01
In new-generation high-intensity, high-energy physics experiments, millions of free-streaming, high-rate data sources are to be read out. Free-streaming data with associated time-stamps can only be controlled by thresholds, as there is no trigger information available for the readout. Therefore, these readouts are prone to collecting a large amount of noise and unwanted data. For this reason, these experiments can have an output data rate several orders of magnitude higher than the useful signal data rate. It is therefore necessary to perform online processing of the data to extract useful information from the full data set. Without trigger information, pre-processing of the free-streaming data can only be done through time-based correlation among the data set. Multiple data sources have different path delays and bandwidth utilizations, and therefore the unsorted merged data requires significant computational effort to sort in real time before analysis. The present work reports a new high-speed, scalable data stream sorting algorithm with its architectural design, verified through Field Programmable Gate Array (FPGA) based hardware simulation. Realistic time-based simulated data, likely to be collected in a high-energy physics experiment, have been used to study the performance of the algorithm. The proposed algorithm uses parallel read-write blocks with added memory management and zero-suppression features to make it efficient for high-rate data streams. This algorithm is best suited for online data streams with deterministic time disorder/unsorting on FPGA-like hardware.
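The core idea, that bounded (deterministic) disorder lets a sorter emit a datum as soon as the newest observed timestamp has moved past it by the disorder window, can be sketched in a few lines. The stream interface and the max_disorder parameter below are illustrative; this is not the FPGA architecture itself.

import heapq
from itertools import count

def time_sort(stream, max_disorder):
    """Yield (timestamp, payload) pairs in timestamp order, assuming no item
    arrives more than max_disorder later than items with larger timestamps."""
    heap = []
    tie = count()             # tie-breaker so payloads are never compared
    newest = None
    for ts, payload in stream:
        heapq.heappush(heap, (ts, next(tie), payload))
        newest = ts if newest is None else max(newest, ts)
        # Anything older than the disorder window can no longer be overtaken.
        while heap and heap[0][0] <= newest - max_disorder:
            ts_out, _, p = heapq.heappop(heap)
            yield ts_out, p
    while heap:
        ts_out, _, p = heapq.heappop(heap)
        yield ts_out, p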
Recent Technology Advances in Distributed Engine Control
NASA Technical Reports Server (NTRS)
Culley, Dennis
2017-01-01
This presentation provides an overview of the work performed at NASA Glenn Research Center in distributed engine control technology. This is control system hardware technology that overcomes engine system constraints by modularizing control hardware and integrating the components over communication networks.
NASA Technical Reports Server (NTRS)
1974-01-01
An evaluation is presented of the operational and engineering aspects of the second Skylab flight. Other areas described include: the performance of experimental hardware; the crew's evaluation of the flight; medical aspects; and hardware anomalies.
A Subsystem Test Bed for Chinese Spectral Radioheliograph
NASA Astrophysics Data System (ADS)
Zhao, An; Yan, Yihua; Wang, Wei
2014-11-01
The Chinese Spectral Radioheliograph (CSRH) is a solar-dedicated radio interferometric array that will produce high-spatial-resolution, high-temporal-resolution, and high-spectral-resolution images of the Sun simultaneously in the decimetre and centimetre wave range. Digital processing of the intermediate frequency (IF) signal is an important part of a radio telescope. This paper describes a flexible and high-speed digital down conversion (DDC) system for the CSRH, which applies complex mixing, parallel filtering, and extraction (decimation) algorithms to process the IF signal and incorporates canonic-signed-digit coding and a bit-plane method to improve program efficiency. The DDC system is intended to be a subsystem test bed for simulation and testing for CSRH. Software algorithms for simulation and FPGA-based hardware-description-language implementations were written; they use fewer hardware resources while achieving high performance, such as processing a high-speed data stream (1 GHz) with 10 MHz spectral resolution. An experiment with the test bed is illustrated using geostationary satellite data observed on March 20, 2014. Due to the easy alterability of the algorithms on the FPGA, the data can be recomputed with different digital signal processing algorithms to select the optimum one.
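A plain software sketch of the signal path (complex mixing, low-pass filtering, decimation) is given below; it shows only the generic DDC chain, not the CSRH parallel, canonic-signed-digit-optimized FPGA implementation, and the filter parameters are illustrative.

import numpy as np

def digital_down_convert(x, fs, f_center, decim, taps=64):
    """Shift the band of interest to baseband, low-pass filter it, and decimate."""
    n = np.arange(len(x))
    mixed = x * np.exp(-2j * np.pi * f_center / fs * n)      # complex mix to 0 Hz

    # Simple windowed-sinc low-pass with cutoff at half the decimated bandwidth.
    fc = 0.5 / decim
    k = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(2 * fc * k) * np.hamming(taps)
    h /= h.sum()

    filtered = np.convolve(mixed, h, mode="same")
    return filtered[::decim]                                  # decimate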
NASA Astrophysics Data System (ADS)
Pfeil, Thomas; Jordan, Jakob; Tetzlaff, Tom; Grübl, Andreas; Schemmel, Johannes; Diesmann, Markus; Meier, Karlheinz
2016-04-01
High-level brain function, such as memory, classification, or reasoning, can be realized by means of recurrent networks of simplified model neurons. Analog neuromorphic hardware constitutes a fast and energy-efficient substrate for the implementation of such neural computing architectures in technical applications and neuroscientific research. The functional performance of neural networks is often critically dependent on the level of correlations in the neural activity. In finite networks, correlations are typically inevitable due to shared presynaptic input. Recent theoretical studies have shown that inhibitory feedback, abundant in biological neural networks, can actively suppress these shared-input correlations and thereby enable neurons to fire nearly independently. For networks of spiking neurons, the decorrelating effect of inhibitory feedback has so far been explicitly demonstrated only for homogeneous networks of neurons with linear subthreshold dynamics. Theory, however, suggests that the effect is a general phenomenon, present in any system with sufficient inhibitory feedback, irrespective of the details of the network structure or the neuronal and synaptic properties. Here, we investigate the effect of network heterogeneity on correlations in sparse, random networks of inhibitory neurons with nonlinear, conductance-based synapses. Emulations of these networks on the analog neuromorphic-hardware system Spikey allow us to test the efficiency of decorrelation by inhibitory feedback in the presence of hardware-specific heterogeneities. The configurability of the hardware substrate enables us to modulate the extent of heterogeneity in a systematic manner. We selectively study the effects of shared input and recurrent connections on correlations in membrane potentials and spike trains. Our results confirm that shared-input correlations are actively suppressed by inhibitory feedback also in highly heterogeneous networks exhibiting broad, heavy-tailed firing-rate distributions. In line with former studies, cell heterogeneities reduce shared-input correlations. Overall, however, correlations in the recurrent system can increase with the level of heterogeneity as a consequence of diminished effective negative feedback.
Study on application of aerospace technology to improve surgical implants
NASA Technical Reports Server (NTRS)
Johnson, R. E.; Youngblood, J. L.
1982-01-01
The areas where aerospace technology could be used to improve the reliability and performance of metallic orthopedic implants were assessed. Specifically, comparisons were made of the material controls, design approaches, analytical methods, and inspection approaches being used in the implant industry with those used for hardware in the aerospace industry. Several areas for possible improvement were noted, such as increased use of finite element stress analysis and fracture control programs on devices where the needs exist for maximum reliability and high structural performance.
RRTMGP: A High-Performance Broadband Radiation Code for the Next Decade
2014-09-30
Hardware counters were used to measure several performance metrics, including the number of double-precision (DP) floating-point operations (FLOPs) ... 0.2 DP FLOPs per CPU cycle. Experience with production science code is that it is possible to achieve execution rates in the range of 0.5 to 1.0 ... DP FLOPs per cycle. Looking at the ratio of vectorized DP FLOPs to total DP FLOPs (Figure PROF), for most of the execution time the ...
Impact of uniform electrode current distribution on ETF. [Engineering Test Facility MHD generator
NASA Technical Reports Server (NTRS)
Bents, D. J.
1982-01-01
A basic reason for the complexity and sheer volume of electrode consolidation hardware in the MHD ETF Powertrain system is the channel electrode current distribution, which is non-uniform. If the channel design is altered to provide uniform electrode current distribution, the amount of hardware required decreases considerably, but at the possible expense of degraded channel performance. This paper explains the design impacts on the ETF electrode consolidation network associated with uniform channel electrode current distribution, and presents the alternate consolidation designs which occur. They are compared to the baseline (non-uniform current) design with respect to performance, and hardware requirements. A rational basis is presented for comparing the requirements for the different designs and the savings that result from uniform current distribution. Performance and cost impacts upon the combined cycle plant are discussed.
GPUs: An Emerging Platform for General-Purpose Computation
2007-08-01
[Only fragments of this report survive: part of a product-comparison table entry for PeakStream (license required; limited-time no-cost evaluation program; real-time cinematic-quality graphics programming) and pieces of the reference list, including folding.stanford.edu (accessed 30 March 2007); Fan, Z.; Qiu, F.; Kaufman, A.; Yoakum-Stover, S., "GPU Cluster for High Performance Computing," ACM/IEEE ...; and Goodnight, N.; Wang, R.; Humphreys, G., "Computation on Programmable Graphics Hardware," IEEE Computer Graphics and ...]
Fault Tolerance for VLSI Multicomputers
1985-08-01
[Fragmentary abstract] ... a system that consists of hundreds or thousands of VLSI computation nodes interconnected by dedicated links. Some important applications of high-end computers ... technology, and intended applications. A proposed fault tolerance scheme combines hardware that performs error detection with system-level protocols for ... In order to recover from the error and resume correct operation, a valid system state must be restored. A low-overhead, application-transparent error ...
Multichannel reconfigurable measurement system for hot plasma diagnostics based on GEM-2D detector
NASA Astrophysics Data System (ADS)
Wojenski, A. J.; Kasprowicz, G.; Pozniak, K. T.; Byszuk, A.; Chernyshova, M.; Czarski, T.; Jablonski, S.; Juszczyk, B.; Zienkiewicz, P.
2015-12-01
In future magnetically confined fusion research reactors (e.g., the ITER tokamak), precise determination of the level of soft X-ray radiation from plasma with temperatures above 30 keV (around 350 million K) will be very important for plasma parameter optimization. This paper presents the first version of a designed spectrography measurement system. The system is already installed at the JET tokamak. Based on the experience gained from the project, a new generation of hardware for spectrography measurements was designed and is also described in the paper. The GEM detector readout structure was changed to 2D in order to perform measurements of, for example, laser-generated plasma. The hardware structure of the system was redesigned in order to provide a large number of high-speed input channels. Finally, this paper also covers the issue of the new control software necessary to set up a complete system of this complexity and perform data acquisition. The main goal of the project was to develop a new version of the system, which includes an upgraded structure and data transmission infrastructure (i.e., handling a large number of measurement channels and a high sampling rate).
Tivnan, Matthew; Gurjar, Rajan; Wolf, David E; Vishwanath, Karthik
2015-08-12
Diffuse Correlation Spectroscopy (DCS) is a well-established optical technique that has been used for non-invasive measurement of blood flow in tissues. Instrumentation for DCS includes a correlation device that computes the temporal intensity autocorrelation of a coherent laser source after it has undergone diffuse scattering through a turbid medium. Typically, the signal acquisition and its autocorrelation are performed by a correlation board. These boards have dedicated hardware to acquire and compute intensity autocorrelations of rapidly varying input signal and usually are quite expensive. Here we show that a Raspberry Pi minicomputer can acquire and store a rapidly varying time-signal with high fidelity. We show that this signal collected by a Raspberry Pi device can be processed numerically to yield intensity autocorrelations well suited for DCS applications. DCS measurements made using the Raspberry Pi device were compared to those acquired using a commercial hardware autocorrelation board to investigate the stability, performance, and accuracy of the data acquired in controlled experiments. This paper represents a first step toward lowering the instrumentation cost of a DCS system and may offer the potential to make DCS become more widely used in biomedical applications.
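The numerical autocorrelation step can be sketched as follows; this linear-lag version is only meant to show the computation (practical DCS correlators usually use a multi-tau scheme), and the uniformly sampled count trace is an assumed input format.

import numpy as np

def intensity_autocorrelation(counts, max_lag):
    """Compute the normalized intensity autocorrelation
    g2(tau) = <I(t) I(t+tau)> / <I>^2 for lags 0..max_lag samples."""
    counts = np.asarray(counts, dtype=float)
    mean_sq = counts.mean() ** 2
    g2 = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        if lag == 0:
            g2[lag] = np.mean(counts * counts) / mean_sq
        else:
            g2[lag] = np.mean(counts[:-lag] * counts[lag:]) / mean_sq
    return g2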
Tivnan, Matthew; Gurjar, Rajan; Wolf, David E.; Vishwanath, Karthik
2015-01-01
Diffuse Correlation Spectroscopy (DCS) is a well-established optical technique that has been used for non-invasive measurement of blood flow in tissues. Instrumentation for DCS includes a correlation device that computes the temporal intensity autocorrelation of a coherent laser source after it has undergone diffuse scattering through a turbid medium. Typically, the signal acquisition and its autocorrelation are performed by a correlation board. These boards have dedicated hardware to acquire and compute intensity autocorrelations of rapidly varying input signal and usually are quite expensive. Here we show that a Raspberry Pi minicomputer can acquire and store a rapidly varying time-signal with high fidelity. We show that this signal collected by a Raspberry Pi device can be processed numerically to yield intensity autocorrelations well suited for DCS applications. DCS measurements made using the Raspberry Pi device were compared to those acquired using a commercial hardware autocorrelation board to investigate the stability, performance, and accuracy of the data acquired in controlled experiments. This paper represents a first step toward lowering the instrumentation cost of a DCS system and may offer the potential to make DCS become more widely used in biomedical applications. PMID:26274961
NASA Astrophysics Data System (ADS)
Newman, Gregory A.
2014-01-01
Many geoscientific applications exploit electrostatic and electromagnetic fields to interrogate and map subsurface electrical resistivity, an important geophysical attribute for characterizing mineral, energy, and water resources. In complex three-dimensional geologies, where many of these resources remain to be found, resistivity mapping requires large-scale modeling and imaging capabilities, as well as the ability to treat significant data volumes, which can easily overwhelm single-core and modest multicore computing hardware. Treating such problems requires large-scale parallel computational resources, necessary for reducing the time to solution to a time frame acceptable to the exploration process. The recognition that significant parallel computing processes must be brought to bear on these problems gives rise to choices that must be made in parallel computing hardware and software. In this review, some of these choices are presented, along with the resulting trade-offs. We also discuss future trends in high-performance computing and the anticipated impact on electromagnetic (EM) geophysics. Topics discussed in this review article include a survey of parallel computing platforms, from graphics processing units to multicore CPUs with fast interconnects, along with parallel solvers and associated solver libraries effective for inductive EM modeling and imaging.
Validating the simulation of large-scale parallel applications using statistical characteristics
Zhang, Deli; Wilke, Jeremiah; Hendry, Gilbert; ...
2016-03-01
Simulation is a widely adopted method to analyze and predict the performance of large-scale parallel applications. Validating the hardware model is highly important for complex simulations with a large number of parameters. Common practice involves calculating the percent error between the projected and the real execution time of a benchmark program. However, in a high-dimensional parameter space, this coarse-grained approach often suffers from parameter insensitivity, which may not be known a priori. Moreover, the traditional approach cannot be applied to the validation of software models, such as application skeletons used in online simulations. In this work, we present a methodology and a toolset for validating both hardware and software models by quantitatively comparing fine-grained statistical characteristics obtained from execution traces. Although statistical information has been used in tasks like performance optimization, this is the first attempt to apply it to simulation validation. Lastly, our experimental results show that the proposed evaluation approach offers significant improvement in fidelity when compared to evaluation using total execution time, and the proposed metrics serve as reliable criteria that progress toward automating the simulation tuning process.
NASA Astrophysics Data System (ADS)
Peter, Daniel; Videau, Brice; Pouget, Kevin; Komatitsch, Dimitri
2015-04-01
Improving the resolution of tomographic images is crucial to answer important questions on the nature of Earth's subsurface structure and internal processes. Seismic tomography is the most prominent approach, in which seismic signals from ground-motion records are used to infer physical properties of internal structures such as compressional- and shear-wave speeds, anisotropy, and attenuation. Recent advances in regional- and global-scale seismic inversions move towards full-waveform inversions, which require accurate simulations of seismic wave propagation in complex 3D media, providing access to the full 3D seismic wavefields. However, these numerical simulations are computationally very expensive and need high-performance computing (HPC) facilities for further improving the current state of knowledge. During recent years, many-core architectures such as graphics processing units (GPUs) have been added to available large HPC systems. Such GPU-accelerated computing, together with advances in multi-core central processing units (CPUs), can greatly accelerate scientific applications. There are mainly two choices of language support for GPU cards: the CUDA programming environment and the OpenCL language standard. CUDA software development targets NVIDIA graphics cards, while OpenCL was adopted mainly by AMD graphics cards. In order to employ such hardware accelerators for seismic wave propagation simulations, we incorporated the code generation tool BOAST into the existing spectral-element code package SPECFEM3D_GLOBE. This allows us to use meta-programming of computational kernels and generate optimized source code for both CUDA and OpenCL languages, running simulations on either CUDA or OpenCL hardware accelerators. We show here applications of forward and adjoint seismic wave propagation on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.
NASA Astrophysics Data System (ADS)
Megherbi, Dalila B.; Yan, Yin; Tanmay, Parikh; Khoury, Jed; Woods, C. L.
2004-11-01
Recently, surveillance and Automatic Target Recognition (ATR) applications have been increasing as the cost of the computing power needed to process the massive amount of information continues to fall. This computing power has been made possible partly by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-art electro-optical imaging systems that provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g., telescopes, precise optics, cameras, and image/computer vision algorithms, which can be geographically distributed or share distributed resources) into programmable and DSP systems. Additionally, pattern recognition techniques and fast information retrieval are often important components of intelligent systems. The aim of this work is to use an embedded FPGA as a fast, configurable, and synthesizable search engine for fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and demonstrate a low-cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture with real-time recognition capabilities for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers a performance advantage of an order of magnitude over a RAM (Random Access Memory)-based search architecture for implementing high-speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM-based embedded architecture are described here. Other SOPC solutions/design issues are covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low-cost, reconfigurable, fast image pattern recognition/retrieval at the hardware/software co-design level.
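The performance argument can be illustrated with a toy software model: a RAM-based search reads one address per step, whereas a CAM compares the key against all stored words at once and returns every matching address. The dict-based model below is a conceptual stand-in for the FPGA CAM, not its implementation.

def ram_search(memory, key):
    """Sequential RAM-style search: one memory read per step, O(N) overall."""
    for address, word in enumerate(memory):
        if word == key:
            return address
    return None

class ToyCAM:
    """Content-addressable lookup modeled as a map from word -> addresses."""
    def __init__(self, memory):
        self.match_lines = {}
        for address, word in enumerate(memory):
            self.match_lines.setdefault(word, []).append(address)

    def search(self, key):
        # Conceptually one cycle: every stored word is compared simultaneously.
        return self.match_lines.get(key, [])

patterns = ["1010", "0110", "1110", "0110"]
print(ram_search(patterns, "0110"))     # 1   (first match, found by scanning)
print(ToyCAM(patterns).search("0110"))  # [1, 3]  (all matches "at once")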
NASA Astrophysics Data System (ADS)
Ateshkadi, Arash
Current and future aero gas turbine combustors demand a greater insight into the role of injector/dome design in combustion performance. The structure of the two-phase flow and the combustion performance associated with practical injector/dome hardware are thoroughly investigated. A spray injector with two radial inflow swirlers was custom-designed to maintain tight tolerances and a strict assembly protocol in order to isolate the sensitivity of performance to hardware design. The custom set is a unique modular design that (1) accommodates parametric variation in geometry, (2) retains symmetry, and (3) maintains effective area. Swirl sense and the presence of a venturi were found to be the most influential on fuel distribution and Lean Blowout. The venturi acts as a fuel-prefilming surface and constrains the highest fuel mass concentration to an annular ring near the centerline. Co-swirl enhances the radial dispersion of the continuous phase, and counter-swirl increases the level of mixing that occurs in the downstream region of the mixer. The smallest drop size distributions were found to occur with the counter-swirl configuration with venturi. In the case of counter-swirl without a venturi, the high concentration of fluid mass is found in the center region of the flow. The Lean Blowout (LBO) equivalence ratio was lower for counter-swirl due to the coupling of the centerline recirculation zone with the location of high fuel concentration emanating from smaller droplets. In the co-swirl configuration, a more intense reaction was found near the mixer exit, leading to the lowest concentrations of NOx, CO, and UHC. An LBO model with good agreement with the measured values was developed that related, for the first time, specific hardware parameters and operating conditions to stability performance. A semi-analytical model, which agreed best with co-swirl configurations, was modified and used to describe the axial velocity profile downstream of the mixer exit. The development of these two models exemplifies the use of mathematical expressions to guide the design and development procedure for mixer geometries that meet the stringent demands for increasing combustion performance.
Post-Shuttle EVA Operations on ISS
NASA Technical Reports Server (NTRS)
West, William; Witt, Vincent; Chullen, Cinda
2010-01-01
The expected retirement of the NASA Space Transportation System (also known as the Space Shuttle) by 2011 will pose a significant challenge to Extra-Vehicular Activities (EVA) on board the International Space Station (ISS). The EVA hardware currently used to assemble and maintain the ISS was designed assuming that it would be returned to Earth on the Space Shuttle for refurbishment, or if necessary for failure investigation. With the retirement of the Space Shuttle, a new concept of operations was developed to enable EVA hardware (Extravehicular Mobility Unit (EMU), Airlock Systems, EVA tools, and associated support hardware and consumables) to perform ISS EVAs until 2015, and possibly beyond to 2020. Shortly after the decision to retire the Space Shuttle was announced, the EVA 2010 Project was jointly initiated by NASA and the One EVA contractor team. The challenges addressed were to extend the operating life and certification of EVA hardware, to secure the capability to launch EVA hardware safely on alternate launch vehicles, to protect for EMU hardware operability on orbit, and to identify a source of high-purity water to support recharge of the PLSSs (no longer available via Shuttle). The EVA 2010 Project includes the following tasks: the development of a launch fixture that would allow the EMU Portable Life Support System (PLSS) to be launched on board alternate vehicles; extension of the EMU hardware maintenance interval from 3 years (current certification) to a minimum of 6 years (to extend to 2015); testing of recycled ISS Water Processor Assembly (WPA) water for use in the EMU cooling system in lieu of water resupplied by International Partner (IP) vehicles; development of techniques to remove and replace critical components in the PLSS on orbit (not routine); extension of the on-orbit certification of EVA tools; and development of an EVA hardware logistics plan to support the ISS without the Space Shuttle. Assumptions for the EVA 2010 Project included no more than 8 EVAs per year for ISS EVA operations in the post-Shuttle environment and limited availability of cargo upmass on IP launch vehicles. From 2010 forward, EVA operations on board the ISS without the Space Shuttle will represent a paradigm shift in safely operating EVA hardware on orbit, and the EVA 2010 effort was initiated to accommodate this significant change in EVA evolutionary history.
Photon Counting Using Edge-Detection Algorithm
NASA Technical Reports Server (NTRS)
Gin, Jonathan W.; Nguyen, Danh H.; Farr, William H.
2010-01-01
New applications such as high-data-rate, photon-starved, free-space optical communications require photon counting at flux rates into gigaphoton-per-second regimes coupled with subnanosecond timing accuracy. Current single-photon detectors that are capable of handling such operating conditions are designed in an array format and produce output pulses that span multiple sample times. In order to discern one pulse from another and not overcount the number of incoming photons, a detection algorithm must be applied to the sampled detector output pulses. As flux rates increase, the ability to implement such a detection algorithm becomes difficult within a digital processor that may reside within a field-programmable gate array (FPGA). Systems have been developed and implemented to both characterize gigahertz-bandwidth single-photon detectors and process photon count signals at rates into gigaphotons per second in order to implement communications links at SCPPM (serial concatenated pulse position modulation) encoded data rates exceeding 100 megabits per second with efficiencies greater than two bits per detected photon. A hardware edge-detection algorithm and corresponding signal combining and deserialization hardware were developed to meet these requirements at sample rates up to 10 GHz. The photon discriminator deserializer hardware board accepts four inputs, which allows it to take inputs from a quad-photon counting detector, to support requirements for optical tracking with a reduced number of hardware components. The four inputs are hardware leading-edge detected independently. After leading-edge detection, the resultant samples are ORed together prior to deserialization. The deserialization is performed to reduce the rate at which data is passed to a digital signal processor, perhaps residing within an FPGA. The hardware implements four separate analog inputs that are connected through RF connectors. Each analog input is fed to a high-speed 1-bit comparator, which digitizes the input referenced to an adjustable threshold value. This results in four independent serial sample streams of binary 1s and 0s, which are ORed together at rates up to 10 GHz. This single serial stream is then deserialized by a factor of 16 to create 16 signal lines at a rate of 622.5 MHz or lower for input to a high-speed digital processor assembly. The new design and corresponding hardware can be employed with a quad-photon counting detector capable of handling photon rates on the order of multiple gigaphotons per second, whereas prior art was only capable of handling a single input at one quarter the flux rate. Additionally, the hardware edge-detection algorithm has provided the ability to process 3 to 10 times higher photon flux rates than previously possible by removing the requirement that the ORed photon-counting detector output pulses on multiple channels not overlap; now, only the leading edges of the pulses must not overlap. This new photon counting digitizer hardware architecture supports a universal front end for an optical communications receiver operating at data rates from kilobits to over one gigabit per second to meet increased mission data volume requirements.
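In software, the described signal path (per-channel thresholding, independent leading-edge detection, ORing, and deserialization into parallel words) can be sketched as follows. The four-channel layout and 16-bit word width follow the text above; the threshold values and list-based interfaces are illustrative assumptions.

def leading_edges(bits):
    """Return a stream that is 1 only at 0 -> 1 transitions of the input."""
    prev = 0
    out = []
    for b in bits:
        out.append(1 if (b and not prev) else 0)
        prev = b
    return out

def photon_frontend(channels, thresholds, word_width=16):
    """channels: four equal-length sample lists; thresholds: one value per channel."""
    digitized = [[1 if s > th else 0 for s in ch]
                 for ch, th in zip(channels, thresholds)]   # 1-bit comparators
    edged = [leading_edges(ch) for ch in digitized]         # per-channel edge detect
    # OR the four edge streams sample-by-sample into one serial stream.
    serial = [int(any(samples)) for samples in zip(*edged)]
    # Deserialize: group the serial stream into parallel words for a slower processor.
    return [serial[i:i + word_width] for i in range(0, len(serial), word_width)]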
Post-Shuttle EVA Operations on ISS
NASA Technical Reports Server (NTRS)
West, Bill; Witt, Vincent; Chullen, Cinda
2010-01-01
The EVA hardware used to assemble and maintain the ISS was designed with the assumption that it would be returned to Earth on the Space Shuttle for ground processing, refurbishment, or failure investigation (if necessary). With the retirement of the Space Shuttle, a new concept of operations was developed to enable EVA hardware (EMU, Airlock Systems, EVA tools, and associated support equipment and consumables) to perform ISS EVAs until 2016 and possibly beyond to 2020. Shortly after the decision to retire the Space Shuttle was announced, NASA and the One EVA contractor team jointly initiated the EVA 2010 Project. Challenges were addressed to extend the operating life and certification of EVA hardware, secure the capability to launch EVA hardware safely on alternate launch vehicles, and protect EMU hardware operability on orbit for long durations.
Management of the Atmosphere Resource Recovery and Environmental Monitoring Project
NASA Technical Reports Server (NTRS)
Roman, Monsi; Perry, Jay; Howard, David
2013-01-01
The Advanced Exploration Systems Program's Atmosphere Resource Recovery and Environmental Monitoring (ARREM) project is working to further optimize atmosphere revitalization and environmental monitoring system architectures. This paper discusses project management strategies that tap into skill sets across multiple engineering disciplines, projects, field centers, and industry to achieve project success. It is the project's objective to contribute to system advances that will enable sustained exploration missions beyond low Earth orbit (LEO) and improve affordability by focusing on the primary goals of achieving high reliability, improving efficiency, and reducing dependence on ground-based logistics resupply. Technology demonstrations are achieved by infusing new technologies and concepts into existing developmental hardware and operating in a controlled environment simulating various crewed habitat scenarios. The ARREM project's strengths include access to a vast array of existing developmental hardware that performs all the vital atmosphere revitalization functions, exceptional test facilities to fully evaluate system performance, and a well-coordinated partnering effort among the NASA field centers and industry partners to provide the innovative expertise necessary to succeed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
James, Conrad D.; Schiess, Adrian B.; Howell, Jamie
2013-10-01
The human brain (volume = 1200 cm^3) consumes 20 W and is capable of performing more than 10^16 operations/s. Current supercomputer technology has reached 10^15 operations/s, yet it requires 1500 m^3 and 3 MW, giving the brain a 10^12 advantage in operations/s/W/cm^3. Thus, to reach exascale computation, two achievements are required: 1) improved understanding of computation in biological tissue, and 2) a paradigm shift towards neuromorphic computing where hardware circuits mimic properties of neural tissue. To address 1), we will interrogate corticostriatal networks in mouse brain tissue slices, specifically with regard to their frequency filtering capabilities as a function of input stimulus. To address 2), we will instantiate biological computing characteristics such as multi-bit storage into hardware devices with future computational and memory applications. Resistive memory devices will be modeled, designed, and fabricated in the MESA facility in consultation with our internal and external collaborators.
NASA Astrophysics Data System (ADS)
Ba, Seydou N.; Waheed, Khurram; Zhou, G. Tong
2010-12-01
Digital predistortion is an effective means to compensate for the nonlinear effects of a memoryless system. In the case of a cellular transmitter, a digital baseband predistorter can mitigate the undesirable nonlinear effects along the signal chain, particularly the nonlinear impairments in the radio-frequency (RF) amplifiers. To be practically feasible, the implementation complexity of the predistorter must be minimized so that it becomes a cost-effective solution for the resource-limited wireless handset. This paper proposes optimizations that facilitate the design of a low-cost, high-performance adaptive digital baseband predistorter for memoryless systems. A comparative performance analysis of the amplitude and power lookup table (LUT) indexing schemes is presented. An optimized low-complexity amplitude approximation and its hardware synthesis results are also studied. An efficient LUT predistorter training algorithm that combines the fast convergence speed of normalized least mean squares (NLMS) with a small hardware footprint is proposed. Results of fixed-point simulations based on the measured nonlinear characteristics of an RF amplifier are presented.
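An amplitude-indexed LUT predistorter trained with an NLMS-style update can be sketched as below. The pa_model callable is an assumed stand-in for the amplifier, the update rule is a generic gradient-like correction, and this floating-point illustration is not the paper's fixed-point design.

import numpy as np

def lut_predistort_train(x, pa_model, lut_size=64, mu=0.5, eps=1e-6):
    """x: complex baseband samples with |x| <= 1; pa_model(z) returns the
    amplifier output for predistorted input z. Each LUT entry is a complex
    gain selected by the instantaneous input amplitude."""
    lut = np.ones(lut_size, dtype=complex)        # start from unity gain
    for xn in x:
        idx = min(int(abs(xn) * lut_size), lut_size - 1)   # amplitude indexing
        zn = lut[idx] * xn                         # predistorted sample
        yn = pa_model(zn)                          # amplifier output
        err = xn - yn                              # deviation from the linear target
        # Normalized LMS-style update of the single LUT entry that produced zn.
        lut[idx] += mu * err * np.conj(xn) / (abs(xn) ** 2 + eps)
    return lut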
A.I.-based real-time support for high performance aircraft operations
NASA Technical Reports Server (NTRS)
Vidal, J. J.
1985-01-01
Artificial intelligence (AI) based software and hardware concepts are applied to the handling of system malfunctions during flight tests. A representation of malfunction procedure logic using Boolean normal forms is presented. The representation facilitates the automation of malfunction procedures and provides easy testing for the embedded rules. It also forms a potential basis for a parallel implementation in logic hardware. The extraction of logic control rules from dynamic simulation, and their adaptive revision after partial failure, are examined using a simplified two-dimensional aircraft model with a controller that adaptively extracts control rules for directional thrust to satisfy a navigational goal without exceeding pre-established position and velocity limits. Failure recovery (rule adjusting) is examined after partial actuator failure. While this experiment was performed with primitive aircraft and mission models, it illustrates an important paradigm and provided complexity extrapolations for the proposed extraction of expertise from simulation, as discussed. The use of relaxation and inexact reasoning in expert systems was also investigated.
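The Boolean-normal-form representation of malfunction procedure logic lends itself to a compact illustration. The signal names and the rule below are hypothetical, chosen only to show how a procedure can be encoded in disjunctive normal form and tested directly against sensed conditions.

# Hypothetical malfunction rule in disjunctive normal form (DNF): each inner
# frozenset is a conjunction of (signal, required_value) literals, and the
# rule fires if any conjunction is fully satisfied.
ENGINE_SHUTDOWN_RULE = [
    frozenset({("oil_pressure_low", True), ("rpm_above_idle", True)}),
    frozenset({("fire_warning", True)}),
]

def rule_fires(dnf_rule, sensed):
    """Evaluate a DNF rule against a dict of sensed Boolean signals."""
    return any(all(sensed.get(sig) == val for sig, val in term)
               for term in dnf_rule)

sensed = {"oil_pressure_low": True, "rpm_above_idle": True, "fire_warning": False}
print(rule_fires(ENGINE_SHUTDOWN_RULE, sensed))   # True: first conjunction holds

Because each conjunction is independent, such rules map naturally onto parallel AND/OR logic hardware, which is the point made in the abstract.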
Hu, Fei; Hao, Qi; Lukowiak, Marcin; Sun, Qingquan; Wilhelm, Kyle; Radziszowski, Stanisław; Wu, Yao
2010-11-01
Implantable medical devices (IMDs) have played an important role in many medical fields. Any failure in IMD operation could have serious consequences, so it is important to protect IMDs from unauthenticated access. This study investigates secure IMD data collection within a telehealthcare [mobile health (m-health)] network. We use medical sensors carried by patients to securely access IMD data and perform secure sensor-to-sensor communications between patients to relay the IMD data to a remote doctor's server. To meet the requirement of low computational complexity, we choose N-th degree truncated polynomial ring (NTRU)-based encryption/decryption to secure IMD-sensor and sensor-sensor communications. An extended matryoshka model is developed to estimate direct/indirect trust relationships among sensors. An NTRU hardware implementation in very-large-scale integrated circuit hardware description language is studied, based on the industry standard IEEE 1363, to increase the speed of key generation. The performance analysis results demonstrate the security robustness of the proposed IMD data access trust model.
Novel algorithm implementations in DARC: the Durham AO real-time controller
NASA Astrophysics Data System (ADS)
Basden, Alastair; Bitenc, Urban; Jenkins, David
2016-07-01
The Durham AO Real-time Controller has been used on-sky with the CANARY AO demonstrator instrument since 2010, and is also used to provide control for several AO test-benches, including DRAGON. Over this period, many new real-time algorithms have been developed, implemented and demonstrated, leading to performance improvements for CANARY. Additionally, the computational performance of this real-time system has continued to improve. Here, we provide details about recent updates and changes made to DARC, and the relevance of these updates, including new algorithms, to forthcoming AO systems. We present the computational performance of DARC when used on different hardware platforms, including hardware accelerators, and determine the relevance and potential for ELT scale systems. Recent updates to DARC have included algorithms to handle elongated laser guide star images, including correlation wavefront sensing, with options to automatically update references during AO loop operation. Additionally, sub-aperture masking options have been developed to increase signal to noise ratio when operating with non-symmetrical wavefront sensor images. The development of end-user tools has progressed with new options for configuration and control of the system. New wavefront sensor camera models and DM models have been integrated with the system, increasing the number of possible hardware configurations available, and a fully open-source AO system is now a reality, including drivers necessary for commercial cameras and DMs. The computational performance of DARC makes it suitable for ELT scale systems when implemented on suitable hardware. We present tests made on different hardware platforms, along with the strategies taken to optimise DARC for these systems.
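As a rough illustration of the correlation wavefront sensing mentioned above, the sketch below estimates the shift of an elongated spot in a single sub-aperture by locating the peak of an FFT-based cross-correlation against a reference image. DARC performs this in optimized real-time code on dedicated hardware; the array sizes, spot shape, and function name here are purely illustrative.

import numpy as np

def correlation_shift(subap, reference):
    """Estimate the (y, x) shift of subap relative to reference via FFT correlation."""
    corr = np.fft.ifft2(np.fft.fft2(subap) * np.conj(np.fft.fft2(reference))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices so shifts are reported about zero rather than the array edge.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

ref = np.zeros((16, 16))
ref[6:10, 7:9] = 1.0                                   # elongated reference spot
img = np.roll(np.roll(ref, 2, axis=0), -1, axis=1)     # spot displaced by (2, -1)
print(correlation_shift(img, ref))                     # -> (2, -1)

A real system would refine the integer peak with sub-pixel interpolation and update the reference image during loop operation, as the abstract describes.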
Integrated Tools for Future Distributed Engine Control Technologies
NASA Technical Reports Server (NTRS)
Culley, Dennis; Thomas, Randy; Saus, Joseph
2013-01-01
Turbine engines are highly complex mechanical systems that are becoming increasingly dependent on control technologies to achieve system performance and safety metrics. However, the contribution of controls to these measurable system objectives is difficult to quantify due to a lack of tools capable of informing the decision makers. This shortcoming hinders technology insertion in the engine design process. NASA Glenn Research Center is developing a Hardware-in-the-Loop (HIL) platform and analysis tool set that will serve as a focal point for new control technologies, especially those related to the hardware development and integration of distributed engine control. The HIL platform is intended to enable rapid and detailed evaluation of new engine control applications, from conceptual design through hardware development, in order to quantify their impact on engine systems. This paper discusses the complex interactions of the control system, within the context of the larger engine system, and how new control technologies are changing that paradigm. The conceptual design of the new HIL platform is then described as a primary tool to address those interactions and how it will help feed the insertion of new technologies into future engine systems.
The dynamical analysis of modified two-compartment neuron model and FPGA implementation
NASA Astrophysics Data System (ADS)
Lin, Qianjin; Wang, Jiang; Yang, Shuangming; Yi, Guosheng; Deng, Bin; Wei, Xile; Yu, Haitao
2017-10-01
The complexity of neural models is increasing with the investigation of larger biological neural networks, more varied ionic channels, and more detailed morphologies, and implementing biological neural networks is a task with huge computational complexity and power consumption. This paper presents an efficient digital design using piecewise linearization on a field-programmable gate array (FPGA) to succinctly implement a reduced two-compartment model that retains essential features of more complicated models. The design proposes an approximate neuron model composed of a set of piecewise linear equations, and it can reproduce different dynamical behaviors to depict the mechanisms of a single neuron model. The consistency of the hardware implementation is verified in terms of dynamical behaviors and bifurcation analysis, and the simulation results, including varied ion channel characteristics, coincide with the biological neuron model with high accuracy. Hardware synthesis on FPGA demonstrates that the proposed model has reliable performance and lower hardware resource usage compared with the original two-compartment model. These investigations are conducive to the scalability of biological neural networks in reconfigurable large-scale neuromorphic systems.
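The core idea of piecewise linearization can be sketched briefly: a nonlinear term in the neuron equations is replaced by a small set of line segments so that the FPGA needs only additions, multiplications, and comparisons. The cubic nonlinearity and breakpoints below are illustrative stand-ins, not the paper's two-compartment equations.

import numpy as np

F = lambda v: v - v ** 3 / 3                     # nonlinearity to approximate (illustrative)
BREAKPOINTS = np.linspace(-2.0, 2.0, 9)          # segment boundaries
SLOPES = np.diff(F(BREAKPOINTS)) / np.diff(BREAKPOINTS)

def f_pwl(v):
    """Piecewise-linear approximation of F(v) over [-2, 2] using precomputed segments."""
    i = np.clip(np.searchsorted(BREAKPOINTS, v) - 1, 0, len(SLOPES) - 1)
    return F(BREAKPOINTS[i]) + SLOPES[i] * (v - BREAKPOINTS[i])

v = np.linspace(-2.0, 2.0, 101)
print(np.max(np.abs(F(v) - f_pwl(v))))           # worst-case approximation error

On hardware, the breakpoint intercepts and slopes would sit in small lookup tables and the segment would be selected by comparators, which is what keeps the resource usage below that of the original nonlinear model.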
Tan, Chee-Heng; Teh, Ying-Wah
2013-08-01
The main obstacles to mass adoption of cloud computing for database operations in healthcare organizations are data security and privacy issues. In this paper, it is shown that IT services, particularly hardware performance evaluation in a virtual machine, can be accomplished effectively without IT personnel gaining access to actual data for diagnostic and remediation purposes. The proposed mechanisms utilize hypothetical data from the TPC-H benchmark to achieve two objectives. First, the underlying hardware performance and consistency are monitored via a control system constructed from TPC-H queries. Second, a mechanism to construct stress-testing scenarios on the host, using a single TPC-H query or a combination of them, is envisaged so that the resource threshold point at which the virtual machine can still serve critical transactions can be verified. This threshold point uses server run queue size as its input parameter and serves two purposes: it provides the boundary threshold to the control system, so that periodic learning on the synthetic data sets for performance evaluation does not reach the host's constraint level; and, when the host undergoes a hardware change, stress-testing scenarios are simulated on the host by loading it up to this resource threshold level for subsequent response-time verification against real and critical transactions.
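The run-queue-based threshold described above can be illustrated with a small probe. Reading the run queue from /proc/loadavg and the particular threshold value are assumptions for a Linux guest; in the paper the control system itself is built from TPC-H queries rather than this toy check.

RUNQ_THRESHOLD = 8   # hypothetical boundary learned from prior stress testing

def run_queue_size():
    """Return the current number of runnable tasks from /proc/loadavg (Linux)."""
    with open("/proc/loadavg") as f:
        running, _total = f.read().split()[3].split("/")
    return int(running)

def safe_to_run_synthetic_probe():
    """Gate the periodic TPC-H-style probe so it never pushes the host past the threshold."""
    return run_queue_size() < RUNQ_THRESHOLD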
Independent Orbiter Assessment (IOA): Analysis of the DPS subsystem
NASA Technical Reports Server (NTRS)
Lowery, H. J.; Haufler, W. A.; Pietz, K. C.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis/Critical Items List (FMEA/CIL) are presented. The IOA approach features a top-down analysis of the hardware to independently determine failure modes, criticality, and potential critical items. The independent analysis results corresponding to the Orbiter Data Processing System (DPS) hardware are documented. The DPS hardware is required for performing critical functions of data acquisition, data manipulation, data display, and data transfer throughout the Orbiter. Specifically, the DPS hardware consists of the following components: Multiplexer/Demultiplexer (MDM); General Purpose Computer (GPC); Multifunction CRT Display System (MCDS); Data Buses and Data Bus Couplers (DBC); Data Bus Isolation Amplifiers (DBIA); Mass Memory Unit (MMU); and Engine Interface Unit (EIU). The IOA analysis process utilized available DPS hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode. Due to the extensive redundancy built into the DPS, the number of critical items is small. Those identified resulted from premature operation and erroneous output of the GPCs.
Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Alewine, Neal Jon
1993-01-01
Multiple instruction rollback (MIR) is a technique for providing rapid recovery from transient processor failures and has been implemented in hardware by researchers and in mainframe computers. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs were also developed which remove rollback data hazards directly with data flow manipulations, thus eliminating the need for most data redundancy hardware. Compiler-assisted techniques to achieve multiple instruction rollback recovery are addressed. It is observed that some data hazards resulting from instruction rollback can be resolved more efficiently by providing hardware redundancy while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations were conducted which indicate improved efficiency over previous hardware-based and compiler-based schemes. Various enhancements to the compiler transformations and to the data redundancy hardware developed for the compiler-assisted MIR scheme are described and evaluated. The final topic deals with the application of compiler-assisted MIR techniques to aid in exception repair and branch repair in a speculative execution architecture.
NASA Technical Reports Server (NTRS)
Patton, Jeff A.
1986-01-01
The results of the Independent Orbiter Assessment (IOA) of the Failure Modes and Effects Analysis (FMEA) and Critical Items List (CIL) are presented. The IOA approach features a top-down analysis of the hardware to determine failure modes, criticality, and potential critical items. To preserve independence, this analysis was accomplished without reliance upon the results contained within the NASA FMEA/CIL documentation. This report documents the independent analysis results corresponding to the Orbiter Electrical Power Distribution and Control (EPD and C)/Electrical Power Generation (EPG) hardware. The EPD and C/EPG hardware is required for performing critical functions of cryogenic reactant storage, electrical power generation and product water distribution in the Orbiter. Specifically, the EPD and C/EPG hardware consists of the following components: Power Section Assembly (PSA); Reactant Control Subsystem (RCS); Thermal Control Subsystem (TCS); Water Removal Subsystem (WRS); and Power Reactant Storage and Distribution System (PRSDS). The IOA analysis process utilized available EPD and C/EPG hardware drawings and schematics for defining hardware assemblies, components, and hardware items. Each level of hardware was evaluated and analyzed for possible failure modes and effects. Criticality was assigned based upon the severity of the effect for each failure mode.
Towards roll-to-roll manufacturing of polymer photonic devices
NASA Astrophysics Data System (ADS)
Subbaraman, Harish; Lin, Xiaohui; Ling, Tao; Guo, L. Jay; Chen, Ray T.
2014-03-01
Traditionally, polymer photonic devices are fabricated using clean-room processes such as photolithography, e-beam lithography, reactive ion etching (RIE), and lift-off methods, which lead to long fabrication times, low throughput, and high cost. We have utilized a novel process for fabricating polymer photonic devices using a combination of imprinting and ink-jet printing methods, which provides high throughput at low cost on a variety of rigid and flexible substrates. We discuss the manufacturing challenges that need to be overcome in order to realize true implementation of roll-to-roll manufacturing of flexible polymer photonic systems. Several metrology and instrumentation challenges must be addressed and overcome to realize a successful manufacturing process, including the availability of particulate-free, high-quality substrates; the development and implementation of high-speed in-line and off-line inspection and diagnostic tools with adaptive control for patterned and unpatterned material films; and the development of reliable hardware. Due to extreme resolution requirements compared to print media, the burden that software and hardware tools place on throughput also needs to be carefully determined. Moreover, the effects of web wander and variations in web speed need to be accurately accounted for in the design of the system hardware and software. In this paper, we show the realization of solutions for a few of these challenges and utilize them to develop a high-rate R2R dual-stage ink-jet printer that can provide alignment accuracy of <10 μm at a web speed of 5 m/min. The development of a roll-to-roll manufacturing system for polymer photonic systems opens limitless possibilities for the deployment of high-performance components in a variety of applications, including communication, sensing, medicine, agriculture, energy, and lighting.
Adaptive Instrument Module: Space Instrument Controller "Brain" through Programmable Logic Devices
NASA Technical Reports Server (NTRS)
Darrin, Ann Garrison; Conde, Richard; Chern, Bobbie; Luers, Phil; Jurczyk, Steve; Mills, Carl; Day, John H. (Technical Monitor)
2001-01-01
The Adaptive Instrument Module (AIM) will be the first true demonstration of reconfigurable computing with field-programmable gate arrays (FPGAs) in space, enabling the 'brain' of the system to evolve or adapt to changing requirements. In partnership with NASA Goddard Space Flight Center and the Australian Cooperative Research Centre for Satellite Systems (CRC-SS), APL has built the flight version to be flown on the Australian university-class satellite FEDSAT. The AIM provides satellites with the flexibility to adapt to changing mission requirements by reconfiguring standardized processing hardware rather than incurring the large costs associated with new builds. This ability to reconfigure the processing in response to changing mission needs leads to true evolvable computing, wherein the instrument 'brain' can learn from new science data in order to perform state-of-the-art data processing. The development of the AIM is significant in its enormous potential to reduce total life-cycle costs for future space exploration missions. The advent of RAM-based FPGAs whose configuration can be changed at any time has enabled the development of the AIM for processing tasks that could not be performed in software. The use of the AIM enables reconfiguration of the FPGA circuitry while the spacecraft is in flight, with many accompanying advantages. The AIM demonstrates the practicalities of using reconfigurable computing hardware devices by conducting a series of designed experiments. These include demonstrations of data compression, data filtering, communication message processing, and inter-experiment data computation. The second generation is the Adaptive Processing Template (ADAPT), which is further described in this paper. The next step forward is to make the hardware itself adaptable, and ADAPT pursues this challenge by developing a reconfigurable module that will be capable of functioning efficiently in various applications. ADAPT will take advantage of radiation-tolerant RAM-based field-programmable gate array (FPGA) technology to develop a reconfigurable processor that combines the flexibility of a general-purpose processor running software with the performance of application-specific processing hardware for a variety of high-performance computing applications.
Introduction on performance analysis and profiling methodologies for KVM on ARM virtualization
NASA Astrophysics Data System (ADS)
Motakis, Antonios; Spyridakis, Alexander; Raho, Daniel
2013-05-01
The introduction of hardware virtualization extensions on ARM Cortex-A15 processors has enabled the implementation of full virtualization solutions for this architecture, such as KVM on ARM. This trend motivates the need to quantify and understand the performance impact that emerges from the application of this technology. In this work we start looking into some interesting performance metrics on KVM for ARM processors, which can provide useful insight that may lead to potential improvements in the future. This includes measurements such as interrupt latency and guest exit cost, performed on ARM Versatile Express and Samsung Exynos 5250 hardware platforms. Furthermore, we discuss additional methodologies that can provide a deeper understanding of the performance footprint of KVM in the future. We identify some of the most interesting approaches in this field and perform a tentative analysis of how these may be implemented in the KVM on ARM port. These take into consideration hardware- and software-based counters for profiling, as well as issues related to the limitations of the simulators that are often used, such as the ARM Fast Models platform.
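One of the simpler latency-style measurements a guest can make is sketched below: request a short timer sleep and record how late the wakeup arrives, which in a KVM guest folds in virtualization overheads such as timer injection and guest exits. The sleep duration and sample count are arbitrary choices, not the configuration used in the work above, and a real study would pair this with host-side counters.

import time

def wakeup_overshoot_us(sleep_us=100, samples=1000):
    """Measure how much later than requested a short sleep actually returns."""
    overshoots = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        time.sleep(sleep_us / 1e6)
        elapsed_us = (time.perf_counter_ns() - t0) / 1000
        overshoots.append(elapsed_us - sleep_us)
    return sorted(overshoots)

data = wakeup_overshoot_us()
print(f"median overshoot: {data[len(data) // 2]:.1f} us, worst: {data[-1]:.1f} us")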
Performance Measurement of Advanced Stirling Convertors (ASC-E3)
NASA Technical Reports Server (NTRS)
Oriti, Salvatore M.
2013-01-01
NASA Glenn Research Center (GRC) has been supporting development of the Advanced Stirling Radioisotope Generator (ASRG) since 2006. A key element of the ASRG project is providing life, reliability, and performance testing data of the Advanced Stirling Convertor (ASC). The latest version of the ASC (ASC-E3, to represent the third cycle of engineering model test hardware) is of a design identical to the forthcoming flight convertors. For this generation of hardware, a joint Sunpower and GRC effort was initiated to improve and standardize the test support hardware. After this effort was completed, the first pair of ASC-E3 units was produced by Sunpower and then delivered to GRC in December 2012. GRC has begun operation of these units. This process included performance verification, which examined the data from various tests to validate the convertor performance to the product specification. Other tests included detailed performance mapping that encompassed the wide range of operating conditions that will exist during a mission. These convertors were then transferred to Lockheed Martin for controller checkout testing. The results of this latest convertor performance verification activity are summarized here.
NASA Technical Reports Server (NTRS)
Hill, Michael D.; Herrera, Acey A.; Crane, J. Allen; Packard, Edward A.; Aviado, Carlos; Sampler, Henry P.
2000-01-01
The Microwave Anisotropy Probe (MAP) Observatory, scheduled for a fall 2000 launch, is designed to measure temperature fluctuations (anisotropy) and produce a high-sensitivity, high-spatial-resolution (approximately 0.2 degree) map of the cosmic microwave background (CMB) radiation over the entire sky between 22 and 90 GHz. MAP utilizes back-to-back Gregorian telescopes to focus the microwave signals into 10 differential microwave receivers via 20 feed horns. Proper alignment of the telescope reflectors and the feed horns at the operating temperature of 90 K is a critical element to ensure mission success. We describe the hardware and methods used to validate the displacement/deformation predictions of the reflectors and the microwave feed horns during thermal/vacuum testing of the reflectors and the microwave instrument. The smallest deformation predictions to be measured were on the order of +/- 0.030 inches (+/- 0.762 mm). Performing these alignment measurements inside a thermal/vacuum chamber with conventional alignment equipment posed several limitations. The most troublesome limitation was the inability to send personnel into the chamber to perform the measurements during the test due to the vacuum and the temperature extremes. The photogrammetry (PG) system was chosen to perform the measurements since it is a non-contact measurement system, the measurements can be made relatively quickly and accurately, and the photogrammetric camera can be operated remotely. The hardware and methods developed to perform the MAP alignment measurements using PG proved to be highly successful. The measurements met the desired requirements for the metal structures, enabling the desired distortions to be measured and resolving deformations an order of magnitude smaller than the imposed requirements. Viable data were provided to the MAP Project for a full analysis of the on-orbit performance of the Instrument's microwave system.
Demonstration Advanced Avionics System (DAAS) functional description. [Cessna 402B aircraft]
NASA Technical Reports Server (NTRS)
1980-01-01
A comprehensive set of general aviation avionics was defined for integration into an advanced hardware mechanization for demonstration in a Cessna 402B aircraft. Block diagrams are shown, and the system and computer architecture as well as significant hardware elements are described. The multifunction integrated data control center and electronic horizontal situation indicator are discussed. The functions that the DAAS will perform are examined. This function definition is the basis for the DAAS hardware and software design.
James, Conrad D.; Aimone, James B.; Miner, Nadine E.; ...
2017-01-04
Biological neural networks continue to inspire new developments in algorithms and microelectronic hardware for solving challenging data processing and classification problems. Here, we survey the history of neural-inspired and neuromorphic computing in order to examine the complex and intertwined trajectories of the mathematical theory and hardware developed in this field. Early research focused on adapting existing hardware to emulate the pattern recognition capabilities of living organisms. Contributions from psychologists, mathematicians, engineers, neuroscientists, and other professions were crucial to maturing the field from narrowly-tailored demonstrations to more generalizable systems capable of addressing difficult problem classes such as object detection and speech recognition. Algorithms that leverage fundamental principles found in neuroscience, such as hierarchical structure, temporal integration, and robustness to error, have been developed, and some of these approaches are achieving world-leading performance on particular data classification tasks. Additionally, novel microelectronic hardware is being developed to perform logic and to serve as memory in neuromorphic computing systems with optimized system integration and improved energy efficiency. Key to such advancements was the incorporation of new discoveries in neuroscience research, the transition away from strict structural replication and towards the functional replication of neural systems, and the use of mathematical theory frameworks to guide algorithm and hardware developments.