Interactive Digital Signal Processor
NASA Technical Reports Server (NTRS)
Mish, W. H.
1985-01-01
Interactive Digital Signal Processor, IDSP, consists of set of time series analysis "operators" based on various algorithms commonly used for digital signal analysis. Processing of digital signal time series to extract information usually achieved by applications of number of fairly standard operations. IDSP excellent teaching tool for demonstrating application for time series operators to artificially generated signals.
Processor architecture for airborne SAR systems
NASA Technical Reports Server (NTRS)
Glass, C. M.
1983-01-01
Digital processors for spaceborne imaging radars and application of the technology developed for airborne SAR systems are considered. Transferring algorithms and implementation techniques from airborne to spaceborne SAR processors offers obvious advantages. The following topics are discussed: (1) a quantification of the differences in processing algorithms for airborne and spaceborne SARs; and (2) an overview of three processors for airborne SAR systems.
Method and apparatus for digitally based high speed x-ray spectrometer
Warburton, W.K.; Hubbard, B.
1997-11-04
A high speed, digitally based, signal processing system which accepts input data from a detector-preamplifier and produces a spectral analysis of the x-rays illuminating the detector. The system achieves high throughputs at low cost by dividing the required digital processing steps between a ``hardwired`` processor implemented in combinatorial digital logic, which detects the presence of the x-ray signals in the digitized data stream and extracts filtered estimates of their amplitudes, and a programmable digital signal processing computer, which refines the filtered amplitude estimates and bins them to produce the desired spectral analysis. One set of algorithms allow this hybrid system to match the resolution of analog systems while operating at much higher data rates. A second set of algorithms implemented in the processor allow the system to be self calibrating as well. The same processor also handles the interface to an external control computer. 19 figs.
Method and apparatus for digitally based high speed x-ray spectrometer
Warburton, William K.; Hubbard, Bradley
1997-01-01
A high speed, digitally based, signal processing system which accepts input data from a detector-preamplifier and produces a spectral analysis of the x-rays illuminating the detector. The system achieves high throughputs at low cost by dividing the required digital processing steps between a "hardwired" processor implemented in combinatorial digital logic, which detects the presence of the x-ray signals in the digitized data stream and extracts filtered estimates of their amplitudes, and a programmable digital signal processing computer, which refines the filtered amplitude estimates and bins them to produce the desired spectral analysis. One set of algorithms allow this hybrid system to match the resolution of analog systems while operating at much higher data rates. A second set of algorithms implemented in the processor allow the system to be self calibrating as well. The same processor also handles the interface to an external control computer.
Advanced digital SAR processing study
NASA Technical Reports Server (NTRS)
Martinson, L. W.; Gaffney, B. P.; Liu, B.; Perry, R. P.; Ruvin, A.
1982-01-01
A highly programmable, land based, real time synthetic aperture radar (SAR) processor requiring a processed pixel rate of 2.75 MHz or more in a four look system was designed. Variations in range and azimuth compression, number of looks, range swath, range migration and SR mode were specified. Alternative range and azimuth processing algorithms were examined in conjunction with projected integrated circuit, digital architecture, and software technologies. The advaced digital SAR processor (ADSP) employs an FFT convolver algorithm for both range and azimuth processing in a parallel architecture configuration. Algorithm performace comparisons, design system design, implementation tradeoffs and the results of a supporting survey of integrated circuit and digital architecture technologies are reported. Cost tradeoffs and projections with alternate implementation plans are presented.
Digital camera with apparatus for authentication of images produced from an image file
NASA Technical Reports Server (NTRS)
Friedman, Gary L. (Inventor)
1993-01-01
A digital camera equipped with a processor for authentication of images produced from an image file taken by the digital camera is provided. The digital camera processor has embedded therein a private key unique to it, and the camera housing has a public key that is so uniquely based upon the private key that digital data encrypted with the private key by the processor may be decrypted using the public key. The digital camera processor comprises means for calculating a hash of the image file using a predetermined algorithm, and second means for encrypting the image hash with the private key, thereby producing a digital signature. The image file and the digital signature are stored in suitable recording means so they will be available together. Apparatus for authenticating at any time the image file as being free of any alteration uses the public key for decrypting the digital signature, thereby deriving a secure image hash identical to the image hash produced by the digital camera and used to produce the digital signature. The apparatus calculates from the image file an image hash using the same algorithm as before. By comparing this last image hash with the secure image hash, authenticity of the image file is determined if they match, since even one bit change in the image hash will cause the image hash to be totally different from the secure hash.
Matrix-vector multiplication using digital partitioning for more accurate optical computing
NASA Technical Reports Server (NTRS)
Gary, C. K.
1992-01-01
Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable with or greater than electronic computers as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient alogrithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions
2014-05-01
processor developed by IBM and other companies , incorpo- rates the verb—POWER5— processor as the Power Processor Element (PPE), one of the early general...deliver an power efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel.27 The vast growth of digital video content has been a
Digital Camera with Apparatus for Authentication of Images Produced from an Image File
NASA Technical Reports Server (NTRS)
Friedman, Gary L. (Inventor)
1996-01-01
A digital camera equipped with a processor for authentication of images produced from an image file taken by the digital camera is provided. The digital camera processor has embedded therein a private key unique to it, and the camera housing has a public key that is so uniquely related to the private key that digital data encrypted with the private key may be decrypted using the public key. The digital camera processor comprises means for calculating a hash of the image file using a predetermined algorithm, and second means for encrypting the image hash with the private key, thereby producing a digital signature. The image file and the digital signature are stored in suitable recording means so they will be available together. Apparatus for authenticating the image file as being free of any alteration uses the public key for decrypting the digital signature, thereby deriving a secure image hash identical to the image hash produced by the digital camera and used to produce the digital signature. The authenticating apparatus calculates from the image file an image hash using the same algorithm as before. By comparing this last image hash with the secure image hash, authenticity of the image file is determined if they match. Other techniques to address time-honored methods of deception, such as attaching false captions or inducing forced perspectives, are included.
Distributed digital signal processors for multi-body structures
NASA Technical Reports Server (NTRS)
Lee, Gordon K.
1990-01-01
Several digital filter designs were investigated which may be used to process sensor data from large space structures and to design digital hardware to implement the distributed signal processing architecture. Several experimental tests articles are available at NASA Langley Research Center to evaluate these designs. A summary of some of the digital filter designs is presented, an evaluation of their characteristics relative to control design is discussed, and candidate hardware microcontroller/microcomputer components are given. Future activities include software evaluation of the digital filter designs and actual hardware inplementation of some of the signal processor algorithms on an experimental testbed at NASA Langley.
Optimization of image processing algorithms on mobile platforms
NASA Astrophysics Data System (ADS)
Poudel, Pramod; Shirvaikar, Mukul
2011-03-01
This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, net-books and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND RAM and 256 MB SDRAM memory. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application for various template matching tasks such as face-recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the speedup obtained due to dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
Real-time phase correlation based integrated system for seizure detection
NASA Astrophysics Data System (ADS)
Romaine, James B.; Delgado-Restituto, Manuel; Leñero-Bardallo, Juan A.; Rodríguez-Vázquez, Ángel
2017-05-01
This paper reports a low area, low power, integer-based digital processor for the calculation of phase synchronization between two neural signals. The processor calculates the phase-frequency content of a signal by identifying the specific time periods associated with two consecutive minima. The simplicity of this phase-frequency content identifier allows for the digital processor to utilize only basic digital blocks, such as registers, counters, adders and subtractors, without incorporating any complex multiplication and or division algorithms. In fact, the processor, fabricated in a 0.18μm CMOS process, only occupies an area of 0.0625μm2 and consumes 12.5nW from a 1.2V supply voltage when operated at 128kHz. These low-area, low-power features make the proposed processor a valuable computing element in closed loop neural prosthesis for the treatment of neural diseases, such as epilepsy, or for extracting functional connectivity maps between different recording sites in the brain.
Programmable Remapper with Single Flow Architecture
NASA Technical Reports Server (NTRS)
Fisher, Timothy E. (Inventor)
1993-01-01
An apparatus for image processing comprising a camera for receiving an original visual image and transforming the original visual image into an analog image, a first converter for transforming the analog image of the camera to a digital image, a processor having a single flow architecture for receiving the digital image and producing, with a single algorithm, an output image, a second converter for transforming the digital image of the processor to an analog image, and a viewer for receiving the analog image, transforming the analog image into a transformed visual image for observing the transformations applied to the original visual image. The processor comprises one or more subprocessors for the parallel reception of a digital image for producing an output matrix of the transformed visual image. More particularly, the processor comprises a plurality of subprocessors for receiving in parallel and transforming the digital image for producing a matrix of the transformed visual image, and an output interface means for receiving the respective portions of the transformed visual image from the respective subprocessor for producing an output matrix of the transformed visual image.
Digital signal processing algorithms for automatic voice recognition
NASA Technical Reports Server (NTRS)
Botros, Nazeih M.
1987-01-01
The current digital signal analysis algorithms are investigated that are implemented in automatic voice recognition algorithms. Automatic voice recognition means, the capability of a computer to recognize and interact with verbal commands. The digital signal is focused on, rather than the linguistic, analysis of speech signal. Several digital signal processing algorithms are available for voice recognition. Some of these algorithms are: Linear Predictive Coding (LPC), Short-time Fourier Analysis, and Cepstrum Analysis. Among these algorithms, the LPC is the most widely used. This algorithm has short execution time and do not require large memory storage. However, it has several limitations due to the assumptions used to develop it. The other 2 algorithms are frequency domain algorithms with not many assumptions, but they are not widely implemented or investigated. However, with the recent advances in the digital technology, namely signal processors, these 2 frequency domain algorithms may be investigated in order to implement them in voice recognition. This research is concerned with real time, microprocessor based recognition algorithms.
Single-Scale Retinex Using Digital Signal Processors
NASA Technical Reports Server (NTRS)
Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn
2005-01-01
The Retinex is an image enhancement algorithm that improves the brightness, contrast and sharpness of an image. It performs a non-linear spatial/spectral transform that provides simultaneous dynamic range compression and color constancy. It has been used for a wide variety of applications ranging from aviation safety to general purpose photography. Many potential applications require the use of Retinex processing at video frame rates. This is difficult to achieve with general purpose processors because the algorithm contains a large number of complex computations and data transfers. In addition, many of these applications also constrain the potential architectures to embedded processors to save power, weight and cost. Thus we have focused on digital signal processors (DSPs) and field programmable gate arrays (FPGAs) as potential solutions for real-time Retinex processing. In previous efforts we attained a 21 (full) frame per second (fps) processing rate for the single-scale monochromatic Retinex with a TMS320C6711 DSP operating at 150 MHz. This was achieved after several significant code improvements and optimizations. Since then we have migrated our design to the slightly more powerful TMS320C6713 DSP and the fixed point TMS320DM642 DSP. In this paper we briefly discuss the Retinex algorithm, the performance of the algorithm executing on the TMS320C6713 and the TMS320DM642, and compare the results with the TMS320C6711.
Plural-wavelength flame detector that discriminates between direct and reflected radiation
NASA Technical Reports Server (NTRS)
Hall, Gregory H. (Inventor); Barnes, Heidi L. (Inventor); Medelius, Pedro J. (Inventor); Simpson, Howard J. (Inventor); Smith, Harvey S. (Inventor)
1997-01-01
A flame detector employs a plurality of wavelength selective radiation detectors and a digital signal processor programmed to analyze each of the detector signals, and determine whether radiation is received directly from a small flame source that warrants generation of an alarm. The processor's algorithm employs a normalized cross-correlation analysis of the detector signals to discriminate between radiation received directly from a flame and radiation received from a reflection of a flame to insure that reflections will not trigger an alarm. In addition, the algorithm employs a Fast Fourier Transform (FFT) frequency spectrum analysis of one of the detector signals to discriminate between flames of different sizes. In a specific application, the detector incorporates two infrared (IR) detectors and one ultraviolet (UV) detector for discriminating between a directly sensed small hydrogen flame, and reflections from a large hydrogen flame. The signals generated by each of the detectors are sampled and digitized for analysis by the digital signal processor, preferably 250 times a second. A sliding time window of approximately 30 seconds of detector data is created using FIFO memories.
Adaptive control for accelerators
Eaton, Lawrie E.; Jachim, Stephen P.; Natter, Eckard F.
1991-01-01
An adaptive feedforward control loop is provided to stabilize accelerator beam loading of the radio frequency field in an accelerator cavity during successive pulses of the beam into the cavity. A digital signal processor enables an adaptive algorithm to generate a feedforward error correcting signal functionally determined by the feedback error obtained by a beam pulse loading the cavity after the previous correcting signal was applied to the cavity. Each cavity feedforward correcting signal is successively stored in the digital processor and modified by the feedback error resulting from its application to generate the next feedforward error correcting signal. A feedforward error correcting signal is generated by the digital processor in advance of the beam pulse to enable a composite correcting signal and the beam pulse to arrive concurrently at the cavity.
Processing techniques for software based SAR processors
NASA Technical Reports Server (NTRS)
Leung, K.; Wu, C.
1983-01-01
Software SAR processing techniques defined to treat Shuttle Imaging Radar-B (SIR-B) data are reviewed. The algorithms are devised for the data processing procedure selection, SAR correlation function implementation, multiple array processors utilization, cornerturning, variable reference length azimuth processing, and range migration handling. The Interim Digital Processor (IDP) originally implemented for handling Seasat SAR data has been adapted for the SIR-B, and offers a resolution of 100 km using a processing procedure based on the Fast Fourier Transformation fast correlation approach. Peculiarities of the Seasat SAR data processing requirements are reviewed, along with modifications introduced for the SIR-B. An Advanced Digital SAR Processor (ADSP) is under development for use with the SIR-B in the 1986 time frame as an upgrade for the IDP, which will be in service in 1984-5.
Fault tolerant, radiation hard, high performance digital signal processor
NASA Technical Reports Server (NTRS)
Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke
1990-01-01
An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which Texas Instruments TMS32010 has been selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 has been compared with AR&T Bell Laboratories DSP I and Nippon Electric Co. PD7720 and was found to be most suitable for a single chip implementation of APC. A preliminary design system based on TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based of the TSM32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
High speed quantitative digital microscopy
NASA Technical Reports Server (NTRS)
Castleman, K. R.; Price, K. H.; Eskenazi, R.; Ovadya, M. M.; Navon, M. A.
1984-01-01
Modern digital image processing hardware makes possible quantitative analysis of microscope images at high speed. This paper describes an application to automatic screening for cervical cancer. The system uses twelve MC6809 microprocessors arranged in a pipeline multiprocessor configuration. Each processor executes one part of the algorithm on each cell image as it passes through the pipeline. Each processor communicates with its upstream and downstream neighbors via shared two-port memory. Thus no time is devoted to input-output operations as such. This configuration is expected to be at least ten times faster than previous systems.
Interactive digital signal processor
NASA Technical Reports Server (NTRS)
Mish, W. H.; Wenger, R. M.; Behannon, K. W.; Byrnes, J. B.
1982-01-01
The Interactive Digital Signal Processor (IDSP) is examined. It consists of a set of time series analysis Operators each of which operates on an input file to produce an output file. The operators can be executed in any order that makes sense and recursively, if desired. The operators are the various algorithms used in digital time series analysis work. User written operators can be easily interfaced to the sysatem. The system can be operated both interactively and in batch mode. In IDSP a file can consist of up to n (currently n=8) simultaneous time series. IDSP currently includes over thirty standard operators that range from Fourier transform operations, design and application of digital filters, eigenvalue analysis, to operators that provide graphical output, allow batch operation, editing and display information.
2007-12-11
Implemented both carrier and code phase tracking loop for performance evaluation of a minimum power beam forming algorithm and null steering algorithm...4 Antennal Antenna2 Antenna K RF RF RF ct, Ct~2 ChKx1 X2 ....... Xk A W ~ ~ =Z, x W ,=1 Fig. 5. Schematics of a K-element antenna array spatial...adaptive processor Antennal Antenna K A N-i V/ ( Vil= .i= VK Fig. 6. Schematics of a K-element antenna array space-time adaptive processor Two additional
NASA Astrophysics Data System (ADS)
Lhamon, Michael Earl
A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible and they have the advantage of time averaging the entire filter bank at real time rates. Time-averaged optical filtering is computational comparable to billions of digital operations-per-second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.
Low-Light Image Enhancement Using Adaptive Digital Pixel Binning
Yoo, Yoonjong; Im, Jaehyun; Paik, Joonki
2015-01-01
This paper presents an image enhancement algorithm for low-light scenes in an environment with insufficient illumination. Simple amplification of intensity exhibits various undesired artifacts: noise amplification, intensity saturation, and loss of resolution. In order to enhance low-light images without undesired artifacts, a novel digital binning algorithm is proposed that considers brightness, context, noise level, and anti-saturation of a local region in the image. The proposed algorithm does not require any modification of the image sensor or additional frame-memory; it needs only two line-memories in the image signal processor (ISP). Since the proposed algorithm does not use an iterative computation, it can be easily embedded in an existing digital camera ISP pipeline containing a high-resolution image sensor. PMID:26121609
Negative base encoding in optical linear algebra processors
NASA Technical Reports Server (NTRS)
Perlee, C.; Casasent, D.
1986-01-01
In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Batista, A. J. N.; Sousa, J.; Varandas, C. A. F.
2006-10-15
The efficient vertical stabilization (VS) of plasmas in tokamaks requires a fast reaction of the VS controller, for example, after detection of edge localized modes (ELM). For controlling the effects of very large ELMs a new digital control hardware, based on the Advanced Telecommunications Computing Architecture trade mark sign (ATCA), is being developed aiming to reduce the VS digital control loop cycle (down to an optimal value of 10 {mu}s) and improve the algorithm performance. The system has 1 ATCA trade mark sign processor module and up to 12 ATCA trade mark sign control modules, each one with 32 analogmore » input channels (12 bit resolution), 4 analog output channels (12 bit resolution), and 8 digital input/output channels. The Aurora trade mark sign and PCI Express trade mark sign communication protocols will be used for data transport, between modules, with expected latencies below 2 {mu}s. Control algorithms are implemented on a ix86 based processor with 6 Gflops and on field programmable gate arrays with 80 GMACS, interconnected by serial gigabit links in a full mesh topology.« less
A digitally implemented preambleless demodulator for maritime and mobile data communications
NASA Astrophysics Data System (ADS)
Chalmers, Harvey; Shenoy, Ajit; Verahrami, Farhad B.
The hardware design and software algorithms for a low-bit-rate, low-cost, all-digital preambleless demodulator are described. The demodulator operates under severe high-noise conditions, fast Doppler frequency shifts, large frequency offsets, and multipath fading. Sophisticated algorithms, including a fast Fourier transform (FFT)-based burst acquisition algorithm, a cycle-slip resistant carrier phase tracker, an innovative Doppler tracker, and a fast acquisition symbol synchronizer, were developed and extensively simulated for reliable burst reception. The compact digital signal processor (DSP)-based demodulator hardware uses a unique personal computer test interface for downloading test data files. The demodulator test results demonstrate a near-ideal performance within 0.2 dB of theory.
[Image processing system of visual prostheses based on digital signal processor DM642].
Xie, Chengcheng; Lu, Yanyu; Gu, Yun; Wang, Jing; Chai, Xinyu
2011-09-01
This paper employed a DSP platform to create the real-time and portable image processing system, and introduced a series of commonly used algorithms for visual prostheses. The results of performance evaluation revealed that this platform could afford image processing algorithms to be executed in real time.
NASA Astrophysics Data System (ADS)
Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos
1990-03-01
The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP
NASA Astrophysics Data System (ADS)
Brooks, Geoffrey W.
1996-03-01
Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
Real-time digital holographic microscopy using the graphic processing unit.
Shimobaba, Tomoyoshi; Sato, Yoshikuni; Miura, Junya; Takenouchi, Mai; Ito, Tomoyoshi
2008-08-04
Digital holographic microscopy (DHM) is a well-known powerful method allowing both the amplitude and phase of a specimen to be simultaneously observed. In order to obtain a reconstructed image from a hologram, numerous calculations for the Fresnel diffraction are required. The Fresnel diffraction can be accelerated by the FFT (Fast Fourier Transform) algorithm. However, real-time reconstruction from a hologram is difficult even if we use a recent central processing unit (CPU) to calculate the Fresnel diffraction by the FFT algorithm. In this paper, we describe a real-time DHM system using a graphic processing unit (GPU) with many stream processors, which allows use as a highly parallel processor. The computational speed of the Fresnel diffraction using the GPU is faster than that of recent CPUs. The real-time DHM system can obtain reconstructed images from holograms whose size is 512 x 512 grids in 24 frames per second.
FPGA implementation of digital down converter using CORDIC algorithm
NASA Astrophysics Data System (ADS)
Agarwal, Ashok; Lakshmi, Boppana
2013-01-01
In radio receivers, Digital Down Converters (DDC) are used to translate the signal from Intermediate Frequency level to baseband. It also decimates the oversampled signal to a lower sample rate, eliminating the need of a high end digital signal processors. In this paper we have implemented architecture for DDC employing CORDIC algorithm, which down converts an IF signal of 70MHz (3G) to 200 KHz baseband GSM signal, with an SFDR greater than 100dB. The implemented architecture reduces the hardware resource requirements by 15 percent when compared with other architecture available in the literature due to elimination of explicit multipliers and a quadrature phase shifter for mixing.
Concept For Generation Of Long Pseudorandom Sequences
NASA Technical Reports Server (NTRS)
Wang, C. C.
1990-01-01
Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.
NASA Technical Reports Server (NTRS)
Kriegler, F.; Marshall, R.; Lampert, S.; Gordon, M.; Cornell, C.; Kistler, R.
1973-01-01
The MIDAS system is a prototype, multiple-pipeline digital processor mechanizing the multivariate-Gaussian, maximum-likelihood decision algorithm operating at 200,000 pixels/second. It incorporates displays and film printer equipment under control of a general purpose midi-computer and possesses sufficient flexibility that operational versions of the equipment may be subsequently specified as subsets of the system.
Analysis and simulation tools for solar array power systems
NASA Astrophysics Data System (ADS)
Pongratananukul, Nattorn
This dissertation presents simulation tools developed specifically for the design of solar array power systems. Contributions are made in several aspects of the system design phases, including solar source modeling, system simulation, and controller verification. A tool to automate the study of solar array configurations using general purpose circuit simulators has been developed based on the modeling of individual solar cells. Hierarchical structure of solar cell elements, including semiconductor properties, allows simulation of electrical properties as well as the evaluation of the impact of environmental conditions. A second developed tool provides a co-simulation platform with the capability to verify the performance of an actual digital controller implemented in programmable hardware such as a DSP processor, while the entire solar array including the DC-DC power converter is modeled in software algorithms running on a computer. This "virtual plant" allows developing and debugging code for the digital controller, and also to improve the control algorithm. One important task in solar arrays is to track the maximum power point on the array in order to maximize the power that can be delivered. Digital controllers implemented with programmable processors are particularly attractive for this task because sophisticated tracking algorithms can be implemented and revised when needed to optimize their performance. The proposed co-simulation tools are thus very valuable in developing and optimizing the control algorithm, before the system is built. Examples that demonstrate the effectiveness of the proposed methodologies are presented. The proposed simulation tools are also valuable in the design of multi-channel arrays. In the specific system that we have designed and tested, the control algorithm is implemented on a single digital signal processor. In each of the channels the maximum power point is tracked individually. In the prototype we built, off-the-shelf commercial DC-DC converters were utilized. At the end, the overall performance of the entire system was evaluated using solar array simulators capable of simulating various I-V characteristics, and also by using an electronic load. Experimental results are presented.
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
A case study for the real-time experimental evaluation of the VIPER microprocessor
NASA Astrophysics Data System (ADS)
Carreno, Victor A.; Angellatta, Rob K.
1991-09-01
An experiment to evaluate the applicability of the Verifiable Integrated Processor for Enhanced Reliability (VIPER) microprocessor to real time control is described. The VIPER microprocessor was invented by the Royal Signals and Radar Establishment (RSRE), U.K., and is an example of the use of formal mathematical methods for developing electronic digital systems with a high degree of assurance on the system design and implementation correctness. The experiment consisted of selecting a control law, writing the control law algorithm for the VIPER processor, and providing real time, dynamic inputs into the processor and monitoring the outputs. The control law selected and coded for the VIPER processor was the yaw damper function of an automatic landing program for a 737 aircraft. The mechanisms for interfacing the VIPER Single Board Computer to the VAX host are described. Results include run time experiences, performance evaluation, and comparison of VIPER and FORTRAN yaw damper algorithm output for accuracy estimation.
A case study for the real-time experimental evaluation of the VIPER microprocessor
NASA Technical Reports Server (NTRS)
Carreno, Victor A.; Angellatta, Rob K.
1991-01-01
An experiment to evaluate the applicability of the Verifiable Integrated Processor for Enhanced Reliability (VIPER) microprocessor to real time control is described. The VIPER microprocessor was invented by the Royal Signals and Radar Establishment (RSRE), U.K., and is an example of the use of formal mathematical methods for developing electronic digital systems with a high degree of assurance on the system design and implementation correctness. The experiment consisted of selecting a control law, writing the control law algorithm for the VIPER processor, and providing real time, dynamic inputs into the processor and monitoring the outputs. The control law selected and coded for the VIPER processor was the yaw damper function of an automatic landing program for a 737 aircraft. The mechanisms for interfacing the VIPER Single Board Computer to the VAX host are described. Results include run time experiences, performance evaluation, and comparison of VIPER and FORTRAN yaw damper algorithm output for accuracy estimation.
A Nonlinear Digital Control Solution for a DC/DC Power Converter
NASA Technical Reports Server (NTRS)
Zhu, Minshao
2002-01-01
A digital Nonlinear Proportional-Integral-Derivative (NPID) control algorithm was proposed to control a 1-kW, PWM, DC/DC, switching power converter. The NPID methodology is introduced and a practical hardware control solution is obtained. The design of the controller was completed using Matlab (trademark) Simulink, while the hardware-in-the-loop testing was performed using both the dSPACE (trademark) rapid prototyping system, and a stand-alone Texas Instruments (trademark) Digital Signal Processor (DSP)-based system. The final Nonlinear digital control algorithm was implemented and tested using the ED408043-1 Westinghouse DC-DC switching power converter. The NPID test results are discussed and compared to the results of a standard Proportional-Integral (PI) controller.
2010-09-01
53 Figure 26. Image of the phased array antenna...................................................................54...69 Figure 38. Computation of correction angle from array factor and sum/difference beams...71 Figure 39. Front panel of the tracking algorithm
On-board multicarrier demodulator for mobile applications using DSP implementation
NASA Astrophysics Data System (ADS)
Yim, W. H.; Kwan, C. C. D.; Coakley, F. P.; Evans, B. G.
1990-11-01
This paper describes the design and implementation of an on-board multicarrier demodulator using commercial digital signal processors. This is for use in a mobile satellite communication system employing an up-link SCPC/FDMA scheme. Channels are separated by a flexible multistage digital filter bank followed by a channel multiplexed digital demodulator array. The cross/dot product design approach of error detector leads to a new QPSK frequency control algorithm that allows fast acquisition without special preamble pattern. Timing correction is performed digitally using an extended stack of polyphase sub-filters.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems
NASA Technical Reports Server (NTRS)
Downie, John D.
1990-01-01
A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
Digital Beamforming Scatterometer
NASA Technical Reports Server (NTRS)
Rincon, Rafael F.; Vega, Manuel; Kman, Luko; Buenfil, Manuel; Geist, Alessandro; Hillard, Larry; Racette, Paul
2009-01-01
This paper discusses scatterometer measurements collected with multi-mode Digital Beamforming Synthetic Aperture Radar (DBSAR) during the SMAP-VEX 2008 campaign. The 2008 SMAP Validation Experiment was conducted to address a number of specific questions related to the soil moisture retrieval algorithms. SMAP-VEX 2008 consisted on a series of aircraft-based.flights conducted on the Eastern Shore of Maryland and Delaware in the fall of 2008. Several other instruments participated in the campaign including the Passive Active L-Band System (PALS), the Marshall Airborne Polarimetric Imaging Radiometer (MAPIR), and the Global Positioning System Reflectometer (GPSR). This campaign was the first SMAP Validation Experiment. DBSAR is a multimode radar system developed at NASA/Goddard Space Flight Center that combines state-of-the-art radar technologies, on-board processing, and advances in signal processing techniques in order to enable new remote sensing capabilities applicable to Earth science and planetary applications [l]. The instrument can be configured to operate in scatterometer, Synthetic Aperture Radar (SAR), or altimeter mode. The system builds upon the L-band Imaging Scatterometer (LIS) developed as part of the RadSTAR program. The radar is a phased array system designed to fly on the NASA P3 aircraft. The instrument consists of a programmable waveform generator, eight transmit/receive (T/R) channels, a microstrip antenna, and a reconfigurable data acquisition and processor system. Each transmit channel incorporates a digital attenuator, and digital phase shifter that enables amplitude and phase modulation on transmit. The attenuators, phase shifters, and calibration switches are digitally controlled by the radar control card (RCC) on a pulse by pulse basis. The antenna is a corporate fed microstrip patch-array centered at 1.26 GHz with a 20 MHz bandwidth. Although only one feed is used with the present configuration, a provision was made for separate corporate feeds for vertical and horizontal polarization. System upgrades to dual polarization are currently under way. The DBSAR processor is a reconfigurable data acquisition and processor system capable of real-time, high-speed data processing. DBSAR uses an FPGA-based architecture to implement digitally down-conversion, in-phase and quadrature (I/Q) demodulation, and subsequent radar specific algorithms. The core of the processor board consists of an analog-to-digital (AID) section, three Altera Stratix field programmable gate arrays (FPGAs), an ARM microcontroller, several memory devices, and an Ethernet interface. The processor also interfaces with a navigation board consisting of a GPS and a MEMS gyro. The processor has been configured to operate in scatterometer, Synthetic Aperture Radar (SAR), and altimeter modes. All the modes are based on digital beamforming which is a digital process that generates the far-field beam patterns at various scan angles from voltages sampled in the antenna array. This technique allows steering the received beam and controlling its beam-width and side-lobe. Several beamforming techniques can be implemented each characterized by unique strengths and weaknesses, and each applicable to different measurement scenarios. In Scatterometer mode, the radar is capable to.generate a wide beam or scan a narrow beam on transmit, and to steer the received beam on processing while controlling its beamwidth and side-lobe level. Table I lists some important radar characteristics
Preliminary Study of Image Reconstruction Algorithm on a Digital Signal Processor
2014-03-01
5.2 Comparison of CPU-GPU, CPU-FPGA, and CPU-DSP Designs The work for implementing VHDL description of the back-projection algorithm on a physical...FPGA was not complete. Hence, the DSP implementation results are compared with the simulated results for the VHDL design. Simulating VHDL provides an...rather than at the software level. Depending on an application’s characteristics, FPGA implementations can provide a significant performance
Optimization of the coherence function estimation for multi-core central processing unit
NASA Astrophysics Data System (ADS)
Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.
2017-02-01
The paper considers use of parallel processing on multi-core central processing unit for optimization of the coherence function evaluation arising in digital signal processing. Coherence function along with other methods of spectral analysis is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for the function evaluation for signals represented with digital samples. The algorithm is analyzed for its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, their results are assessed for multi-core processors from different manufacturers. Thus, speeding-up of the parallel execution with respect to sequential execution was studied and results are presented for Intel Core i7-4720HQ и AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing high degree of parallelism of the constructed calculating functions. The developed software underwent state registration and will be used as a part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with acoustic correlation method.
NASA Technical Reports Server (NTRS)
Krosel, S. M.; Milner, E. J.
1982-01-01
The application of Predictor corrector integration algorithms developed for the digital parallel processing environment are investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Imam, Neena
2007-01-01
Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimizedmore » for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel XeonTM processor.« less
A scalable SIMD digital signal processor for high-quality multifunctional printer systems
NASA Astrophysics Data System (ADS)
Kang, Hyeong-Ju; Choi, Yongwoo; Kim, Kimo; Park, In-Cheol; Kim, Jung-Wook; Lee, Eul-Hwan; Gahang, Goo-Soo
2005-01-01
This paper describes a high-performance scalable SIMD digital signal processor (DSP) developed for multifunctional printer systems. The DSP supports a variable number of datapaths to cover a wide range of performance and maintain a RISC-like pipeline structure. Many special instructions suitable for image processing algorithms are included in the DSP. Quad/dual instructions are introduced for 8-bit or 16-bit data, and bit-field extraction/insertion instructions are supported to process various data types. Conditional instructions are supported to deal with complex relative conditions efficiently. In addition, an intelligent DMA block is integrated to align data in the course of data reading. Experimental results show that the proposed DSP outperforms a high-end printer-system DSP by at least two times.
Digital Waveguide Architectures for Virtual Musical Instruments
NASA Astrophysics Data System (ADS)
Smith, Julius O.
Digital sound synthesis has become a standard staple of modern music studios, videogames, personal computers, and hand-held devices. As processing power has increased over the years, sound synthesis implementations have evolved from dedicated chip sets, to single-chip solutions, and ultimately to software implementations within processors used primarily for other tasks (such as for graphics or general purpose computing). With the cost of implementation dropping closer and closer to zero, there is increasing room for higher quality algorithms.
Real-time image processing of TOF range images using a reconfigurable processor system
NASA Astrophysics Data System (ADS)
Hussmann, S.; Knoll, F.; Edeler, T.
2011-07-01
During the last years, Time-of-Flight sensors achieved a significant impact onto research fields in machine vision. In comparison to stereo vision system and laser range scanners they combine the advantages of active sensors providing accurate distance measurements and camera-based systems recording a 2D matrix at a high frame rate. Moreover low cost 3D imaging has the potential to open a wide field of additional applications and solutions in markets like consumer electronics, multimedia, digital photography, robotics and medical technologies. This paper focuses on the currently implemented 4-phase-shift algorithm in this type of sensors. The most time critical operation of the phase-shift algorithm is the arctangent function. In this paper a novel hardware implementation of the arctangent function using a reconfigurable processor system is presented and benchmarked against the state-of-the-art CORDIC arctangent algorithm. Experimental results show that the proposed algorithm is well suited for real-time processing of the range images of TOF cameras.
Chakrabartty, Shantanu; Shaga, Ravi K; Aono, Kenji
2013-04-01
Analog circuits that are calibrated using digital-to-analog converters (DACs) use a digital signal processor-based algorithm for real-time adaptation and programming of system parameters. In this paper, we first show that this conventional framework for adaptation yields suboptimal calibration properties because of artifacts introduced by quantization noise. We then propose a novel online stochastic optimization algorithm called noise-shaping or ΣΔ gradient descent, which can shape the quantization noise out of the frequency regions spanning the parameter adaptation trajectories. As a result, the proposed algorithms demonstrate superior parameter search properties compared to floating-point gradient methods and better convergence properties than conventional quantized gradient-methods. In the second part of this paper, we apply the ΣΔ gradient descent algorithm to two examples of real-time digital calibration: 1) balancing and tracking of bias currents, and 2) frequency calibration of a band-pass Gm-C biquad filter biased in weak inversion. For each of these examples, the circuits have been prototyped in a 0.5-μm complementary metal-oxide-semiconductor process, and we demonstrate that the proposed algorithm is able to find the optimal solution even in the presence of spurious local minima, which are introduced by the nonlinear and non-monotonic response of calibration DACs.
Real-time support for high performance aircraft operation
NASA Technical Reports Server (NTRS)
Vidal, Jacques J.
1989-01-01
The feasibility of real-time processing schemes using artificial neural networks (ANNs) is investigated. A rationale for digital neural nets is presented and a general processor architecture for control applications is illustrated. Research results on ANN structures for real-time applications are given. Research results on ANN algorithms for real-time control are also shown.
Power estimation on functional level for programmable processors
NASA Astrophysics Data System (ADS)
Schneider, M.; Blume, H.; Noll, T. G.
2004-05-01
In diesem Beitrag werden verschiedene Ansätze zur Verlustleistungsschätzung von programmierbaren Prozessoren vorgestellt und bezüglich ihrer Übertragbarkeit auf moderne Prozessor-Architekturen wie beispielsweise Very Long Instruction Word (VLIW)-Architekturen bewertet. Besonderes Augenmerk liegt hierbei auf dem Konzept der sogenannten Functional-Level Power Analysis (FLPA). Dieser Ansatz basiert auf der Einteilung der Prozessor-Architektur in funktionale Blöcke wie beispielsweise Processing-Unit, Clock-Netzwerk, interner Speicher und andere. Die Verlustleistungsaufnahme dieser Bl¨ocke wird parameterabhängig durch arithmetische Modellfunktionen beschrieben. Durch automatisierte Analyse von Assemblercodes des zu schätzenden Systems mittels eines Parsers können die Eingangsparameter wie beispielsweise der erzielte Parallelitätsgrad oder die Art des Speicherzugriffs gewonnen werden. Dieser Ansatz wird am Beispiel zweier moderner digitaler Signalprozessoren durch eine Vielzahl von Basis-Algorithmen der digitalen Signalverarbeitung evaluiert. Die ermittelten Schätzwerte für die einzelnen Algorithmen werden dabei mit physikalisch gemessenen Werten verglichen. Es ergibt sich ein sehr kleiner maximaler Schätzfehler von 3%. In this contribution different approaches for power estimation for programmable processors are presented and evaluated concerning their capability to be applied to modern digital signal processor architectures like e.g. Very Long InstructionWord (VLIW) -architectures. Special emphasis will be laid on the concept of so-called Functional-Level Power Analysis (FLPA). This approach is based on the separation of the processor architecture into functional blocks like e.g. processing unit, clock network, internal memory and others. The power consumption of these blocks is described by parameter dependent arithmetic model functions. By application of a parser based automized analysis of assembler codes of the systems to be estimated the input parameters of the Correspondence to: H. Blume (blume@eecs.rwth-aachen.de) arithmetic functions like e.g. the achieved degree of parallelism or the kind and number of memory accesses can be computed. This approach is exemplarily demonstrated and evaluated applying two modern digital signal processors and a variety of basic algorithms of digital signal processing. The resulting estimation values for the inspected algorithms are compared to physically measured values. A resulting maximum estimation error of 3% is achieved.
Yang, Yiwei; Xu, Yuejin; Miu, Jichang; Zhou, Linghong; Xiao, Zhongju
2012-10-01
To apply the classic leakage integrate-and-fire models, based on the mechanism of the generation of physiological auditory stimulation, in the information processing coding of cochlear implants to improve the auditory result. The results of algorithm simulation in digital signal processor (DSP) were imported into Matlab for a comparative analysis. Compared with CIS coding, the algorithm of membrane potential integrate-and-fire (MPIF) allowed more natural pulse discharge in a pseudo-random manner to better fit the physiological structures. The MPIF algorithm can effectively solve the problem of the dynamic structure of the delivered auditory information sequence issued in the auditory center and allowed integration of the stimulating pulses and time coding to ensure the coherence and relevance of the stimulating pulse time.
Implementation of MPEG-2 encoder to multiprocessor system using multiple MVPs (TMS320C80)
NASA Astrophysics Data System (ADS)
Kim, HyungSun; Boo, Kenny; Chung, SeokWoo; Choi, Geon Y.; Lee, YongJin; Jeon, JaeHo; Park, Hyun Wook
1997-05-01
This paper presents the efficient algorithm mapping for the real-time MPEG-2 encoding on the KAIST image computing system (KICS), which has a parallel architecture using five multimedia video processors (MVPs). The MVP is a general purpose digital signal processor (DSP) of Texas Instrument. It combines one floating-point processor and four fixed- point DSPs on a single chip. The KICS uses the MVP as a primary processing element (PE). Two PEs form a cluster, and there are two processing clusters in the KICS. Real-time MPEG-2 encoder is implemented through the spatial and the functional partitioning strategies. Encoding process of spatially partitioned half of the video input frame is assigned to ne processing cluster. Two PEs perform the functionally partitioned MPEG-2 encoding tasks in the pipelined operation mode. One PE of a cluster carries out the transform coding part and the other performs the predictive coding part of the MPEG-2 encoding algorithm. One MVP among five MVPs is used for system control and interface with host computer. This paper introduces an implementation of the MPEG-2 algorithm with a parallel processing architecture.
NASA Astrophysics Data System (ADS)
Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto
2012-11-01
In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Video rate morphological processor based on a redundant number representation
NASA Astrophysics Data System (ADS)
Kuczborski, Wojciech; Attikiouzel, Yianni; Crebbin, Gregory A.
1992-03-01
This paper presents a video rate morphological processor for automated visual inspection of printed circuit boards, integrated circuit masks, and other complex objects. Inspection algorithms are based on gray-scale mathematical morphology. Hardware complexity of the known methods of real-time implementation of gray-scale morphology--the umbra transform and the threshold decomposition--has prompted us to propose a novel technique which applied an arithmetic system without carrying propagation. After considering several arithmetic systems, a redundant number representation has been selected for implementation. Two options are analyzed here. The first is a pure signed digit number representation (SDNR) with the base of 4. The second option is a combination of the base-2 SDNR (to represent gray levels of images) and the conventional twos complement code (to represent gray levels of structuring elements). Operation principle of the morphological processor is based on the concept of the digit level systolic array. Individual processing units and small memory elements create a pipeline. The memory elements store current image windows (kernels). All operation primitives of processing units apply a unified direction of digit processing: most significant digit first (MSDF). The implementation technology is based on the field programmable gate arrays by Xilinx. This paper justified the rationality of a new approach to logic design, which is the decomposition of Boolean functions instead of Boolean minimization.
NASA Astrophysics Data System (ADS)
Gilbert, B. K.; Robb, R. A.; Chu, A.; Kenue, S. K.; Lent, A. H.; Swartzlander, E. E., Jr.
1981-02-01
Rapid advances during the past ten years of several forms of computer-assisted tomography (CT) have resulted in the development of numerous algorithms to convert raw projection data into cross-sectional images. These reconstruction algorithms are either 'iterative,' in which a large matrix algebraic equation is solved by successive approximation techniques; or 'closed form'. Continuing evolution of the closed form algorithms has allowed the newest versions to produce excellent reconstructed images in most applications. This paper will review several computer software and special-purpose digital hardware implementations of closed form algorithms, either proposed during the past several years by a number of workers or actually implemented in commercial or research CT scanners. The discussion will also cover a number of recently investigated algorithmic modifications which reduce the amount of computation required to execute the reconstruction process, as well as several new special-purpose digital hardware implementations under development in laboratories at the Mayo Clinic.
A DSP equipped digitizer for online analysis of nuclear detector signals
NASA Astrophysics Data System (ADS)
Pasquali, G.; Ciaranfi, R.; Bardelli, L.; Bini, M.; Boiano, A.; Giannelli, F.; Ordine, A.; Poggi, G.
2007-01-01
In the framework of the NUCL-EX collaboration, a DSP equipped fast digitizer has been implemented and it has now reached the production stage. Each sampling channel is implemented on a separate daughter-board to be plugged on a VME mother-board. Each channel features a 12-bit, 125 MSamples/s ADC and a Digital Signal Processor (DSP) for online analysis of detector signals. A few algorithms have been written and successfully tested on detectors of different types (scintillators, solid-state, gas-filled), implementing pulse shape discrimination, constant fraction timing, semi-Gaussian shaping, gated integration.
NASA Technical Reports Server (NTRS)
Stone, M. S.; Mcadam, P. L.; Saunders, O. W.
1977-01-01
The results are presented of a 4 month study to design a hybrid analog/digital receiver for outer planet mission probe communication links. The scope of this study includes functional design of the receiver; comparisons between analog and digital processing; hardware tradeoffs for key components including frequency generators, A/D converters, and digital processors; development and simulation of the processing algorithms for acquisition, tracking, and demodulation; and detailed design of the receiver in order to determine its size, weight, power, reliability, and radiation hardness. In addition, an evaluation was made of the receiver's capabilities to perform accurate measurement of signal strength and frequency for radio science missions.
In-camera automation of photographic composition rules.
Banerjee, Serene; Evans, Brian L
2007-07-01
At the time of image acquisition, professional photographers apply many rules of thumb to improve the composition of their photographs. This paper develops a joint optical-digital processing framework for automating composition rules during image acquisition for photographs with one main subject. Within the framework, we automate three photographic composition rules: repositioning the main subject, making the main subject more prominent, and making objects that merge with the main subject less prominent. The idea is to provide to the user alternate pictures obtained by applying photographic composition rules in addition to the original picture taken by the user. The proposed algorithms do not depend on prior knowledge of the indoor/outdoor setting or scene content. The proposed algorithms are also designed to be amenable to software implementation on fixed-point programmable digital signal processors available in digital still cameras.
Stanford Hardware Development Program
NASA Technical Reports Server (NTRS)
Peterson, A.; Linscott, I.; Burr, J.
1986-01-01
Architectures for high performance, digital signal processing, particularly for high resolution, wide band spectrum analysis were developed. These developments are intended to provide instrumentation for NASA's Search for Extraterrestrial Intelligence (SETI) program. The real time signal processing is both formal and experimental. The efficient organization and optimal scheduling of signal processing algorithms were investigated. The work is complemented by efforts in processor architecture design and implementation. A high resolution, multichannel spectrometer that incorporates special purpose microcoded signal processors is being tested. A general purpose signal processor for the data from the multichannel spectrometer was designed to function as the processing element in a highly concurrent machine. The processor performance required for the spectrometer is in the range of 1000 to 10,000 million instructions per second (MIPS). Multiple node processor configurations, where each node performs at 100 MIPS, are sought. The nodes are microprogrammable and are interconnected through a network with high bandwidth for neighboring nodes, and medium bandwidth for nodes at larger distance. The implementation of both the current mutlichannel spectrometer and the signal processor as Very Large Scale Integration CMOS chip sets was commenced.
What does voice-processing technology support today?
Nakatsu, R; Suzuki, Y
1995-01-01
This paper describes the state of the art in applications of voice-processing technologies. In the first part, technologies concerning the implementation of speech recognition and synthesis algorithms are described. Hardware technologies such as microprocessors and DSPs (digital signal processors) are discussed. Software development environment, which is a key technology in developing applications software, ranging from DSP software to support software also is described. In the second part, the state of the art of algorithms from the standpoint of applications is discussed. Several issues concerning evaluation of speech recognition/synthesis algorithms are covered, as well as issues concerning the robustness of algorithms in adverse conditions. Images Fig. 3 PMID:7479720
Phase retrieval algorithm for JWST Flight and Testbed Telescope
NASA Astrophysics Data System (ADS)
Dean, Bruce H.; Aronstein, David L.; Smith, J. Scott; Shiri, Ron; Acton, D. Scott
2006-06-01
An image-based wavefront sensing and control algorithm for the James Webb Space Telescope (JWST) is presented. The algorithm heritage is discussed in addition to implications for algorithm performance dictated by NASA's Technology Readiness Level (TRL) 6. The algorithm uses feedback through an adaptive diversity function to avoid the need for phase-unwrapping post-processing steps. Algorithm results are demonstrated using JWST Testbed Telescope (TBT) commissioning data and the accuracy is assessed by comparison with interferometer results on a multi-wave phase aberration. Strategies for minimizing aliasing artifacts in the recovered phase are presented and orthogonal basis functions are implemented for representing wavefronts in irregular hexagonal apertures. Algorithm implementation on a parallel cluster of high-speed digital signal processors (DSPs) is also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Satake, Shin-ichi; Kanamori, Hiroyuki; Kunugi, Tomoaki
2007-02-01
We have developed a parallel algorithm for microdigital-holographic particle-tracking velocimetry. The algorithm is used in (1) numerical reconstruction of a particle image computer using a digital hologram, and (2) searching for particles. The numerical reconstruction from the digital hologram makes use of the Fresnel diffraction equation and the FFT (fast Fourier transform),whereas the particle search algorithm looks for local maximum graduation in a reconstruction field represented by a 3D matrix. To achieve high performance computing for both calculations (reconstruction and particle search), two memory partitions are allocated to the 3D matrix. In this matrix, the reconstruction part consists of horizontallymore » placed 2D memory partitions on the x-y plane for the FFT, whereas, the particle search part consists of vertically placed 2D memory partitions set along the z axes.Consequently, the scalability can be obtained for the proportion of processor elements,where the benchmarks are carried out for parallel computation by a SGI Altix machine.« less
The factorization of large composite numbers on the MPP
NASA Technical Reports Server (NTRS)
Mckurdy, Kathy J.; Wunderlich, Marvin C.
1987-01-01
The continued fraction method for factoring large integers (CFRAC) was an ideal algorithm to be implemented on a massively parallel computer such as the Massively Parallel Processor (MPP). After much effort, the first 60 digit number was factored on the MPP using about 6 1/2 hours of array time. Although this result added about 10 digits to the size number that could be factored using CFRAC on a serial machine, it was already badly beaten by the implementation of Davis and Holdridge on the CRAY-1 using the quadratic sieve, an algorithm which is clearly superior to CFRAC for large numbers. An algorithm is illustrated which is ideally suited to the single instruction multiple data (SIMD) massively parallel architecture and some of the modifications which were needed in order to make the parallel implementation effective and efficient are described.
Real-Time Visualization of Tissue Ischemia
NASA Technical Reports Server (NTRS)
Bearman, Gregory H. (Inventor); Chrien, Thomas D. (Inventor); Eastwood, Michael L. (Inventor)
2000-01-01
A real-time display of tissue ischemia which comprises three CCD video cameras, each with a narrow bandwidth filter at the correct wavelength is discussed. The cameras simultaneously view an area of tissue suspected of having ischemic areas through beamsplitters. The output from each camera is adjusted to give the correct signal intensity for combining with, the others into an image for display. If necessary a digital signal processor (DSP) can implement algorithms for image enhancement prior to display. Current DSP engines are fast enough to give real-time display. Measurement at three, wavelengths, combined into a real-time Red-Green-Blue (RGB) video display with a digital signal processing (DSP) board to implement image algorithms, provides direct visualization of ischemic areas.
Design and Evaluation of a Personal Digital Assistant-based Research Platform for Cochlear Implants
Ali, Hussnain; Lobo, Arthur P.; Loizou, Philipos C.
2014-01-01
This paper discusses the design, development, features, and clinical evaluation of a personal digital assistant (PDA)-based platform for cochlear implant research. This highly versatile and portable research platform allows researchers to design and perform complex experiments with cochlear implants manufactured by Cochlear Corporation with great ease and flexibility. The research platform includes a portable processor for implementing and evaluating novel speech processing algorithms, a stimulator unit which can be used for electrical stimulation and neurophysio-logic studies with animals, and a recording unit for collecting electroencephalogram/evoked potentials from human subjects. The design of the platform for real time and offline stimulation modes is discussed for electric-only and electric plus acoustic stimulation followed by results from an acute study with implant users for speech intelligibility in quiet and noisy conditions. The results are comparable with users’ clinical processor and very promising for undertaking chronic studies. PMID:23674422
A high-accuracy optical linear algebra processor for finite element applications
NASA Technical Reports Server (NTRS)
Casasent, D.; Taylor, B. K.
1984-01-01
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
ERIC Educational Resources Information Center
Warren, Steven F.; Gilkerson, Jill; Richards, Jeffrey A.; Oller, D. Kimbrough; Xu, Dongxin; Yapanel, Umit; Gray, Sharmistha
2010-01-01
The study compared the vocal production and language learning environments of 26 young children with autism spectrum disorder (ASD) to 78 typically developing children using measures derived from automated vocal analysis. A digital language processor and audio-processing algorithms measured the amount of adult words to children and the amount of…
Image processing for a tactile/vision substitution system using digital CNN.
Lin, Chien-Nan; Yu, Sung-Nien; Hu, Jin-Cheng
2006-01-01
In view of the parallel processing and easy implementation properties of CNN, we propose to use digital CNN as the image processor of a tactile/vision substitution system (TVSS). The digital CNN processor is used to execute the wavelet down-sampling filtering and the half-toning operations, aiming to extract important features from the images. A template combination method is used to embed the two image processing functions into a single CNN processor. The digital CNN processor is implemented on an intellectual property (IP) and is implemented on a XILINX VIRTEX II 2000 FPGA board. Experiments are designated to test the capability of the CNN processor in the recognition of characters and human subjects in different environments. The experiments demonstrates impressive results, which proves the proposed digital CNN processor a powerful component in the design of efficient tactile/vision substitution systems for the visually impaired people.
Servo Platform Circuit Design of Pendulous Gyroscope Based on DSP
NASA Astrophysics Data System (ADS)
Tan, Lilong; Wang, Pengcheng; Zhong, Qiyuan; Zhang, Cui; Liu, Yunfei
2018-03-01
In order to solve the problem when a certain type of pendulous gyroscope in the initial installation deviation more than 40 degrees, that the servo platform can not be up to the speed of the gyroscope in the rough north seeking phase. This paper takes the digital signal processor TMS320F28027 as the core, uses incremental digital PID algorithm, carries out the circuit design of the servo platform. Firstly, the hardware circuit is divided into three parts: DSP minimum system, motor driving circuit and signal processing circuit, then the mathematical model of incremental digital PID algorithm is established, based on the model, writes the PID control program in CCS3.3, finally, the servo motor tracking control experiment is carried out, it shows that the design can significantly improve the tracking ability of the servo platform, and the design has good engineering practice.
DSP Implementation of the Retinex Image Enhancement Algorithm
NASA Technical Reports Server (NTRS)
Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn
2004-01-01
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame-grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
The Engineer Topographic Laboratories /ETL/ hybrid optical/digital image processor
NASA Astrophysics Data System (ADS)
Benton, J. R.; Corbett, F.; Tuft, R.
1980-01-01
An optical-digital processor for generalized image enhancement and filtering is described. The optical subsystem is a two-PROM Fourier filter processor. Input imagery is isolated, scaled, and imaged onto the first PROM; this input plane acts like a liquid gate and serves as an incoherent-to-coherent converter. The image is transformed onto a second PROM which also serves as a filter medium; filters are written onto the second PROM with a laser scanner in real time. A solid state CCTV camera records the filtered image, which is then digitized and stored in a digital image processor. The operator can then manipulate the filtered image using the gray scale and color remapping capabilities of the video processor as well as the digital processing capabilities of the minicomputer.
Variable word length encoder reduces TV bandwith requirements
NASA Technical Reports Server (NTRS)
Sivertson, W. E., Jr.
1965-01-01
Adaptive variable resolution encoding technique provides an adaptive compression pseudo-random noise signal processor for reducing television bandwidth requirements. Complementary processors are required in both the transmitting and receiving systems. The pretransmission processor is analog-to-digital, while the postreception processor is digital-to-analog.
Onboard Interferometric SAR Processor for the Ka-Band Radar Interferometer (KaRIn)
NASA Technical Reports Server (NTRS)
Esteban-Fernandez, Daniel; Rodriquez, Ernesto; Peral, Eva; Clark, Duane I.; Wu, Xiaoqing
2011-01-01
An interferometric synthetic aperture radar (SAR) onboard processor concept and algorithm has been developed for the Ka-band radar interferometer (KaRIn) instrument on the Surface and Ocean Topography (SWOT) mission. This is a mission- critical subsystem that will perform interferometric SAR processing and multi-look averaging over the oceans to decrease the data rate by three orders of magnitude, and therefore enable the downlink of the radar data to the ground. The onboard processor performs demodulation, range compression, coregistration, and re-sampling, and forms nine azimuth squinted beams. For each of them, an interferogram is generated, including common-band spectral filtering to improve correlation, followed by averaging to the final 1 1-km ground resolution pixel. The onboard processor has been prototyped on a custom FPGA-based cPCI board, which will be part of the radar s digital subsystem. The level of complexity of this technology, dictated by the implementation of interferometric SAR processing at high resolution, the extremely tight level of accuracy required, and its implementation on FPGAs are unprecedented at the time of this reporting for an onboard processor for flight applications.
Xu, Qun; Wang, Xianchao; Xu, Chao
2017-06-01
Multiplication with traditional electronic computers is faced with a low calculating accuracy and a long computation time delay. To overcome these problems, the modified signed digit (MSD) multiplication routine is established based on the MSD system and the carry-free adder. Also, its parallel algorithm and optimization techniques are studied in detail. With the help of a ternary optical computer's characteristics, the structured data processor is designed especially for the multiplication routine. Several ternary optical operators are constructed to perform M transformations and summations in parallel, which has accelerated the iterative process of multiplication. In particular, the routine allocates data bits of the ternary optical processor based on digits of multiplication input, so the accuracy of the calculation results can always satisfy the users. Finally, the routine is verified by simulation experiments, and the results are in full compliance with the expectations. Compared with an electronic computer, the MSD multiplication routine is not only good at dealing with large-value data and high-precision arithmetic, but also maintains lower power consumption and fewer calculating delays.
NASA Astrophysics Data System (ADS)
Studenny, John; Johnstone, Eric
1991-01-01
The acousto-optic spectrum analyzer has undergone a theoretical design review and a basic parameter tradeoff analysis has been performed. The main conclusion is that for the given scenario of a 55 dB dynamic range and for a one-second temporal resolution, a 3.9 MHz resolution is a reasonable compromise with respect to current technology. Additional configurations are suggested. Noise testing of the signal detection processor algorithm was conducted. Additive white Gaussian noise was introduced to pure data. As expected, the tradeoff was between algorithm sensitivity and false alarms. No additional algorithm improvements could be made. The algorithm was observed to be robust, provided that the noise floor was set at a proper level. The digitization scheme was mainly driven by hardware constraints. To implement an analog to digital conversion scheme that linearly covers a 55 dB dynamic range would require a minimum of 17 bits. The general consensus was that 17 bits would be untenable for very large scale integration.
FPGA implementation of ICA algorithm for blind signal separation and adaptive noise canceling.
Kim, Chang-Min; Park, Hyung-Min; Kim, Taesu; Choi, Yoon-Kyung; Lee, Soo-Young
2003-01-01
An field programmable gate array (FPGA) implementation of independent component analysis (ICA) algorithm is reported for blind signal separation (BSS) and adaptive noise canceling (ANC) in real time. In order to provide enormous computing power for ICA-based algorithms with multipath reverberation, a special digital processor is designed and implemented in FPGA. The chip design fully utilizes modular concept and several chips may be put together for complex applications with a large number of noise sources. Experimental results with a fabricated test board are reported for ANC only, BSS only, and simultaneous ANC/BSS, which demonstrates successful speech enhancement in real environments in real time.
Advanced Physiological Estimation of Cognitive Status. Part 2
2011-05-24
Neurofeedback Algorithms and Gaze Controller EEG Sensor System g.USBamp *, ** • internal 24-bit ADC and digital signal processor • 16 channels (expandable...SUBJECT TERMS EEG eye-tracking mental state estimation machine learning Leonard J. Trejo Pacific Development and Technology LLC 999 Commercial St. Palo...fatigue, overload) Technology Transfer Opportunity Technology from PDT – Methods to acquire various physiological signals ( EEG , EOG, EMG, ECG, etc
Design and test of a regenerative satellite transmultiplexer
NASA Astrophysics Data System (ADS)
Hung, Kenny King-Ming
1993-05-01
In a multiple access scheme for regenerative satellite communications, the bulk frequency division multiple access (FDMA) uplink signal is demodulated on board the satellite and then remodulated for time division multiplexing (TDM) downlink transmission. Conversion from frequency to time division multiplex format requires that the uplink signal be frequency demultiplexed and each individual carrier be subsequently demodulated. For thin-route application which consists of a large number of channels with fixed data rate, multicarrier demodulation can be accomplished efficiently by a digital transmultiplexer (TMUX) using a fast Fourier transform processor followed by a bank of per-channel processors. A time domain description of the TMUX algorithm is derived which elucidates how the TMUX functions. The per-channel processor performs timing and carrier recovery for optimum and coherent data detection. Timing recovery is necessarily achieved asynchronously by a filter coefficient interpolation. Carrier recovery is performed using an all-digital phase-locked loop. The combination of both timing and carrier loops is investigated for a multi-user system. The performance of the overall system is assessed over a multi-user, additive white Gaussian noise channel for a bit energy to noise power spectral density ratio down to zero dB.
A Two-Stage Reconstruction Processor for Human Detection in Compressive Sensing CMOS Radar.
Tsao, Kuei-Chi; Lee, Ling; Chu, Ta-Shun; Huang, Yuan-Hao
2018-04-05
Complementary metal-oxide-semiconductor (CMOS) radar has recently gained much research attraction because small and low-power CMOS devices are very suitable for deploying sensing nodes in a low-power wireless sensing system. This study focuses on the signal processing of a wireless CMOS impulse radar system that can detect humans and objects in the home-care internet-of-things sensing system. The challenges of low-power CMOS radar systems are the weakness of human signals and the high computational complexity of the target detection algorithm. The compressive sensing-based detection algorithm can relax the computational costs by avoiding the utilization of matched filters and reducing the analog-to-digital converter bandwidth requirement. The orthogonal matching pursuit (OMP) is one of the popular signal reconstruction algorithms for compressive sensing radar; however, the complexity is still very high because the high resolution of human respiration leads to high-dimension signal reconstruction. Thus, this paper proposes a two-stage reconstruction algorithm for compressive sensing radar. The proposed algorithm not only has lower complexity than the OMP algorithm by 75% but also achieves better positioning performance than the OMP algorithm especially in noisy environments. This study also designed and implemented the algorithm by using Vertex-7 FPGA chip (Xilinx, San Jose, CA, USA). The proposed reconstruction processor can support the 256 × 13 real-time radar image display with a throughput of 28.2 frames per second.
An application of the MPP to the interactive manipulation of stereo images of digital terrain models
NASA Technical Reports Server (NTRS)
Pol, Sanjay; Mcallister, David; Davis, Edward
1987-01-01
Massively Parallel Processor algorithms were developed for the interactive manipulation of flat shaded digital terrain models defined over grids. The emphasis is on real time manipulation of stereo images. Standard graphics transformations are applied to a 128 x 128 grid of elevations followed by shading and a perspective projection to produce the right eye image. The surface is then rendered using a simple painter's algorithm for hidden surface removal. The left eye image is produced by rotating the surface 6 degs about the viewer's y axis followed by a perspective projection and rendering of the image as described above. The left and right eye images are then presented on a graphics device using standard stereo technology. Performance evaluations and comparisons are presented.
An optical/digital processor - Hardware and applications
NASA Technical Reports Server (NTRS)
Casasent, D.; Sterling, W. M.
1975-01-01
A real-time two-dimensional hybrid processor consisting of a coherent optical system, an optical/digital interface, and a PDP-11/15 control minicomputer is described. The input electrical-to-optical transducer is an electron-beam addressed potassium dideuterium phosphate (KD2PO4) light valve. The requirements and hardware for the output optical-to-digital interface, which is constructed from modular computer building blocks, are presented. Initial experimental results demonstrating the operation of this hybrid processor in phased-array radar data processing, synthetic-aperture image correlation, and text correlation are included. The applications chosen emphasize the role of the interface in the analysis of data from an optical processor and possible extensions to the digital feedback control of an optical processor.
Life sciences flight experiments microcomputer
NASA Technical Reports Server (NTRS)
Bartram, Peter N.
1987-01-01
A promising microcomputer configuration for the Spacelab Life Sciences Lab. Equipment inventory consists of multiple processors. One processor's use is reserved, with additional processors dedicated to real time input and output operations. A simple form of such a configuration, with a processor board for analog to digital conversion and another processor board for digital to analog conversion, was studied. The system used digital parallel data lines between the boards, operating independently of the system bus. Good performance of individual components was demonstrated: the analog to digital converter was at over 10,000 samples per second. The combination of the data transfer between boards with the input or output functions on each board slowed performance, with a maximum throughput of 2800 to 2900 analog samples per second. Any of several techniques, such as use of the system bus for data transfer or the addition of direct memory access hardware to the processor boards, should give significantly improved performance.
Pani, Danilo; Barabino, Gianluca; Citi, Luca; Meloni, Paolo; Raspopovic, Stanisa; Micera, Silvestro; Raffo, Luigi
2016-09-01
The control of upper limb neuroprostheses through the peripheral nervous system (PNS) can allow restoring motor functions in amputees. At present, the important aspect of the real-time implementation of neural decoding algorithms on embedded systems has been often overlooked, notwithstanding the impact that limited hardware resources have on the efficiency/effectiveness of any given algorithm. Present study is addressing the optimization of a template matching based algorithm for PNS signals decoding that is a milestone for its real-time, full implementation onto a floating-point digital signal processor (DSP). The proposed optimized real-time algorithm achieves up to 96% of correct classification on real PNS signals acquired through LIFE electrodes on animals, and can correctly sort spikes of a synthetic cortical dataset with sufficiently uncorrelated spike morphologies (93% average correct classification) comparably to the results obtained with top spike sorter (94% on average on the same dataset). The power consumption enables more than 24 h processing at the maximum load, and latency model has been derived to enable a fair performance assessment. The final embodiment demonstrates the real-time performance onto a low-power off-the-shelf DSP, opening to experiments exploiting the efferent signals to control a motor neuroprosthesis.
[Development of a video image system for wireless capsule endoscopes based on DSP].
Yang, Li; Peng, Chenglin; Wu, Huafeng; Zhao, Dechun; Zhang, Jinhua
2008-02-01
A video image recorder to record video picture for wireless capsule endoscopes was designed. TMS320C6211 DSP of Texas Instruments Inc. is the core processor of this system. Images are periodically acquired from Composite Video Broadcast Signal (CVBS) source and scaled by video decoder (SAA7114H). Video data is transported from high speed buffer First-in First-out (FIFO) to Digital Signal Processor (DSP) under the control of Complex Programmable Logic Device (CPLD). This paper adopts JPEG algorithm for image coding, and the compressed data in DSP was stored to Compact Flash (CF) card. TMS320C6211 DSP is mainly used for image compression and data transporting. Fast Discrete Cosine Transform (DCT) algorithm and fast coefficient quantization algorithm are used to accelerate operation speed of DSP and decrease the executing code. At the same time, proper address is assigned for each memory, which has different speed;the memory structure is also optimized. In addition, this system uses plenty of Extended Direct Memory Access (EDMA) to transport and process image data, which results in stable and high performance.
1987-03-31
processors . The symmetry-breaking algorithms give efficient ways to convert probabilistic algorithms to deterministic algorithms. Some of the...techniques have been applied to construct several efficient linear- processor algorithms for graph problems, including an O(lg* n)-time algorithm for (A + 1...On n-node graphs, the algorithm works in O(log 2 n) time using only n processors , in contrast to the previous best algorithm which used about n3
Solution for the nonuniformity correction of infrared focal plane arrays.
Zhou, Huixin; Liu, Shangqian; Lai, Rui; Wang, Dabao; Cheng, Yubao
2005-05-20
Based on the S-curve model of the detector response of infrared focal plan arrays (IRFPAs), an improved two-point correction algorithm is presented. The algorithm first transforms the nonlinear image data into linear data and then uses the normal two-point algorithm to correct the linear data. The algorithm can effectively overcome the influence of nonlinearity of the detector's response, and it enlarges the correction precision and the dynamic range of the response. A real-time imaging-signal-processing system for IRFPAs that is based on a digital signal processor and field-programmable gate arrays is also presented. The nonuniformity correction capability of the presented solution is validated by experimental imaging procedures of a 128 x 128 pixel IRFPA camera prototype.
Digital Audio Signal Processing and Nde: AN Unlikely but Valuable Partnership
NASA Astrophysics Data System (ADS)
Gaydecki, Patrick
2008-02-01
In the Digital Signal Processing (DSP) group, within the School of Electrical and Electronic Engineering at The University of Manchester, research is conducted into two seemingly distinct and disparate subjects: instrumentation for nondestructive evaluation, and DSP systems & algorithms for digital audio. We have often found that many of the hardware systems and algorithms employed to recover, extract or enhance audio signals may also be applied to signals provided by ultrasonic or magnetic NDE instruments. Furthermore, modern DSP hardware is so fast (typically performing hundreds of millions of operations per second), that much of the processing and signal reconstruction may be performed in real time. Here, we describe some of the hardware systems we have developed, together with algorithms that can be implemented both in real time and offline. A next generation system has now been designed, which incorporates a processor operating at 0.55 Giga MMACS, six input and eight output analogue channels, digital input/output in the form of S/PDIF, a JTAG and a USB interface. The software allows the user, with no knowledge of filter theory or programming, to design and run standard or arbitrary FIR, IIR and adaptive filters. Using audio as a vehicle, we can demonstrate the remarkable properties of modern reconstruction algorithms when used in conjunction with such hardware; applications in NDE include signal enhancement and recovery in acoustic, ultrasonic, magnetic and eddy current modalities.
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
Attitude algorithm and initial alignment method for SINS applied in short-range aircraft
NASA Astrophysics Data System (ADS)
Zhang, Rong-Hui; He, Zhao-Cheng; You, Feng; Chen, Bo
2017-07-01
This paper presents an attitude solution algorithm based on the Micro-Electro-Mechanical System and quaternion method. We completed the numerical calculation and engineering practice by adopting fourth-order Runge-Kutta algorithm in the digital signal processor. The state space mathematical model of initial alignment in static base was established, and the initial alignment method based on Kalman filter was proposed. Based on the hardware in the loop simulation platform, the short-range flight simulation test and the actual flight test were carried out. The results show that the error of pitch, yaw and roll angle is fast convergent, and the fitting rate between flight simulation and flight test is more than 85%.
Circuitry, systems and methods for detecting magnetic fields
Kotter, Dale K [Shelley, ID; Spencer, David F [Idaho Falls, ID; Roybal, Lyle G [Idaho Falls, ID; Rohrbaugh, David T [Idaho Falls, ID
2010-09-14
Circuitry for detecting magnetic fields includes a first magnetoresistive sensor and a second magnetoresistive sensor configured to form a gradiometer. The circuitry includes a digital signal processor and a first feedback loop coupled between the first magnetoresistive sensor and the digital signal processor. A second feedback loop which is discrete from the first feedback loop is coupled between the second magnetoresistive sensor and the digital signal processor.
Photon Counting Using Edge-Detection Algorithm
NASA Technical Reports Server (NTRS)
Gin, Jonathan W.; Nguyen, Danh H.; Farr, William H.
2010-01-01
New applications such as high-datarate, photon-starved, free-space optical communications require photon counting at flux rates into gigaphoton-per-second regimes coupled with subnanosecond timing accuracy. Current single-photon detectors that are capable of handling such operating conditions are designed in an array format and produce output pulses that span multiple sample times. In order to discern one pulse from another and not to overcount the number of incoming photons, a detection algorithm must be applied to the sampled detector output pulses. As flux rates increase, the ability to implement such a detection algorithm becomes difficult within a digital processor that may reside within a field-programmable gate array (FPGA). Systems have been developed and implemented to both characterize gigahertz bandwidth single-photon detectors, as well as process photon count signals at rates into gigaphotons per second in order to implement communications links at SCPPM (serial concatenated pulse position modulation) encoded data rates exceeding 100 megabits per second with efficiencies greater than two bits per detected photon. A hardware edge-detection algorithm and corresponding signal combining and deserialization hardware were developed to meet these requirements at sample rates up to 10 GHz. The photon discriminator deserializer hardware board accepts four inputs, which allows for the ability to take inputs from a quadphoton counting detector, to support requirements for optical tracking with a reduced number of hardware components. The four inputs are hardware leading-edge detected independently. After leading-edge detection, the resultant samples are ORed together prior to deserialization. The deserialization is performed to reduce the rate at which data is passed to a digital signal processor, perhaps residing within an FPGA. The hardware implements four separate analog inputs that are connected through RF connectors. Each analog input is fed to a high-speed 1-bit comparator, which digitizes the input referenced to an adjustable threshold value. This results in four independent serial sample streams of binary 1s and 0s, which are ORed together at rates up to 10 GHz. This single serial stream is then deserialized by a factor of 16 to create 16 signal lines at a rate of 622.5 MHz or lower for input to a high-speed digital processor assembly. The new design and corresponding hardware can be employed with a quad-photon counting detector capable of handling photon rates on the order of multi-gigaphotons per second, whereas prior art was only capable of handling a single input at 1/4 the flux rate. Additionally, the hardware edge-detection algorithm has provided the ability to process 3-10 higher photon flux rates than previously possible by removing the limitation that photoncounting detector output pulses on multiple channels being ORed not overlap. Now, only the leading edges of the pulses are required to not overlap. This new photon counting digitizer hardware architecture supports a universal front end for an optical communications receiver operating at data rates from kilobits to over one gigabit per second to meet increased mission data volume requirements.
A Two-Stage Reconstruction Processor for Human Detection in Compressive Sensing CMOS Radar
Tsao, Kuei-Chi; Lee, Ling; Chu, Ta-Shun
2018-01-01
Complementary metal-oxide-semiconductor (CMOS) radar has recently gained much research attraction because small and low-power CMOS devices are very suitable for deploying sensing nodes in a low-power wireless sensing system. This study focuses on the signal processing of a wireless CMOS impulse radar system that can detect humans and objects in the home-care internet-of-things sensing system. The challenges of low-power CMOS radar systems are the weakness of human signals and the high computational complexity of the target detection algorithm. The compressive sensing-based detection algorithm can relax the computational costs by avoiding the utilization of matched filters and reducing the analog-to-digital converter bandwidth requirement. The orthogonal matching pursuit (OMP) is one of the popular signal reconstruction algorithms for compressive sensing radar; however, the complexity is still very high because the high resolution of human respiration leads to high-dimension signal reconstruction. Thus, this paper proposes a two-stage reconstruction algorithm for compressive sensing radar. The proposed algorithm not only has lower complexity than the OMP algorithm by 75% but also achieves better positioning performance than the OMP algorithm especially in noisy environments. This study also designed and implemented the algorithm by using Vertex-7 FPGA chip (Xilinx, San Jose, CA, USA). The proposed reconstruction processor can support the 256×13 real-time radar image display with a throughput of 28.2 frames per second. PMID:29621170
Razza, Sergio; Zaccone, Monica; Meli, Aannalisa; Cristofari, Eliana
2017-12-01
Children affected by hearing loss can experience difficulties in challenging and noisy environments even when deafness is corrected by Cochlear implant (CI) devices. These patients have a selective attention deficit in multiple listening conditions. At present, the most effective ways to improve the performance of speech recognition in noise consists of providing CI processors with noise reduction algorithms and of providing patients with bilateral CIs. The aim of this study was to compare speech performances in noise, across increasing noise levels, in CI recipients using two kinds of wireless remote-microphone radio systems that use digital radio frequency transmission: the Roger Inspiro accessory and the Cochlear Wireless Mini Microphone accessory. Eleven Nucleus Cochlear CP910 CI young user subjects were studied. The signal/noise ratio, at a speech reception threshold (SRT) value of 50%, was measured in different conditions for each patient: with CI only, with the Roger or with the MiniMic accessory. The effect of the application of the SNR-noise reduction algorithm in each of these conditions was also assessed. The tests were performed with the subject positioned in front of the main speaker, at a distance of 2.5 m. Another two speakers were positioned at 3.50 m. The main speaker at 65 dB issued disyllabic words. Babble noise signal was delivered through the other speakers, with variable intensity. The use of both wireless remote microphones improved the SRT results. Both systems improved gain of speech performances. The gain was higher with the Mini Mic system (SRT = -4.76) than the Roger system (SRT = -3.01). The addition of the NR algorithm did not statistically further improve the results. There is significant improvement in speech recognition results with both wireless digital remote microphone accessories, in particular with the Mini Mic system when used with the CP910 processor. The use of a remote microphone accessory surpasses the benefit of application of NR algorithm. Copyright © 2017. Published by Elsevier B.V.
Real-time 3D adaptive filtering for portable imaging systems
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Ali, Murtaza; Wainwright, Ian; Nadeski, Mark
2015-03-01
Portable imaging devices have proven valuable for emergency medical services both in the field and hospital environments and are becoming more prevalent in clinical settings where the use of larger imaging machines is impractical. 3D adaptive filtering is one of the most advanced techniques aimed at noise reduction and feature enhancement, but is computationally very demanding and hence often not able to run with sufficient performance on a portable platform. In recent years, advanced multicore DSPs have been introduced that attain high processing performance while maintaining low levels of power dissipation. These processors enable the implementation of complex algorithms like 3D adaptive filtering, improving the image quality of portable medical imaging devices. In this study, the performance of a 3D adaptive filtering algorithm on a digital signal processor (DSP) is investigated. The performance is assessed by filtering a volume of size 512x256x128 voxels sampled at a pace of 10 MVoxels/sec.
Towards an Analogue Neuromorphic VLSI Instrument for the Sensing of Complex Odours
NASA Astrophysics Data System (ADS)
Ab Aziz, Muhammad Fazli; Harun, Fauzan Khairi Che; Covington, James A.; Gardner, Julian W.
2011-09-01
Almost all electronic nose instruments reported today employ pattern recognition algorithms written in software and run on digital processors, e.g. micro-processors, microcontrollers or FPGAs. Conversely, in this paper we describe the analogue VLSI implementation of an electronic nose through the design of a neuromorphic olfactory chip. The modelling, design and fabrication of the chip have already been reported. Here a smart interface has been designed and characterised for thisneuromorphic chip. Thus we can demonstrate the functionality of the a VLSI neuromorphic chip, producing differing principal neuron firing patterns to real sensor response data. Further work is directed towards integrating 9 separate neuromorphic chips to create a large neuronal network to solve more complex olfactory problems.
NASA Technical Reports Server (NTRS)
Rickard, D. A.; Bodenheimer, R. E.
1976-01-01
Digital computer components which perform two dimensional array logic operations (Tse logic) on binary data arrays are described. The properties of Golay transforms which make them useful in image processing are reviewed, and several architectures for Golay transform processors are presented with emphasis on the skeletonizing algorithm. Conventional logic control units developed for the Golay transform processors are described. One is a unique microprogrammable control unit that uses a microprocessor to control the Tse computer. The remaining control units are based on programmable logic arrays. Performance criteria are established and utilized to compare the various Golay transform machines developed. A critique of Tse logic is presented, and recommendations for additional research are included.
The Use of Field Programmable Gate Arrays (FPGA) in Small Satellite Communication Systems
NASA Technical Reports Server (NTRS)
Varnavas, Kosta; Sims, William Herbert; Casas, Joseph
2015-01-01
This paper will describe the use of digital Field Programmable Gate Arrays (FPGA) to contribute to advancing the state-of-the-art in software defined radio (SDR) transponder design for the emerging SmallSat and CubeSat industry and to provide advances for NASA as described in the TAO5 Communication and Navigation Roadmap (Ref 4). The use of software defined radios (SDR) has been around for a long time. A typical implementation of the SDR is to use a processor and write software to implement all the functions of filtering, carrier recovery, error correction, framing etc. Even with modern high speed and low power digital signal processors, high speed memories, and efficient coding, the compute intensive nature of digital filters, error correcting and other algorithms is too much for modern processors to get efficient use of the available bandwidth to the ground. By using FPGAs, these compute intensive tasks can be done in parallel, pipelined fashion and more efficiently use every clock cycle to significantly increase throughput while maintaining low power. These methods will implement digital radios with significant data rates in the X and Ka bands. Using these state-of-the-art technologies, unprecedented uplink and downlink capabilities can be achieved in a 1/2 U sized telemetry system. Additionally, modern FPGAs have embedded processing systems, such as ARM cores, integrated inside the FPGA allowing mundane tasks such as parameter commanding to occur easily and flexibly. Potential partners include other NASA centers, industry and the DOD. These assets are associated with small satellite demonstration flights, LEO and deep space applications. MSFC currently has an SDR transponder test-bed using Hardware-in-the-Loop techniques to evaluate and improve SDR technologies.
A novel strategy for load balancing of distributed medical applications.
Logeswaran, Rajasvaran; Chen, Li-Choo
2012-04-01
Current trends in medicine, specifically in the electronic handling of medical applications, ranging from digital imaging, paperless hospital administration and electronic medical records, telemedicine, to computer-aided diagnosis, creates a burden on the network. Distributed Service Architectures, such as Intelligent Network (IN), Telecommunication Information Networking Architecture (TINA) and Open Service Access (OSA), are able to meet this new challenge. Distribution enables computational tasks to be spread among multiple processors; hence, performance is an important issue. This paper proposes a novel approach in load balancing, the Random Sender Initiated Algorithm, for distribution of tasks among several nodes sharing the same computational object (CO) instances in Distributed Service Architectures. Simulations illustrate that the proposed algorithm produces better network performance than the benchmark load balancing algorithms-the Random Node Selection Algorithm and the Shortest Queue Algorithm, especially under medium and heavily loaded conditions.
A wideband, high-resolution spectrum analyzer
NASA Technical Reports Server (NTRS)
Quirk, M. P.; Wilck, H. C.; Garyantes, M. F.; Grimm, M. J.
1988-01-01
A two-million-channel, 40 MHz bandwidth, digital spectrum analyzer under development at the Jet Propulsion Laboratory is described. The analyzer system will serve as a prototype processor for the sky survey portion of NASA's Search for Extraterrestrial Intelligence program and for other applications in the Deep Space Network. The analyzer digitizes an analog input, performs a 2 (sup 21) point Discrete Fourier Transform, accumulates the output power, normalizes the output to remove frequency-dependent gain, and automates simple signal detection algorithms. Due to its built-in frequency-domain processing functions and configuration flexibility, the analyzer is a very powerful tool for real-time signal analysis.
A wide-band high-resolution spectrum analyzer
NASA Technical Reports Server (NTRS)
Quirk, Maureen P.; Garyantes, Michael F.; Wilck, Helmut C.; Grimm, Michael J.
1988-01-01
A two-million-channel, 40 MHz bandwidth, digital spectrum analyzer under development at the Jet Propulsion Laboratory is described. The analyzer system will serve as a prototype processor for the sky survey portion of NASA's Search for Extraterrestrial Intelligence program and for other applications in the Deep Space Network. The analyzer digitizes an analog input, performs a 2 (sup 21) point Discrete Fourier Transform, accumulates the output power, normalizes the output to remove frequency-dependent gain, and automates simple detection algorithms. Due to its built-in frequency-domain processing functions and configuration flexibility, the analyzer is a very powerful tool for real-time signal analysis.
A wide-band high-resolution spectrum analyzer.
Quirk, M P; Garyantes, M F; Wilck, H C; Grimm, M J
1988-12-01
This paper describes a two-million-channel 40-MHz-bandwidth, digital spectrum analyzer under development at the Jet Propulsion Laboratory. The analyzer system will serve as a prototype processor for the sky survey portion of NASA's Search for Extraterrestrial Intelligence program and for other applications in the Deep Space Network. The analyzer digitizes an analog input, performs a 2(21)-point, Discrete Fourier Transform, accumulates the output power, normalizes the output to remove frequency-dependent gain, and automates simple signal detection algorithms. Due to its built-in frequency-domain processing functions and configuration flexibility, the analyzer is a very powerful tool for real-time signal analysis and detection.
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
NASA Technical Reports Server (NTRS)
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, F.; Morel, M.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm is successfully implemented on a tightly coupled MIMD parallel processor. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts, and the dimension of the subspace on the performance of the algorithm is investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18, and 3.61 are achieved on two, four, six, and eight processors, respectively.
Contextual classification on a CDC Flexible Processor system. [for photomapped remote sensing data
NASA Technical Reports Server (NTRS)
Smith, B. W.; Siegel, H. J.; Swain, P. H.
1981-01-01
A potential hardware organization for the Flexible Processor Array is presented. An algorithm that implements a contextual classifier for remote sensing data analysis is given, along with uniprocessor classification algorithms. The Flexible Processor algorithm is provided, as are simulated timings for contextual classifiers run on the Flexible Processor Array and another system. The timings are analyzed for context neighborhoods of sizes three and nine.
Digital signal processor and processing method for GPS receivers
NASA Technical Reports Server (NTRS)
Thomas, Jr., Jess B. (Inventor)
1989-01-01
A digital signal processor and processing method therefor for use in receivers of the NAVSTAR/GLOBAL POSITIONING SYSTEM (GPS) employs a digital carrier down-converter, digital code correlator and digital tracking processor. The digital carrier down-converter and code correlator consists of an all-digital, minimum bit implementation that utilizes digital chip and phase advancers, providing exceptional control and accuracy in feedback phase and in feedback delay. Roundoff and commensurability errors can be reduced to extremely small values (e.g., less than 100 nanochips and 100 nanocycles roundoff errors and 0.1 millichip and 1 millicycle commensurability errors). The digital tracking processor bases the fast feedback for phase and for group delay in the C/A, P.sub.1, and P.sub.2 channels on the L.sub.1 C/A carrier phase thereby maintaining lock at lower signal-to-noise ratios, reducing errors in feedback delays, reducing the frequency of cycle slips and in some cases obviating the need for quadrature processing in the P channels. Simple and reliable methods are employed for data bit synchronization, data bit removal and cycle counting. Improved precision in averaged output delay values is provided by carrier-aided data-compression techniques. The signal processor employs purely digital operations in the sense that exactly the same carrier phase and group delay measurements are obtained, to the last decimal place, every time the same sampled data (i.e., exactly the same bits) are processed.
An Efficient Functional Test Generation Method For Processors Using Genetic Algorithms
NASA Astrophysics Data System (ADS)
Hudec, Ján; Gramatová, Elena
2015-07-01
The paper presents a new functional test generation method for processors testing based on genetic algorithms and evolutionary strategies. The tests are generated over an instruction set architecture and a processor description. Such functional tests belong to the software-oriented testing. Quality of the tests is evaluated by code coverage of the processor description using simulation. The presented test generation method uses VHDL models of processors and the professional simulator ModelSim. The rules, parameters and fitness functions were defined for various genetic algorithms used in automatic test generation. Functionality and effectiveness were evaluated using the RISC type processor DP32.
ISFET-based sensor signal processor chip design for environment monitoring applications
NASA Astrophysics Data System (ADS)
Chung, Wen-Yaw; Yang, Chung-Huang; Wang, Ming-Ga
2004-12-01
In recent years Ion-Sensitive Field Effect Transistor (ISFET) based transducers create valuable applications in physiological data acquisition and environment monitoring. This paper presents a mixed-mode ASIC design for potentiometric ISFET-based bio-chemical sensor applications including H+ sensing and hand-held pH meter. For battery power consideration, the proposed system consists of low voltage (3V) analog front-end readout circuits and digital processor has been developed and fabricated in a 0.5mm double-poly double-metal CMOS technology. To assure that the correct pH value can be measured, the two-point calibration circuitry based on the response of standard pH4 and pH7 buffer solution has been implemented by using algorithmic state machine hardware algorithms. The measurement accuracy of the chip is 10 bits and the measured range between pH 2 to pH 12 compared to ideal values is within the accuracy of 0.1pH. For homeland environmental applications, the system provide rapid, easy to use, and cost-effective on-site testing on the quality of water, such as drinking water, ground water and river water. The processor has a potential usage in battery-operated and portable devices in environmental monitoring applications compared to commercial hand-held pH meter.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems
NASA Technical Reports Server (NTRS)
Downie, John D.; Goodman, Joseph W.
1989-01-01
The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
Ship detection in panchromatic images: a new method and its DSP implementation
NASA Astrophysics Data System (ADS)
Yao, Yuan; Jiang, Zhiguo; Zhang, Haopeng; Wang, Mengfei; Meng, Gang
2016-03-01
In this paper, a new ship detection method is proposed after analyzing the characteristics of panchromatic remote sensing images and ship targets. Firstly, AdaBoost(Adaptive Boosting) classifiers trained by Haar features are utilized to make coarse detection of ship targets. Then LSD (Line Segment Detector) is adopted to extract the line features in target slices to make fine detection. Experimental results on a dataset of panchromatic remote sensing images with a spatial resolution of 2m show that the proposed algorithm can achieve high detection rate and low false alarm rate. Meanwhile, the algorithm can meet the needs of practical applications on DSP (Digital Signal Processor).
More About Vector Adaptive/Predictive Coding Of Speech
NASA Technical Reports Server (NTRS)
Jedrey, Thomas C.; Gersho, Allen
1992-01-01
Report presents additional information about digital speech-encoding and -decoding system described in "Vector Adaptive/Predictive Encoding of Speech" (NPO-17230). Summarizes development of vector adaptive/predictive coding (VAPC) system and describes basic functions of algorithm. Describes refinements introduced enabling receiver to cope with errors. VAPC algorithm implemented in integrated-circuit coding/decoding processors (codecs). VAPC and other codecs tested under variety of operating conditions. Tests designed to reveal effects of various background quiet and noisy environments and of poor telephone equipment. VAPC found competitive with and, in some respects, superior to other 4.8-kb/s codecs and other codecs of similar complexity.
Radiation Hardened Low Power Digital Signal Processor
2005-04-15
Image Figure 53.0 Point Spread Function PSF Figure 54.0 Restored Image and Restored PSF Figure 55.0 Newly Created Array Figure 56.0 Deblurred Image and... noise and interference rejection. WOA’s of 32-taps and greater are easily managed by the TCSP. An architecture that could efficiently perform filter...to quickly calculate a Remez filter impulse response to be used in place of the window function. Using the Remez exchange algorithm to calculate the
Portable laser speckle perfusion imaging system based on digital signal processor.
Tang, Xuejun; Feng, Nengyun; Sun, Xiaoli; Li, Pengcheng; Luo, Qingming
2010-12-01
The ability to monitor blood flow in vivo is of major importance in clinical diagnosis and in basic researches of life science. As a noninvasive full-field technique without the need of scanning, laser speckle contrast imaging (LSCI) is widely used to study blood flow with high spatial and temporal resolution. Current LSCI systems are based on personal computers for image processing with large size, which potentially limit the widespread clinical utility. The need for portable laser speckle contrast imaging system that does not compromise processing efficiency is crucial in clinical diagnosis. However, the processing of laser speckle contrast images is time-consuming due to the heavy calculation for enormous high-resolution image data. To address this problem, a portable laser speckle perfusion imaging system based on digital signal processor (DSP) and the algorithm which is suitable for DSP is described. With highly integrated DSP and the algorithm, we have markedly reduced the size and weight of the system as well as its energy consumption while preserving the high processing speed. In vivo experiments demonstrate that our portable laser speckle perfusion imaging system can obtain blood flow images at 25 frames per second with the resolution of 640 × 480 pixels. The portable and lightweight features make it capable of being adapted to a wide variety of application areas such as research laboratory, operating room, ambulance, and even disaster site.
A bunch to bucket phase detector for the RHIC LLRF upgrade platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, K.S.; Harvey, M.; Hayes, T.
2011-03-28
As part of the overall development effort for the RHIC LLRF Upgrade Platform [1,2,3], a generic four channel 16 bit Analog-to-Digital Converter (ADC) daughter module was developed to provide high speed, wide dynamic range digitizing and processing of signals from DC to several hundred megahertz. The first operational use of this card was to implement the bunch to bucket phase detector for the RHIC LLRF beam control feedback loops. This paper will describe the design and performance features of this daughter module as a bunch to bucket phase detector, and also provide an overview of its place within the overallmore » LLRF platform architecture as a high performance digitizer and signal processing module suitable to a variety of applications. In modern digital control and signal processing systems, ADCs provide the interface between the analog and digital signal domains. Once digitized, signals are then typically processed using algorithms implemented in field programmable gate array (FPGA) logic, general purpose processors (GPPs), digital signal processors (DSPs) or a combination of these. For the recently developed and commissioned RHIC LLRF Upgrade Platform, we've developed a four channel ADC daughter module based on the Linear Technology LTC2209 16 bit, 160 MSPS ADC and the Xilinx V5FX70T FPGA. The module is designed to be relatively generic in application, and with minimal analog filtering on board, is capable of processing signals from DC to 500 MHz or more. The module's first application was to implement the bunch to bucket phase detector (BTB-PD) for the RHIC LLRF system. The same module also provides DC digitizing of analog processed BPM signals used by the LLRF system for radial feedback.« less
Implementation of Adaptive Digital Controllers on Programmable Logic Devices
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; King, Kenneth D.; Smith, Keary J.; Monenegro, Justino (Technical Monitor)
2002-01-01
Much has been made of the capabilities of FPGA's (Field Programmable Gate Arrays) in the hardware implementation of fast digital signal processing. Such capability also makes an FPGA a suitable platform for the digital implementation of closed loop controllers. Other researchers have implemented a variety of closed-loop digital controllers on FPGA's. Some of these controllers include the widely used proportional-integral-derivative (PID) controller, state space controllers, neural network and fuzzy logic based controllers. There are myriad advantages to utilizing an FPGA for discrete-time control functions which include the capability for reconfiguration when SRAM-based FPGA's are employed, fast parallel implementation of multiple control loops and implementations that can meet space level radiation tolerance requirements in a compact form-factor. Generally, a software implementation on a DSP (Digital Signal Processor) or microcontroller is used to implement digital controllers. At Marshall Space Flight Center, the Control Electronics Group has been studying adaptive discrete-time control of motor driven actuator systems using digital signal processor (DSP) devices. While small form factor, commercial DSP devices are now available with event capture, data conversion, pulse width modulated (PWM) outputs and communication peripherals, these devices are not currently available in designs and packages which meet space level radiation requirements. In general, very few DSP devices are produced that are designed to meet any level of radiation tolerance or hardness. The goal of this effort is to create a fully digital, flight ready controller design that utilizes an FPGA for implementation of signal conditioning for control feedback signals, generation of commands to the controlled system, and hardware insertion of adaptive control algorithm approaches. An alternative is required for compact implementation of such functionality to withstand the harsh environment encountered on spacecraft. Radiation tolerant FPGA's are a feasible option for reaching this goal.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Army/NASA small turboshaft engine digital controls research program
NASA Technical Reports Server (NTRS)
Sellers, J. F.; Baez, A. N.
1981-01-01
The emphasis of a program to conduct digital controls research for small turboshaft engines is on engine test evaluation of advanced control logic using a flexible microprocessor based digital control system designed specifically for research on advanced control logic. Control software is stored in programmable memory. New control algorithms may be stored in a floppy disk and loaded directly into memory. This feature facilitates comparative evaluation of different advanced control modes. The central processor in the digital control is an Intel 8086 16 bit microprocessor. Control software is programmed in assembly language. Software checkout is accomplished prior to engine test by connecting the digital control to a real time hybrid computer simulation of the engine. The engine currently installed in the facility has a hydromechanical control modified to allow electrohydraulic fuel metering and VG actuation by the digital control. Simulation results are presented which show that the modern control reduces the transient rotor speed droop caused by unanticipated load changes such as cyclic pitch or wind gust transients.
Reconfigurable signal processor designs for advanced digital array radar systems
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining
2017-05-01
The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
Scalable Domain Decomposed Monte Carlo Particle Transport
NASA Astrophysics Data System (ADS)
O'Brien, Matthew Joseph
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation. The main algorithms we consider are: • Domain decomposition of constructive solid geometry: enables extremely large calculations in which the background geometry is too large to fit in the memory of a single computational node. • Load Balancing: keeps the workload per processor as even as possible so the calculation runs efficiently. • Global Particle Find: if particles are on the wrong processor, globally resolve their locations to the correct processor based on particle coordinate and background domain. • Visualizing constructive solid geometry, sourcing particles, deciding that particle streaming communication is completed and spatial redecomposition. These algorithms are some of the most important parallel algorithms required for domain decomposed Monte Carlo particle transport. We demonstrate that our previous algorithms were not scalable, prove that our new algorithms are scalable, and run some of the algorithms up to 2 million MPI processes on the Sequoia supercomputer.
RASSP signal processing architectures
NASA Astrophysics Data System (ADS)
Shirley, Fred; Bassett, Bob; Letellier, J. P.
1995-06-01
The rapid prototyping of application specific signal processors (RASSP) program is an ARPA/tri-service effort to dramatically improve the process by which complex digital systems, particularly embedded signal processors, are specified, designed, documented, manufactured, and supported. The domain of embedded signal processing was chosen because it is important to a variety of military and commercial applications as well as for the challenge it presents in terms of complexity and performance demands. The principal effort is being performed by two major contractors, Lockheed Sanders (Nashua, NH) and Martin Marietta (Camden, NJ). For both, improvements in methodology are to be exercised and refined through the performance of individual 'Demonstration' efforts. The Lockheed Sanders' Demonstration effort is to develop an infrared search and track (IRST) processor. In addition, both contractors' results are being measured by a series of externally administered (by Lincoln Labs) six-month Benchmark programs that measure process improvement as a function of time. The first two Benchmark programs are designing and implementing a synthetic aperture radar (SAR) processor. Our demonstration team is using commercially available VME modules from Mercury Computer to assemble a multiprocessor system scalable from one to hundreds of Intel i860 microprocessors. Custom modules for the sensor interface and display driver are also being developed. This system implements either proprietary or Navy owned algorithms to perform the compute-intensive IRST function in real time in an avionics environment. Our Benchmark team is designing custom modules using commercially available processor ship sets, communication submodules, and reconfigurable logic devices. One of the modules contains multiple vector processors optimized for fast Fourier transform processing. Another module is a fiberoptic interface that accepts high-rate input data from the sensors and provides video-rate output data to a display. This paper discusses the impact of simulation on choosing signal processing algorithms and architectures, drawing from the experiences of the Demonstration and Benchmark inter-company teams at Lockhhed Sanders, Motorola, Hughes, and ISX.
NASA Astrophysics Data System (ADS)
Asztalos, Stephen J.; Hennig, Wolfgang; Warburton, William K.
2016-01-01
Pulse shape discrimination applied to certain fast scintillators is usually performed offline. In sufficiently high-event rate environments data transfer and storage become problematic, which suggests a different analysis approach. In response, we have implemented a general purpose pulse shape analysis algorithm in the XIA Pixie-500 and Pixie-500 Express digital spectrometers. In this implementation waveforms are processed in real time, reducing the pulse characteristics to a few pulse shape analysis parameters and eliminating time-consuming waveform transfer and storage. We discuss implementation of these features, their advantages, necessary trade-offs and performance. Measurements from bench top and experimental setups using fast scintillators and XIA processors are presented.
Abdellah, Marwan; Eldeib, Ayman; Owis, Mohamed I
2015-01-01
This paper features an advanced implementation of the X-ray rendering algorithm that harnesses the giant computing power of the current commodity graphics processors to accelerate the generation of high resolution digitally reconstructed radiographs (DRRs). The presented pipeline exploits the latest features of NVIDIA Graphics Processing Unit (GPU) architectures, mainly bindless texture objects and dynamic parallelism. The rendering throughput is substantially improved by exploiting the interoperability mechanisms between CUDA and OpenGL. The benchmarks of our optimized rendering pipeline reflect its capability of generating DRRs with resolutions of 2048(2) and 4096(2) at interactive and semi interactive frame-rates using an NVIDIA GeForce 970 GTX device.
LIBS data analysis using a predictor-corrector based digital signal processor algorithm
NASA Astrophysics Data System (ADS)
Sanders, Alex; Griffin, Steven T.; Robinson, Aaron
2012-06-01
There are many accepted sensor technologies for generating spectra for material classification. Once the spectra are generated, communication bandwidth limitations favor local material classification with its attendant reduction in data transfer rates and power consumption. Transferring sensor technologies such as Cavity Ring-Down Spectroscopy (CRDS) and Laser Induced Breakdown Spectroscopy (LIBS) require effective material classifiers. A result of recent efforts has been emphasis on Partial Least Squares - Discriminant Analysis (PLS-DA) and Principle Component Analysis (PCA). Implementation of these via general purpose computers is difficult in small portable sensor configurations. This paper addresses the creation of a low mass, low power, robust hardware spectra classifier for a limited set of predetermined materials in an atmospheric matrix. Crucial to this is the incorporation of PCA or PLS-DA classifiers into a predictor-corrector style implementation. The system configuration guarantees rapid convergence. Software running on multi-core Digital Signal Processor (DSPs) simulates a stream-lined plasma physics model estimator, reducing Analog-to-Digital (ADC) power requirements. This paper presents the results of a predictorcorrector model implemented on a low power multi-core DSP to perform substance classification. This configuration emphasizes the hardware system and software design via a predictor corrector model that simultaneously decreases the sample rate while performing the classification.
A portable detection instrument based on DSP for beef marbling
NASA Astrophysics Data System (ADS)
Zhou, Tong; Peng, Yankun
2014-05-01
Beef marbling is one of the most important indices to assess beef quality. Beef marbling is graded by the measurement of the fat distribution density in the rib-eye region. However quality grades of beef in most of the beef slaughtering houses and businesses depend on trainees using their visual senses or comparing the beef slice to the Chinese standard sample cards. Manual grading demands not only great labor but it also lacks objectivity and accuracy. Aiming at the necessity of beef slaughtering houses and businesses, a beef marbling detection instrument was designed. The instrument employs Charge-coupled Device (CCD) imaging techniques, digital image processing, Digital Signal Processor (DSP) control and processing techniques and Liquid Crystal Display (LCD) screen display techniques. The TMS320DM642 digital signal processor of Texas Instruments (TI) is the core that combines high-speed data processing capabilities and real-time processing features. All processes such as image acquisition, data transmission, image processing algorithms and display were implemented on this instrument for a quick, efficient, and non-invasive detection of beef marbling. Structure of the system, working principle, hardware and software are introduced in detail. The device is compact and easy to transport. The instrument can determine the grade of beef marbling reliably and correctly.
Software Acquisition Manager’s Workstation (SAM/WS) System Design.
1984-04-30
3. Tactical Digital System Requirements ..................... 31General...pspc t14 3. Tactical Digital System Requirements pspc-tiS 3.1 General pspc-t16 3.2 Program Description pspc-t17 3.2.1 General...pspc-t22 3.3.2 Digital Processor Input/Output Utilization Table pspc t23 3.3.3 Digital Processor Interface Block Diagram pspc-t24 3.3.4 Program
Digital system for structural dynamics simulation
NASA Technical Reports Server (NTRS)
Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.
1982-01-01
State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
IMAGEP - A FORTRAN ALGORITHM FOR DIGITAL IMAGE PROCESSING
NASA Technical Reports Server (NTRS)
Roth, D. J.
1994-01-01
IMAGEP is a FORTRAN computer algorithm containing various image processing, analysis, and enhancement functions. It is a keyboard-driven program organized into nine subroutines. Within the subroutines are other routines, also, selected via keyboard. Some of the functions performed by IMAGEP include digitization, storage and retrieval of images; image enhancement by contrast expansion, addition and subtraction, magnification, inversion, and bit shifting; display and movement of cursor; display of grey level histogram of image; and display of the variation of grey level intensity as a function of image position. This algorithm has possible scientific, industrial, and biomedical applications in material flaw studies, steel and ore analysis, and pathology, respectively. IMAGEP is written in VAX FORTRAN for DEC VAX series computers running VMS. The program requires the use of a Grinnell 274 image processor which can be obtained from Mark McCloud Associates, Campbell, CA. An object library of the required GMR series software is included on the distribution media. IMAGEP requires 1Mb of RAM for execution. The standard distribution medium for this program is a 1600 BPI 9track magnetic tape in VAX FILES-11 format. It is also available on a TK50 tape cartridge in VAX FILES-11 format. This program was developed in 1991. DEC, VAX, VMS, and TK50 are trademarks of Digital Equipment Corporation.
Hardware realization of an SVM algorithm implemented in FPGAs
NASA Astrophysics Data System (ADS)
Wiśniewski, Remigiusz; Bazydło, Grzegorz; Szcześniak, Paweł
2017-08-01
The paper proposes a technique of hardware realization of a space vector modulation (SVM) of state function switching in matrix converter (MC), oriented on the implementation in a single field programmable gate array (FPGA). In MC the SVM method is based on the instantaneous space-vector representation of input currents and output voltages. The traditional computation algorithms usually involve digital signal processors (DSPs) which consumes the large number of power transistors (18 transistors and 18 independent PWM outputs) and "non-standard positions of control pulses" during the switching sequence. Recently, hardware implementations become popular since computed operations may be executed much faster and efficient due to nature of the digital devices (especially concurrency). In the paper, we propose a hardware algorithm of SVM computation. In opposite to the existing techniques, the presented solution applies COordinate Rotation DIgital Computer (CORDIC) method to solve the trigonometric operations. Furthermore, adequate arithmetic modules (that is, sub-devices) used for intermediate calculations, such as code converters or proper sectors selectors (for output voltages and input current) are presented in detail. The proposed technique has been implemented as a design described with the use of Verilog hardware description language. The preliminary results of logic implementation oriented on the Xilinx FPGA (particularly, low-cost device from Artix-7 family from Xilinx was used) are also presented.
NASA Astrophysics Data System (ADS)
Yu, Fei; Hui, Mei; Zhao, Yue-jin
2009-08-01
The image block matching algorithm based on motion vectors of correlative pixels in oblique direction is presented for digital image stabilization. The digital image stabilization is a new generation of image stabilization technique which can obtains the information of relative motion among frames of dynamic image sequences by the method of digital image processing. In this method the matching parameters are calculated from the vectors projected in the oblique direction. The matching parameters based on the vectors contain the information of vectors in transverse and vertical direction in the image blocks at the same time. So the better matching information can be obtained after making correlative operation in the oblique direction. And an iterative weighted least square method is used to eliminate the error of block matching. The weights are related with the pixels' rotational angle. The center of rotation and the global emotion estimation of the shaking image can be obtained by the weighted least square from the estimation of each block chosen evenly from the image. Then, the shaking image can be stabilized with the center of rotation and the global emotion estimation. Also, the algorithm can run at real time by the method of simulated annealing in searching method of block matching. An image processing system based on DSP was used to exam this algorithm. The core processor in the DSP system is TMS320C6416 of TI, and the CCD camera with definition of 720×576 pixels was chosen as the input video signal. Experimental results show that the algorithm can be performed at the real time processing system and have an accurate matching precision.
Picoradio: Communication/Computation Piconodes for Sensor Networks
2003-01-02
diagram of PicoNode III, or Quark node. It is made from two custom chips, Strange RF and Charm digital processor , and is complemented by a set of...the chipset comprising of Strange (analog OOK transceiver) and Charm (digital processor ) chips. 44 Figure 33: System block diagram of the Quark node...19 2.B PICONODE II - TWO-CHIP PICONODE IMPLEMENTATION ......................................... 21 2.B.1 Baseband processor (BBP
Systems and methods for performing wireless financial transactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCown, Steven Harvey
2012-07-03
A secure computing module (SCM) is configured for connection with a host device. The SCM includes a processor for performing secure processing operations, a host interface for coupling the processor to the host device, and a memory connected to the processor wherein the processor logically isolates at least some of the memory from access by the host device. The SCM also includes a proximate-field wireless communicator connected to the processor to communicate with another SCM associated with another host device. The SCM generates a secure digital signature for a financial transaction package and communicates the package and the signature tomore » the other SCM using the proximate-field wireless communicator. Financial transactions are performed from person to person using the secure digital signature of each person's SCM and possibly message encryption. The digital signatures and transaction details are communicated to appropriate financial organizations to authenticate the transaction parties and complete the transaction.« less
Modeling heterogeneous processor scheduling for real time systems
NASA Technical Reports Server (NTRS)
Leathrum, J. F.; Mielke, R. R.; Stoughton, J. W.
1994-01-01
A new model is presented to describe dataflow algorithms implemented in a multiprocessing system. Called the resource/data flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs which reflect the order that processors execute graph nodes. The model also allows the guarantee of meeting hard real-time deadlines. When unfolded, the model identifies statically the processor schedule. The model therefore is useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm.
Reagor, David [Los Alamos, NM; Vasquez-Dominguez, Jose [Los Alamos, NM
2006-05-09
A method and apparatus for effective through-the-earth communication involves a signal input device connected to a transmitter operating at a predetermined frequency sufficiently low to effectively penetrate useful distances through-the earth, and having an analog to digital converter receiving the signal input and passing the signal input to a data compression circuit that is connected to an encoding processor, the encoding processor output being provided to a digital to analog converter. An amplifier receives the analog output from the digital to analog converter for amplifying said analog output and outputting said analog output to an antenna. A receiver having an antenna receives the analog output passes the analog signal to a band pass filter whose output is connected to an analog to digital converter that provides a digital signal to a decoding processor whose output is connected to an data decompressor, the data decompressor providing a decompressed digital signal to a digital to analog converter. An audio output device receives the analog output form the digital to analog converter for producing audible output.
Wang, Po T; Gandasetiawan, Keulanna; McCrimmon, Colin M; Karimi-Bidhendi, Alireza; Liu, Charles Y; Heydari, Payam; Nenadic, Zoran; Do, An H
2016-08-01
A fully implantable brain-computer interface (BCI) can be a practical tool to restore independence to those affected by spinal cord injury. We envision that such a BCI system will invasively acquire brain signals (e.g. electrocorticogram) and translate them into control commands for external prostheses. The feasibility of such a system was tested by implementing its benchtop analogue, centered around a commercial, ultra-low power (ULP) digital signal processor (DSP, TMS320C5517, Texas Instruments). A suite of signal processing and BCI algorithms, including (de)multiplexing, Fast Fourier Transform, power spectral density, principal component analysis, linear discriminant analysis, Bayes rule, and finite state machine was implemented and tested in the DSP. The system's signal acquisition fidelity was tested and characterized by acquiring harmonic signals from a function generator. In addition, the BCI decoding performance was tested, first with signals from a function generator, and subsequently using human electroencephalogram (EEG) during eyes opening and closing task. On average, the system spent 322 ms to process and analyze 2 s of data. Crosstalk (<;-65 dB) and harmonic distortion (~1%) were minimal. Timing jitter averaged 49 μs per 1000 ms. The online BCI decoding accuracies were 100% for both function generator and EEG data. These results show that a complex BCI algorithm can be executed on an ULP DSP without compromising performance. This suggests that the proposed hardware platform may be used as a basis for future, fully implantable BCI systems.
Page, Andrew J.; Keane, Thomas M.; Naughton, Thomas J.
2010-01-01
We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms. PMID:20862190
NASA Astrophysics Data System (ADS)
Edwards, A. W.; Blackler, K.; Gill, R. D.; van der Goot, E.; Holm, J.
1990-10-01
Based upon the experience gained with the present soft x-ray data acquisition system, new techniques are being developed which make extensive use of digital signal processors (DSPs). Digital filters make 13 further frequencies available in real time from the input sampling frequency of 200 kHz. In parallel, various algorithms running on further DSPs generate triggers in response to a range of events in the plasma. The sawtooth crash can be detected, for example, with a delay of only 50 μs from the onset of the collapse. The trigger processor interacts with the digital filter boards to ensure data of the appropriate frequency is recorded throughout a plasma discharge. An independent link is used to pass 780 and 24 Hz filtered data to a network of transputers. A full tomographic inversion and display of the 24 Hz data is carried out in real time using this 15 transputer array. The 780 Hz data are stored for immediate detailed playback following the pulse. Such a system could considerably improve the quality of present plasma diagnostic data which is, in general, sampled at one fixed frequency throughout a discharge. Further, it should provide valuable information towards designing diagnostic data acquisition systems for future long pulse operation machines when a high degree of real-time processing will be required, while retaining the ability to detect, record, and analyze events of interest within such long plasma discharges.
Improved Remapping Processor For Digital Imagery
NASA Technical Reports Server (NTRS)
Fisher, Timothy E.
1991-01-01
Proposed digital image processor improved version of Programmable Remapper, which performs geometric and radiometric transformations on digital images. Features include overlapping and variably sized preimages. Overcomes some of limitations of image-warping circuit boards implementing only those geometric tranformations expressible in terms of polynomials of limited order. Also overcomes limitations of existing Programmable Remapper and made to perform transformations at video rate.
A Survey of Parallel Sorting Algorithms.
1981-12-01
see that, in this algorithm, each Processor i, for 1 itp -2, interacts directly only with Processors i+l and i-l. Processor j 0 only interacts with...Chan76] Chandra, A.K., "Maximal Parallelism in Matrix Multiplication," IBM Report RC. 6193, Watson Research Center, Yorktown Heights, N.Y., October 1976
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm
NASA Technical Reports Server (NTRS)
Povitsky, A.
1998-01-01
In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
NASA Technical Reports Server (NTRS)
Peters, Kevin A.; Hammond, Ernest C., Jr.
1987-01-01
The age of the surf clam (Spisula solidissima) can be determined with the use of the Digital Image Processor. This technique is used in conjunction with a modified method for aging, refined by John Ropes of the Woods Hole Laboratory, Massachusetts. This method utilizes a thinned sectioned chondrophore of the surf clam which contains annual rings. The rings of the chondrophore are then counted to determine age. By digitizing the chondrophore, the Digital Image Processor is clearly able to separate these annual rings more accurately. This technique produces an easier and more efficient way to count annual rings to determine the age of the surf clam.
Finite elements and the method of conjugate gradients on a concurrent processor
NASA Technical Reports Server (NTRS)
Lyzenga, G. A.; Raefsky, A.; Hager, G. H.
1985-01-01
An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90 percent for sufficiently large problems.
Finite elements and the method of conjugate gradients on a concurrent processor
NASA Technical Reports Server (NTRS)
Lyzenga, G. A.; Raefsky, A.; Hager, B. H.
1984-01-01
An algorithm for the iterative solution of finite element problems on a concurrent processor is presented. The method of conjugate gradients is used to solve the system of matrix equations, which is distributed among the processors of a MIMD computer according to an element-based spatial decomposition. This algorithm is implemented in a two-dimensional elastostatics program on the Caltech Hypercube concurrent processor. The results of tests on up to 32 processors show nearly linear concurrent speedup, with efficiencies over 90% for sufficiently large problems.
A Parallel Algorithm for Contact in a Finite Element Hydrocode
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pierce, Timothy G.
A parallel algorithm is developed for contact/impact of multiple three dimensional bodies undergoing large deformation. As time progresses the relative positions of contact between the multiple bodies changes as collision and sliding occurs. The parallel algorithm is capable of tracking these changes and enforcing an impenetrability constraint and momentum transfer across the surfaces in contact. Portions of the various surfaces of the bodies are assigned to the processors of a distributed-memory parallel machine in an arbitrary fashion, known as the primary decomposition. A secondary, dynamic decomposition is utilized to bring opposing sections of the contacting surfaces together on the samemore » processors, so that opposing forces may be balanced and the resultant deformation of the bodies calculated. The secondary decomposition is accomplished and updated using only local communication with a limited subset of neighbor processors. Each processor represents both a domain of the primary decomposition and a domain of the secondary, or contact, decomposition. Thus each processor has four sets of neighbor processors: (a) those processors which represent regions adjacent to it in the primary decomposition, (b) those processors which represent regions adjacent to it in the contact decomposition, (c) those processors which send it the data from which it constructs its contact domain, and (d) those processors to which it sends its primary domain data, from which they construct their contact domains. The latter three of these neighbor sets change dynamically as the simulation progresses. By constraining all communication to these sets of neighbors, all global communication, with its attendant nonscalable performance, is avoided. A set of tests are provided to measure the degree of scalability achieved by this algorithm on up to 1024 processors. Issues related to the operating system of the test platform which lead to some degradation of the results are analyzed. This algorithm has been implemented as the contact capability of the ALE3D multiphysics code, and is currently in production use.« less
NASA Technical Reports Server (NTRS)
Wu, C.; Barkan, B.; Huneycutt, B.; Leang, C.; Pang, S.
1981-01-01
Basic engineering data regarding the Interim Digital SAR Processor (IDP) and the digitally correlated Seasat synthetic aperature radar (SAR) imagery are presented. The correlation function and IDP hardware/software configuration are described, and a preliminary performance assessment presented. The geometric and radiometric characteristics, with special emphasis on those peculiar to the IDP produced imagery, are described.
User's guide to the Fault Inferring Nonlinear Detection System (FINDS) computer program
NASA Technical Reports Server (NTRS)
Caglayan, A. K.; Godiwala, P. M.; Satz, H. S.
1988-01-01
Described are the operation and internal structure of the computer program FINDS (Fault Inferring Nonlinear Detection System). The FINDS algorithm is designed to provide reliable estimates for aircraft position, velocity, attitude, and horizontal winds to be used for guidance and control laws in the presence of possible failures in the avionics sensors. The FINDS algorithm was developed with the use of a digital simulation of a commercial transport aircraft and tested with flight recorded data. The algorithm was then modified to meet the size constraints and real-time execution requirements on a flight computer. For the real-time operation, a multi-rate implementation of the FINDS algorithm has been partitioned to execute on a dual parallel processor configuration: one based on the translational dynamics and the other on the rotational kinematics. The report presents an overview of the FINDS algorithm, the implemented equations, the flow charts for the key subprograms, the input and output files, program variable indexing convention, subprogram descriptions, and the common block descriptions used in the program.
Automobile Crash Sensor Signal Processor
DOT National Transportation Integrated Search
1973-11-01
The crash sensor signal processor described interfaces between an automobile-installed doppler radar and an air bag activating solenoid or equivalent electromechanical device. The processor utilizes both digital and analog techniques to produce an ou...
Digital Platform for Wafer-Level MEMS Testing and Characterization Using Electrical Response
Brito, Nuno; Ferreira, Carlos; Alves, Filipe; Cabral, Jorge; Gaspar, João; Monteiro, João; Rocha, Luís
2016-01-01
The uniqueness of microelectromechanical system (MEMS) devices, with their multiphysics characteristics, presents some limitations to the borrowed test methods from traditional integrated circuits (IC) manufacturing. Although some improvements have been performed, this specific area still lags behind when compared to the design and manufacturing competencies developed over the last decades by the IC industry. A complete digital solution for fast testing and characterization of inertial sensors with built-in actuation mechanisms is presented in this paper, with a fast, full-wafer test as a leading ambition. The full electrical approach and flexibility of modern hardware design technologies allow a fast adaptation for other physical domains with minimum effort. The digital system encloses a processor and the tailored signal acquisition, processing, control, and actuation hardware control modules, capable of the structure position and response analysis when subjected to controlled actuation signals in real time. The hardware performance, together with the simplicity of the sequential programming on a processor, results in a flexible and powerful tool to evaluate the newest and fastest control algorithms. The system enables measurement of resonant frequency (Fr), quality factor (Q), and pull-in voltage (Vpi) within 1.5 s with repeatability better than 5 ppt (parts per thousand). A full-wafer with 420 devices under test (DUTs) has been evaluated detecting the faulty devices and providing important design specification feedback to the designers. PMID:27657087
Digital Platform for Wafer-Level MEMS Testing and Characterization Using Electrical Response.
Brito, Nuno; Ferreira, Carlos; Alves, Filipe; Cabral, Jorge; Gaspar, João; Monteiro, João; Rocha, Luís
2016-09-21
The uniqueness of microelectromechanical system (MEMS) devices, with their multiphysics characteristics, presents some limitations to the borrowed test methods from traditional integrated circuits (IC) manufacturing. Although some improvements have been performed, this specific area still lags behind when compared to the design and manufacturing competencies developed over the last decades by the IC industry. A complete digital solution for fast testing and characterization of inertial sensors with built-in actuation mechanisms is presented in this paper, with a fast, full-wafer test as a leading ambition. The full electrical approach and flexibility of modern hardware design technologies allow a fast adaptation for other physical domains with minimum effort. The digital system encloses a processor and the tailored signal acquisition, processing, control, and actuation hardware control modules, capable of the structure position and response analysis when subjected to controlled actuation signals in real time. The hardware performance, together with the simplicity of the sequential programming on a processor, results in a flexible and powerful tool to evaluate the newest and fastest control algorithms. The system enables measurement of resonant frequency (Fr), quality factor (Q), and pull-in voltage (Vpi) within 1.5 s with repeatability better than 5 ppt (parts per thousand). A full-wafer with 420 devices under test (DUTs) has been evaluated detecting the faulty devices and providing important design specification feedback to the designers.
Speech coding at 4800 bps for mobile satellite communications
NASA Technical Reports Server (NTRS)
Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei
1988-01-01
A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment aimed at providing telephone quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT and T 32-bit floating-point DSP32 digital signal processors (DSP). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wirewrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and consumes about 9 watts of power.
DESDynI Quad First Stage Processor - A Four Channel Digitizer and Digital Beam Forming Processor
NASA Technical Reports Server (NTRS)
Chuang, Chung-Lun; Shaffer, Scott; Smythe, Robert; Niamsuwan, Noppasin; Li, Samuel; Liao, Eric; Lim, Chester; Morfopolous, Arin; Veilleux, Louise
2013-01-01
The proposed Deformation, Eco-Systems, and Dynamics of Ice Radar (DESDynI-R) L-band SAR instrument employs multiple digital channels to optimize resolution while keeping a large swath on a single pass. High-speed digitization with very fine synchronization and digital beam forming are necessary in order to facilitate this new technique. The Quad First Stage Processor (qFSP) was developed to achieve both the processing performance as well as the digitizing fidelity in order to accomplish this sweeping SAR technique. The qFSP utilizes high precision and high-speed analog to digital converters (ADCs), each with a finely adjustable clock distribution network to digitize the channels at the fidelity necessary to allow for digital beam forming. The Xilinx produced FX130T Virtex 5 part handles the processing to digitally calibrate each channel as well as filter and beam form the receive signals. Demonstrating the digital processing required for digital beam forming and digital calibration is instrumental to the viability of the proposed DESDynI instrument. The qFSP development brings this implementation to Technology Readiness Level (TRL) 6. This paper will detail the design and development of the prototype qFSP as well as the preliminary results from hardware tests.
ASR-9 processor augmentation card (9-PAC) phase II scan-scan correlator algorithms
DOT National Transportation Integrated Search
2001-04-26
The report documents the scan-scan correlator (tracker) algorithm developed for Phase II of the ASR-9 Processor Augmentation Card (9-PAC) project. The improved correlation and tracking algorithms in 9-PAC Phase II decrease the incidence of false-alar...
Eigensolution of finite element problems in a completely connected parallel architecture
NASA Technical Reports Server (NTRS)
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Digital Flight Control System Redundancy Study
1974-07-01
has its own separate power supr, y . d. Digital Processor The digital processor consists of the followdnq components: (1) Program Counter - This...1-3 Yaw Axis Control 108 1-4 Autothrottle (Airspeed Hold Mode) 109 1-5 Approach Power Compensation 110 1-6 Glideslope Flare 111 I-7 Glideslope Track...considsred to the extent that they imposed constraints on the candidate con- figurations. Cost, size, weight, power , maintainability, survivability and
Handling of huge multispectral image data volumes from a spectral hole burning device (SHBD)
NASA Astrophysics Data System (ADS)
Graff, Werner; Rosselet, Armel C.; Wild, Urs P.; Gschwind, Rudolf; Keller, Christoph U.
1995-06-01
We use chlorin-doped polymer films at low temperatures as the primary imaging detector. Based on the principles of persistent spectral hole burning, this system is capable of storing spatial and spectral information simultaneously in one exposure with extremely high resolution. The sun as an extended light source has been imaged onto the film. The information recorded amounts to tens of GBytes. This data volume is read out by scanning the frequency of a tunable dye laser and reading the images with a digital CCD camera. For acquisition, archival, processing, and visualization, we use MUSIC (MUlti processor System with Intelligent Communication), a single instruction multiple data parallel processor system equipped with the necessary I/O facilities. The huge amount of data requires the developemnt of sophisticated algorithms to efficiently calibrate the data and to extract useful and new information for solar physics.
Parallel programming with Easy Java Simulations
NASA Astrophysics Data System (ADS)
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Implicit, nonswitching, vector-oriented algorithm for steady transonic flow
NASA Technical Reports Server (NTRS)
Lottati, I.
1983-01-01
A rapid computation of a sequence of transonic flow solutions has to be performed in many areas of aerodynamic technology. The employment of low-cost vector array processors makes the conduction of such calculations economically feasible. However, for a full utilization of the new hardware, the developed algorithms must take advantage of the special characteristics of the vector array processor. The present investigation has the objective to develop an efficient algorithm for solving transonic flow problems governed by mixed partial differential equations on an array processor.
A VME-based software trigger system using UNIX processors
NASA Astrophysics Data System (ADS)
Atmur, Robert; Connor, David F.; Molzon, William
1997-02-01
We have constructed a distributed computing platform with eight processors to assemble and filter data from digitization crates. The filtered data were transported to a tape-writing UNIX computer via ethernet. Each processor ran a UNIX operating system and was installed in its own VME crate. Each VME crate contained dual-port memories which interfaced with the digitizers. Using standard hardware and software (VME and UNIX) allows us to select from a wide variety of non-proprietary products and makes upgrades simpler, if they are necessary.
NASA Astrophysics Data System (ADS)
Zhang, Yuli; Han, Jun; Weng, Xinqian; He, Zhongzhu; Zeng, Xiaoyang
This paper presents an Application Specific Instruction-set Processor (ASIP) for the SHA-3 BLAKE algorithm family by instruction set extensions (ISE) from an RISC (reduced instruction set computer) processor. With a design space exploration for this ASIP to increase the performance and reduce the area cost, we accomplish an efficient hardware and software implementation of BLAKE algorithm. The special instructions and their well-matched hardware function unit improve the calculation of the key section of the algorithm, namely G-functions. Also, relaxing the time constraint of the special function unit can decrease its hardware cost, while keeping the high data throughput of the processor. Evaluation results reveal the ASIP achieves 335Mbps and 176Mbps for BLAKE-256 and BLAKE-512. The extra area cost is only 8.06k equivalent gates. The proposed ASIP outperforms several software approaches on various platforms in cycle per byte. In fact, both high throughput and low hardware cost achieved by this programmable processor are comparable to that of ASIC implementations.
Autonomous Telemetry Collection for Single-Processor Small Satellites
NASA Technical Reports Server (NTRS)
Speer, Dave
2003-01-01
For the Space Technology 5 mission, which is being developed under NASA's New Millennium Program, a single spacecraft processor will be required to do on-board real-time computations and operations associated with attitude control, up-link and down-link communications, science data processing, solid-state recorder management, power switching and battery charge management, experiment data collection, health and status data collection, etc. Much of the health and status information is in analog form, and each of the analog signals must be routed to the input of an analog-to-digital converter, converted to digital form, and then stored in memory. If the micro-operations of the analog data collection process are implemented in software, the processor may use up a lot of time either waiting for the analog signal to settle, waiting for the analog-to-digital conversion to complete, or servicing a large number of high frequency interrupts. In order to off-load a very busy processor, the collection and digitization of all analog spacecraft health and status data will be done autonomously by a field-programmable gate array that can configure the analog signal chain, control the analog-to-digital converter, and store the converted data in memory.
Broadband set-top box using MAP-CA processor
NASA Astrophysics Data System (ADS)
Bush, John E.; Lee, Woobin; Basoglu, Chris
2001-12-01
Advances in broadband access are expected to exert a profound impact in our everyday life. It will be the key to the digital convergence of communication, computer and consumer equipment. A common thread that facilitates this convergence comprises digital media and Internet. To address this market, Equator Technologies, Inc., is developing the Dolphin broadband set-top box reference platform using its MAP-CA Broadband Signal ProcessorT chip. The Dolphin reference platform is a universal media platform for display and presentation of digital contents on end-user entertainment systems. The objective of the Dolphin reference platform is to provide a complete set-top box system based on the MAP-CA processor. It includes all the necessary hardware and software components for the emerging broadcast and the broadband digital media market based on IP protocols. Such reference design requires a broadband Internet access and high-performance digital signal processing. By using the MAP-CA processor, the Dolphin reference platform is completely programmable, allowing various codecs to be implemented in software, such as MPEG-2, MPEG-4, H.263 and proprietary codecs. The software implementation also enables field upgrades to keep pace with evolving technology and industry demands.
Interior Noise Reduction by Adaptive Feedback Vibration Control
NASA Technical Reports Server (NTRS)
Lim, Tae W.
1998-01-01
The objective of this project is to investigate the possible use of adaptive digital filtering techniques in simultaneous, multiple-mode identification of the modal parameters of a vibrating structure in real-time. It is intended that the results obtained from this project will be used for state estimation needed in adaptive structural acoustics control. The work done in this project is basically an extension of the work on real-time single mode identification, which was performed successfully using a digital signal processor (DSP) at NASA, Langley. Initially, in this investigation the single mode identification work was duplicated on a different processor, namely the Texas Instruments TMS32OC40 DSP. The system identification results for the single mode case were very good. Then an algorithm for simultaneous two mode identification was developed and tested using analytical simulation. When it successfully performed the expected tasks, it was implemented in real-time on the DSP system to identify the first two modes of vibration of a cantilever aluminum beam. The results of the simultaneous two mode case were good but some problems were identified related to frequency warping and spurious mode identification. The frequency warping problem was found to be due to the bilinear transformation used in the algorithm to convert the system transfer function from the continuous-time domain to the discrete-time domain. An alternative approach was developed to rectify the problem. The spurious mode identification problem was found to be associated with high sampling rates. Noise in the signal is suspected to be the cause of this problem but further investigation will be needed to clarify the cause. For simultaneous identification of more than two modes, it was found that theoretically an adaptive digital filter can be designed to identify the required number of modes, but the algebra became very complex which made it impossible to implement in the DSP system used in this study. The on-line identification algorithm developed in this research will be useful in constructing a state estimator for feedback vibration control.
Very low cost real time histogram-based contrast enhancer utilizing fixed-point DSP processing
NASA Astrophysics Data System (ADS)
McCaffrey, Nathaniel J.; Pantuso, Francis P.
1998-03-01
A real time contrast enhancement system utilizing histogram- based algorithms has been developed to operate on standard composite video signals. This low-cost DSP based system is designed with fixed-point algorithms and an off-chip look up table (LUT) to reduce the cost considerably over other contemporary approaches. This paper describes several real- time contrast enhancing systems advanced at the Sarnoff Corporation for high-speed visible and infrared cameras. The fixed-point enhancer was derived from these high performance cameras. The enhancer digitizes analog video and spatially subsamples the stream to qualify the scene's luminance. Simultaneously, the video is streamed through a LUT that has been programmed with the previous calculation. Reducing division operations by subsampling reduces calculation- cycles and also allows the processor to be used with cameras of nominal resolutions. All values are written to the LUT during blanking so no frames are lost. The enhancer measures 13 cm X 6.4 cm X 3.2 cm, operates off 9 VAC and consumes 12 W. This processor is small and inexpensive enough to be mounted with field deployed security cameras and can be used for surveillance, video forensics and real- time medical imaging.
High performance 3D adaptive filtering for DSP based portable medical imaging systems
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Ali, Murtaza; Wainwright, Ian; Nadeski, Mark
2015-03-01
Portable medical imaging devices have proven valuable for emergency medical services both in the field and hospital environments and are becoming more prevalent in clinical settings where the use of larger imaging machines is impractical. Despite their constraints on power, size and cost, portable imaging devices must still deliver high quality images. 3D adaptive filtering is one of the most advanced techniques aimed at noise reduction and feature enhancement, but is computationally very demanding and hence often cannot be run with sufficient performance on a portable platform. In recent years, advanced multicore digital signal processors (DSP) have been developed that attain high processing performance while maintaining low levels of power dissipation. These processors enable the implementation of complex algorithms on a portable platform. In this study, the performance of a 3D adaptive filtering algorithm on a DSP is investigated. The performance is assessed by filtering a volume of size 512x256x128 voxels sampled at a pace of 10 MVoxels/sec with an Ultrasound 3D probe. Relative performance and power is addressed between a reference PC (Quad Core CPU) and a TMS320C6678 DSP from Texas Instruments.
Photonic quantum digital signatures operating over kilometer ranges in installed optical fiber
NASA Astrophysics Data System (ADS)
Collins, Robert J.; Fujiwara, Mikio; Amiri, Ryan; Honjo, Toshimori; Shimizu, Kaoru; Tamaki, Kiyoshi; Takeoka, Masahiro; Andersson, Erika; Buller, Gerald S.; Sasaki, Masahide
2016-10-01
The security of electronic communications is a topic that has gained noteworthy public interest in recent years. As a result, there is an increasing public recognition of the existence and importance of mathematically based approaches to digital security. Many of these implement digital signatures to ensure that a malicious party has not tampered with the message in transit, that a legitimate receiver can validate the identity of the signer and that messages are transferable. The security of most digital signature schemes relies on the assumed computational difficulty of solving certain mathematical problems. However, reports in the media have shown that certain implementations of such signature schemes are vulnerable to algorithmic breakthroughs and emerging quantum processing technologies. Indeed, even without quantum processors, the possibility remains that classical algorithmic breakthroughs will render these schemes insecure. There is ongoing research into information-theoretically secure signature schemes, where the security is guaranteed against an attacker with arbitrary computational resources. One such approach is quantum digital signatures. Quantum signature schemes can be made information-theoretically secure based on the laws of quantum mechanics while comparable classical protocols require additional resources such as anonymous broadcast and/or a trusted authority. Previously, most early demonstrations of quantum digital signatures required dedicated single-purpose hardware and operated over restricted ranges in a laboratory environment. Here, for the first time, we present a demonstration of quantum digital signatures conducted over several kilometers of installed optical fiber. The system reported here operates at a higher signature generation rate than previous fiber systems.
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA
NASA Astrophysics Data System (ADS)
Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei
2013-03-01
With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.
An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors
NASA Technical Reports Server (NTRS)
Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.
2015-01-01
This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.
Solving very large, sparse linear systems on mesh-connected parallel computers
NASA Technical Reports Server (NTRS)
Opsahl, Torstein; Reif, John
1987-01-01
The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Computations on the massively parallel processor at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
A general multiscroll Lorenz system family and its realization via digital signal processors.
Yu, Simin; Lü, Jinhu; Tang, Wallace K S; Chen, Guanrong
2006-09-01
This paper proposes a general multiscroll Lorenz system family by introducing a novel parameterized nth-order polynomial transformation. Some basic dynamical behaviors of this general multiscroll Lorenz system family are then investigated, including bifurcations, maximum Lyapunov exponents, and parameters regions. Furthermore, the general multiscroll Lorenz attractors are physically verified by using digital signal processors.
Bezanilla, F
1985-03-01
A modified digital audio processor, a video cassette recorder, and some simple added circuitry are assembled into a recording device of high capacity. The unit converts two analog channels into digital form at 44-kHz sampling rate and stores the information in digital form in a common video cassette. Bandwidth of each channel is from direct current to approximately 20 kHz and the dynamic range is close to 90 dB. The total storage capacity in a 3-h video cassette is 2 Gbytes. The information can be retrieved in analog or digital form.
Bezanilla, F
1985-01-01
A modified digital audio processor, a video cassette recorder, and some simple added circuitry are assembled into a recording device of high capacity. The unit converts two analog channels into digital form at 44-kHz sampling rate and stores the information in digital form in a common video cassette. Bandwidth of each channel is from direct current to approximately 20 kHz and the dynamic range is close to 90 dB. The total storage capacity in a 3-h video cassette is 2 Gbytes. The information can be retrieved in analog or digital form. PMID:3978213
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer
NASA Technical Reports Server (NTRS)
Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw
2000-01-01
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Design of a broadband active silencer using μ-synthesis
NASA Astrophysics Data System (ADS)
Bai, Mingsian R.; Zeung, Pingshun
2004-01-01
A robust spatially feedforward controller is developed for broadband attenuation of noise in ducts. To meet the requirements of robust performance and robust stability in the presence of plant uncertainties, a μ-synthesis procedure via D- K iteration is exploited to obtain the optimal controller. This approach considers uncertainties as modelling errors of the nominal plant in high frequency and is implemented using a floating point digital signal processor (DSP). Experimental investigation was undertaken on a finite-length duct to justify the proposed controller. The μ- controller is compared to other control algorithms such as the H2 method, the H∞ method and the filtered-U least mean square (FULMS) algorithm. Experimental results indicate that the proposed system has attained 25.8 dB maximal attenuation in the band 250-650 Hz.
Noise reduction and image enhancement using a hardware implementation of artificial neural networks
NASA Astrophysics Data System (ADS)
David, Robert; Williams, Erin; de Tremiolles, Ghislain; Tannhof, Pascal
1999-03-01
In this paper, we present a neural based solution developed for noise reduction and image enhancement using the ZISC, an IBM hardware processor which implements the Restricted Coulomb Energy algorithm and the K-Nearest Neighbor algorithm. Artificial neural networks present the advantages of processing time reduction in comparison with classical models, adaptability, and the weighted property of pattern learning. The goal of the developed application is image enhancement in order to restore old movies (noise reduction, focus correction, etc.), to improve digital television images, or to treat images which require adaptive processing (medical images, spatial images, special effects, etc.). Image results show a quantitative improvement over the noisy image as well as the efficiency of this system. Further enhancements are being examined to improve the output of the system.
Design and implementation of a hybrid sub-band acoustic echo canceller (AEC)
NASA Astrophysics Data System (ADS)
Bai, Mingsian R.; Yang, Cheng-Ken; Hur, Ker-Nan
2009-04-01
An efficient method is presented for implementing an acoustic echo canceller (AEC) that makes use of hybrid sub-band approach. The hybrid system is comprised of a fixed processor and an adaptive filter in each sub-band. The AEC aims at reducing the echo resulting from the acoustic feedback in loudspeaker-enclosure-microphone (LEM) systems such as teleconferencing and hands-free systems. In order to cancel the acoustical echo efficiently, various processing architectures including fixed filters, hybrid processors, and sub-band structure are investigated. A double-talk detector is incorporated into the proposed AEC to prevent the adaptive filter from diverging in double-talk situations. A de-correlation filter is also used alongside sub-band processing in order to enhance the performance and efficiency of AEC. All algorithms are implemented and verified on the platform of a fixed-point digital signal processor (DSP). The AECs are evaluated in terms of cancellation performance and computation complexity. In addition, listening tests are conducted to assess the subjective performance of the AECs. From the results, the proposed hybrid sub-band AEC was found to be the most effective among all methods in terms of echo reduction and timbral quality.
Real-Time Data Processing in the muon system of the D0 detector.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neeti Parashar et al.
2001-07-03
This paper presents a real-time application of the 16-bit fixed point Digital Signal Processors (DSPs), in the Muon System of the D0 detector located at the Fermilab Tevatron, presently the world's highest-energy hadron collider. As part of the Upgrade for a run beginning in the year 2000, the system is required to process data at an input event rate of 10 KHz without incurring significant deadtime in readout. The ADSP21csp01 processor has high I/O bandwidth, single cycle instruction execution and fast task switching support to provide efficient multisignal processing. The processor's internal memory consists of 4K words of Program Memorymore » and 4K words of Data Memory. In addition there is an external memory of 32K words for general event buffering and 16K words of Dual port Memory for input data queuing. This DSP fulfills the requirement of the Muon subdetector systems for data readout. All error handling, buffering, formatting and transferring of the data to the various trigger levels of the data acquisition system is done in software. The algorithms developed for the system complete these tasks in about 20 {micro}s per event.« less
Implementation of pulse-coupled neural networks in a CNAPS environment.
Kinser, J M; Lindblad, T
1999-01-01
Pulse coupled neural networks (PCNN's) are biologically inspired algorithms very well suited for image/signal preprocessing. While several analog implementations are proposed we suggest a digital implementation in an existing environment, the connected network of adapted processors system (CNAPS). The reason for this is two fold. First, CNAPS is a commercially available chip which has been used for several neural-network implementations. Second, the PCNN is, in almost all applications, a very efficient component of a system requiring subsequent and additional processing. This may include gating, Fourier transforms, neural classifiers, data mining, etc, with or without feedback to the PCNN.
A VHDL Core for Intrinsic Evolution of Discrete Time Filters with Signal Feedback
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; Dutton, Kenneth
2005-01-01
The design of an Evolvable Machine VHDL Core is presented, representing a discrete-time processing structure capable of supporting control system applications. This VHDL Core is implemented in an FPGA and is interfaced with an evolutionary algorithm implemented in firmware on a Digital Signal Processor (DSP) to create an evolvable system platform. The salient features of this architecture are presented. The capability to implement IIR filter structures is presented along with the results of the intrinsic evolution of a filter. The robustness of the evolved filter design is tested and its unique characteristics are described.
Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS
NASA Astrophysics Data System (ADS)
Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.
2003-04-01
Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content places heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents a ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool
NASA Astrophysics Data System (ADS)
Barr, David R. W.; Dudek, Piotr
2009-12-01
We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
Conjugate-Gradient Algorithms For Dynamics Of Manipulators
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheid, Robert E.
1993-01-01
Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to stimulate dynamics, possibly faster than in real time, for purposes of planning and control.
A universal computer control system for motors
NASA Technical Reports Server (NTRS)
Szakaly, Zoltan F. (Inventor)
1991-01-01
A control system for a multi-motor system such as a space telerobot, having a remote computational node and a local computational node interconnected with one another by a high speed data link is described. A Universal Computer Control System (UCCS) for the telerobot is located at each node. Each node is provided with a multibus computer system which is characterized by a plurality of processors with all processors being connected to a common bus, and including at least one command processor. The command processor communicates over the bus with a plurality of joint controller cards. A plurality of direct current torque motors, of the type used in telerobot joints and telerobot hand-held controllers, are connected to the controller cards and responds to digital control signals from the command processor. Essential motor operating parameters are sensed by analog sensing circuits and the sensed analog signals are converted to digital signals for storage at the controller cards where such signals can be read during an address read/write cycle of the command processing processor.
Design, characterization and control of the Unique Mobility Corporation robot
NASA Technical Reports Server (NTRS)
Velasco, Virgilio B., Jr.; Newman, Wyatt S.; Steinetz, Bruce; Kopf, Carlo; Malik, John
1994-01-01
Space and mass are at a premium on any space mission, and thus any machinery designed for space use should be lightweight and compact, without sacrificing strength. It is for this reason that NASA/LeRC contracted Unique Mobility Corporation to exploit their novel actuator designs to build a robot that would advance the present state of technology with respect to these requirements. Custom-designed motors are the key feature of this robot. They are compact, high-performance dc brushless servo motors with a high pole count and low inductance, thus permitting high torque generation and rapid phase commutation. Using a custom-designed digital signal processor-based controller board, the pulse width modulation power amplifiers regulate the fast dynamics of the motor currents. In addition, the programmable digital signal processor (DSP) controller permits implementation of nonlinear compensation algorithms to account for motoring vs. regeneration, torque ripple, and back-EMF. As a result, the motors produce a high torque relative to their size and weight, and can do so with good torque regulation and acceptably high velocity saturation limits. This paper presents the Unique Mobility Corporation robot prototype: its actuators, its kinematic design, its control system, and its experimental characterization. Performance results, including saturation torques, saturation velocities and tracking accuracy tests are included.
Scalable Domain Decomposed Monte Carlo Particle Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, Matthew Joseph
2013-12-05
In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.
Real-time trajectory optimization on parallel processors
NASA Technical Reports Server (NTRS)
Psiaki, Mark L.
1993-01-01
A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
A Novel Optical/digital Processing System for Pattern Recognition
NASA Technical Reports Server (NTRS)
Boone, Bradley G.; Shukla, Oodaye B.
1993-01-01
This paper describes two processing algorithms that can be implemented optically: the Radon transform and angular correlation. These two algorithms can be combined in one optical processor to extract all the basic geometric and amplitude features from objects embedded in video imagery. We show that the internal amplitude structure of objects is recovered by the Radon transform, which is a well-known result, but, in addition, we show simulation results that calculate angular correlation, a simple but unique algorithm that extracts object boundaries from suitably threshold images from which length, width, area, aspect ratio, and orientation can be derived. In addition to circumventing scale and rotation distortions, these simulations indicate that the features derived from the angular correlation algorithm are relatively insensitive to tracking shifts and image noise. Some optical architecture concepts, including one based on micro-optical lenslet arrays, have been developed to implement these algorithms. Simulation test and evaluation using simple synthetic object data will be described, including results of a study that uses object boundaries (derivable from angular correlation) to classify simple objects using a neural network.
NASA Astrophysics Data System (ADS)
Alonso, R.; Villuendas, F.; Borja, J.; Barragán, L. A.; Salinas, I.
2003-05-01
A versatile, low-cost, digital signal processor (DSP) based lock-in module with external reference is described. This module is used to implement an industrial spectrophotometer for measuring spectral transmission and reflection of automotive and architectonic coating glasses over the ultraviolet, visible and near-infrared wavelength range. The light beams are modulated with an optical chopper. A digital phase-locked loop (DPLL) is used to lock the lock-in to the chop frequency. The lock-in rejects the ambient radiation and permits the spectrophotometer to work in the presence of ambient light. The algorithm that implements the dual lock-in and the DPLL in the DSP56002 evaluation module from Motorola is described. The use of a DSP allows implementation of the lock-in and DPLL by software, which gives flexibility and programmability to the system. Lock-in module cost, under 300 euro, is an important parameter taking into account that two modules are used in the system. Besides, the algorithms implemented in this DSP can be directly implemented in the latest DSP generations. The DPLL performance and the spectrophotometer are characterized. Capture and lock DPLL ranges have been measured and checked to be greater than the chop frequency drifts. The lock-in measured frequency response shows that the lock-in performs as theoretically predicted.
General optical discrete z transform: design and application.
Ngo, Nam Quoc
2016-12-20
This paper presents a generalization of the discrete z transform algorithm. It is shown that the GOD-ZT algorithm is a generalization of several important conventional discrete transforms. Based on the GOD-ZT algorithm, a tunable general optical discrete z transform (GOD-ZT) processor is synthesized using the silica-based finite impulse response transversal filter. To demonstrate the effectiveness of the method, the design and simulation of a tunable optical discrete Fourier transform (ODFT) processor as a special case of the synthesized GOD-ZT processor is presented. It is also shown that the ODFT processor can function as a real-time optical spectrum analyzer. The tunable ODFT has an important potential application as a tunable optical demultiplexer at the receiver end of an optical orthogonal frequency-division multiplexing transmission system.
Scalable Multiprocessor for High-Speed Computing in Space
NASA Technical Reports Server (NTRS)
Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard
2004-01-01
A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard realtime applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at "hundreds" of pulses per second, each pulse "requiring" millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
Limit characteristics of digital optoelectronic processor
NASA Astrophysics Data System (ADS)
Kolobrodov, V. G.; Tymchik, G. S.; Kolobrodov, M. S.
2018-01-01
In this article, the limiting characteristics of a digital optoelectronic processor are explored. The limits are defined by diffraction effects and a matrix structure of the devices for input and output of optical signals. The purpose of a present research is to optimize the parameters of the processor's components. The developed physical and mathematical model of DOEP allowed to establish the limit characteristics of the processor, restricted by diffraction effects and an array structure of the equipment for input and output of optical signals, as well as to optimize the parameters of the processor's components. The diameter of the entrance pupil of the Fourier lens is determined by the size of SLM and the pixel size of the modulator. To determine the spectral resolution, it is offered to use a concept of an optimum phase when the resolved diffraction maxima coincide with the pixel centers of the radiation detector.
NASA Astrophysics Data System (ADS)
Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef
2016-12-01
The homodyne detection with only a single detector represents a promising approach in the interferometric application which enables a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of the single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulations and mixing necessary for the second (cosine) quadrature signal reconstruction followed by a conic section projection in Cartesian plane as well as (a) the phase unwrapping together with the goniometric and linear transformations needed for the scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz which would allow to measure the displacements at the velocities around half metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that employed the advantage pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they are bringing the fringe counting interferometry closer to the industrial applications due to their optical setup simplicity and robustness, computational stability, scalability and also a cost-effectiveness.
System-Level Design of a 64-Channel Low Power Neural Spike Recording Sensor.
Delgado-Restituto, Manuel; Rodriguez-Perez, Alberto; Darie, Angela; Soto-Sanchez, Cristina; Fernandez-Jover, Eduardo; Rodriguez-Vazquez, Angel
2017-04-01
This paper reports an integrated 64-channel neural spike recording sensor, together with all the circuitry to process and configure the channels, process the neural data, transmit via a wireless link the information and receive the required instructions. Neural signals are acquired, filtered, digitized and compressed in the channels. Additionally, each channel implements an auto-calibration algorithm which individually configures the transfer characteristics of the recording site. The system has two transmission modes; in one case the information captured by the channels is sent as uncompressed raw data; in the other, feature vectors extracted from the detected neural spikes are released. Data streams coming from the channels are serialized by the embedded digital processor. Experimental results, including in vivo measurements, show that the power consumption of the complete system is lower than 330 μW.
A system-level view of optimizing high-channel-count wireless biosignal telemetry.
Chandler, Rodney J; Gibson, Sarah; Karkare, Vaibhav; Farshchi, Shahin; Marković, Dejan; Judy, Jack W
2009-01-01
In this paper we perform a system-level analysis of a wireless biosignal telemetry system. We perform an analysis of each major system component (e.g., analog front end, analog-to-digital converter, digital signal processor, and wireless link), in which we consider physical, algorithmic, and design limitations. Since there are a wide range applications for wireless biosignal telemetry systems, each with their own unique set of requirements for key parameters (e.g., channel count, power dissipation, noise level, number of bits, etc.), our analysis is equally broad. The net result is a set of plots, in which the power dissipation for each component and as the system as a whole, are plotted as a function of the number of channels for different architectural strategies. These results are also compared to existing implementations of complete wireless biosignal telemetry systems.
A parallel algorithm for generation and assembly of finite element stiffness and mass matrices
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.
1991-01-01
A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor
2010-05-01
Abstract—The Multiple Signal Classification ( MUSIC ) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals...Broadband Engine Processor (Cell BE). The process of adapting the serial based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and...using Multiple Signal Classification MUSIC algorithm [4] • Computation of Focus matrix • Computation of number of sources • Separation of Signal
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.
Glaser, I
1980-10-01
We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.
MPI Enhancements in John the Ripper
NASA Astrophysics Data System (ADS)
Sykes, Edward R.; Lin, Michael; Skoczen, Wesley
2010-11-01
John the Ripper (JtR) is an open source software package commonly used by system administrators to enforce password policy. JtR is designed to attack (i.e., crack) passwords encrypted in a wide variety of commonly used formats. While parallel implementations of JtR exist, there are several limitations to them. This research reports on two distinct algorithms that enhance this password cracking tool using the Message Passing Interface. The first algorithm is a novel approach that uses numerous processors to crack one password by using an innovative approach to workload distribution. In this algorithm the candidate password is distributed to all participating processors and the word list is divided based on probability so that each processor has the same likelihood of cracking the password while eliminating overlapping operations. The second algorithm developed in this research involves dividing the passwords within a password file equally amongst available processors while ensuring load-balanced and fault-tolerant behavior. This paper describes John the Ripper, the design of these two algorithms and preliminary results. Given the same amount of time, the original JtR can crack 29 passwords, whereas our algorithms 1 and 2 can crack an additional 35 and 45 passwords respectively.
Certification of windshear performance with RTCA class D radomes
NASA Technical Reports Server (NTRS)
Mathews, Bruce D.; Miller, Fran; Rittenhouse, Kirk; Barnett, Lee; Rowe, William
1994-01-01
Superposition testing of detection range performance forms a digital signal for input into a simulation of signal and data processing equipment and algorithms to be employed in a sensor system for advanced warning of hazardous windshear. For suitable pulse-Doppler radar, recording of the digital data at the input to the digital signal processor furnishes a realistic operational scenario and environmentally responsive clutter signal including all sidelobe clutter, ground moving target indications (GMTI), and large signal spurious due to mainbeam clutter and/or RFI respective of the urban airport clutter and aircraft scenarios (approach and landing antenna pointing). For linear radar system processes, a signal at the same point in the process from a hazard phenomena may be calculated from models of the scattering phenomena, for example, as represented in fine 3 dimensional reflectivity and velocity grid structures. Superposition testing furnishes a competing signal environment for detection and warning time performance confirmation of phenomena uncontrollable in a natural environment.
Goavec-Mérou, G; Chrétien, N; Friedt, J-M; Sandoz, P; Martin, G; Lenczner, M; Ballandras, S
2014-01-01
Vibrating mechanical structure characterization is demonstrated using contactless techniques best suited for mobile and rotating equipments. Fast measurement rates are achieved using Field Programmable Gate Array (FPGA) devices as real-time digital signal processors. Two kinds of algorithms are implemented on FPGA and experimentally validated in the case of the vibrating tuning fork. A first application concerns in-plane displacement detection by vision with sampling rates above 10 kHz, thus reaching frequency ranges above the audio range. A second demonstration concerns pulsed-RADAR cooperative target phase detection and is applied to radiofrequency acoustic transducers used as passive wireless strain gauges. In this case, the 250 ksamples/s refresh rate achieved is only limited by the acoustic sensor design but not by the detection bandwidth. These realizations illustrate the efficiency, interest, and potentialities of FPGA-based real-time digital signal processing for the contactless interrogation of passive embedded probes with high refresh rates.
MAP3D: a media processor approach for high-end 3D graphics
NASA Astrophysics Data System (ADS)
Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris
1999-12-01
Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
Computer Sciences and Data Systems, volume 2
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1999-01-01
The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆
Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-01-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
Onboard Radar Processing Development for Rapid Response Applications
NASA Technical Reports Server (NTRS)
Lou, Yunling; Chien, Steve; Clark, Duane; Doubleday, Josh; Muellerschoen, Ron; Wang, Charles C.
2011-01-01
We are developing onboard processor (OBP) technology to streamline data acquisition on-demand and explore the potential of the L-band SAR instrument onboard the proposed DESDynI mission and UAVSAR for rapid response applications. The technology would enable the observation and use of surface change data over rapidly evolving natural hazards, both as an aid to scientific understanding and to provide timely data to agencies responsible for the management and mitigation of natural disasters. We are adapting complex science algorithms for surface water extent to detect flooding, snow/water/ice classification to assist in transportation/ shipping forecasts, and repeat-pass change detection to detect disturbances. We are near completion of the development of a custom FPGA board to meet the specific memory and processing needs of L-band SAR processor algorithms and high speed interfaces to reformat and route raw radar data to/from the FPGA processor board. We have also developed a high fidelity Matlab model of the SAR processor that is modularized and parameterized for ease to prototype various SAR processor algorithms targeted for the FPGA. We will be testing the OBP and rapid response algorithms with UAVSAR data to determine the fidelity of the products.
Digital Phase-Locked Loop With Phase And Frequency Feedback
NASA Technical Reports Server (NTRS)
Thomas, J. Brooks
1991-01-01
Advanced design for digital phase-lock loop (DPLL) allows loop gains higher than those used in other designs. Divided into two major components: counterrotation processor and tracking processor. Notable features include use of both phase and rate-of-change-of-phase feedback instead of frequency feedback alone, normalized sine phase extractor, improved method for extracting measured phase, and improved method for "compressing" output rate.
A novel VLSI processor architecture for supercomputing arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.
1993-01-01
Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
Event processing in X-IFU detector onboard Athena.
NASA Astrophysics Data System (ADS)
Ceballos, M. T.; Cobos, B.; van der Kuurs, J.; Fraga-Encinas, R.
2015-05-01
The X-ray Observatory ATHENA was proposed in April 2014 as the mission to implement the science theme "The Hot and Energetic Universe" selected by ESA for L2 (the second Large-class mission in ESA's Cosmic Vision science programme). One of the two X-ray detectors designed to be onboard ATHENA is X-IFU, a cryogenic microcalorimeter based on Transition Edge Sensor (TES) technology that will provide spatially resolved high-resolution spectroscopy. X-IFU will be developed by a consortium of European research institutions currently from France (leadership), Italy, The Netherlands, Belgium, UK, Germany and Spain. From Spain, IFCA (CSIC-UC) is involved in the Digital Readout Electronics (DRE) unit of the X-IFU detector, in particular in the Event Processor Subsytem. We at IFCA are in charge of the development and implementation in the DRE unit of the Event Processing algorithms, designed to recognize, from a noisy signal, the intensity pulses generated by the absorption of the X-ray photons, and lately extract their main parameters (coordinates, energy, arrival time, grade, etc.) Here we will present the design and performance of the algorithms developed for the event recognition (adjusted derivative), and pulse grading/qualification as well as the progress in the algorithms designed to extract the energy content of the pulses (pulse optimal filtering). IFCA will finally have the responsibility of the implementation on board in the (TBD) FPGAs or micro-processors of the DRE unit, where this Event Processing part will take place, to fit into the limited telemetry of the instrument.
Eight-Channel Digital Signal Processor and Universal Trigger Module
NASA Astrophysics Data System (ADS)
Skulski, Wojtek; Wolfs, Frank
2003-04-01
A 10-bit, 8-channel, 40 megasamples per second digital signal processor and waveform digitizer DDC-8 (nicknamed Universal Trigger Module) is presented. The digitizer features 8 analog inputs, 1 analog output for a reconstructed analog waveform, 16 NIM logic inputs, 8 NIM logic outputs, and a pool of 16 TTL logic lines which can be individually configured as either inputs or outputs. The first application of this device is to enhance the present trigger electronics for PHOBOS at RHIC. The status of the development and the first results are presented. Possible applications of the new device are discussed. Supported by the NSF grant PHY-0072204.
Digital Plasma Control System for Alcator C-Mod
NASA Astrophysics Data System (ADS)
Ferrara, M.; Wolfe, S.; Stillerman, J.; Fredian, T.; Hutchinson, I.
2004-11-01
A digital plasma control system (DPCS) has been designed to replace the present C-Mod system, which is based on hybrid analog-digital computer. The initial implementation of DPCS comprises two 64 channel, 16 bit, low-latency cPCI digitizers, each with 16 analog outputs, controlled by a rack-mounted single-processor Linux server, which also serves as the compute engine. A prototype system employing three older 32 channel digitizers was tested during the 2003-04 campaign. The hybrid's linear PID feedback system was emulated by IDL code executing a synchronous loop, using the same target waveforms and control parameters. Reliable real-time operation was accomplished under a standard Linux OS (RH9) by locking memory and disabling interrupts during the plasma pulse. The DPCS-computed outputs agreed to within a few percent with those produced by the hybrid system, except for discrepancies due to offsets and non-ideal behavior of the hybrid circuitry. The system operated reliably, with no sample loss, at more than twice the 10kHz design specification, providing extra time for implementing more advanced control algorithms. The code is fault-tolerant and produces consistent output waveforms even with 10% sample loss.
A microcomputer interface for a digital audio processor-based data recording system.
Croxton, T L; Stump, S J; Armstrong, W M
1987-10-01
An inexpensive interface is described that performs direct transfer of digitized data from the digital audio processor and video cassette recorder based data acquisition system designed by Bezanilla (1985, Biophys. J., 47:437-441) to an IBM PC/XT microcomputer. The FORTRAN callable software that drives this interface is capable of controlling the video cassette recorder and starting data collection immediately after recognition of a segment of previously collected data. This permits piecewise analysis of long intervals of data that would otherwise exceed the memory capability of the microcomputer.
A microcomputer interface for a digital audio processor-based data recording system.
Croxton, T L; Stump, S J; Armstrong, W M
1987-01-01
An inexpensive interface is described that performs direct transfer of digitized data from the digital audio processor and video cassette recorder based data acquisition system designed by Bezanilla (1985, Biophys. J., 47:437-441) to an IBM PC/XT microcomputer. The FORTRAN callable software that drives this interface is capable of controlling the video cassette recorder and starting data collection immediately after recognition of a segment of previously collected data. This permits piecewise analysis of long intervals of data that would otherwise exceed the memory capability of the microcomputer. PMID:3676444
Spacecube: A Family of Reconfigurable Hybrid On-Board Science Data Processors
NASA Technical Reports Server (NTRS)
Flatley, Thomas P.
2015-01-01
SpaceCube is a family of Field Programmable Gate Array (FPGA) based on-board science data processing systems developed at the NASA Goddard Space Flight Center (GSFC). The goal of the SpaceCube program is to provide 10x to 100x improvements in on-board computing power while lowering relative power consumption and cost. SpaceCube is based on the Xilinx Virtex family of FPGAs, which include processor, FPGA logic and digital signal processing (DSP) resources. These processing elements are leveraged to produce a hybrid science data processing platform that accelerates the execution of algorithms by distributing computational functions to the most suitable elements. This approach enables the implementation of complex on-board functions that were previously limited to ground based systems, such as on-board product generation, data reduction, calibration, classification, eventfeature detection, data mining and real-time autonomous operations. The system is fully reconfigurable in flight, including data parameters, software and FPGA logic, through either ground commanding or autonomously in response to detected eventsfeatures in the instrument data stream.
Performance analysis of a large-grain dataflow scheduling paradigm
NASA Technical Reports Server (NTRS)
Young, Steven D.; Wills, Robert W.
1993-01-01
A paradigm for scheduling computations on a network of multiprocessors using large-grain data flow scheduling at run time is described and analyzed. The computations to be scheduled must follow a static flow graph, while the schedule itself will be dynamic (i.e., determined at run time). Many applications characterized by static flow exist, and they include real-time control and digital signal processing. With the advent of computer-aided software engineering (CASE) tools for capturing software designs in dataflow-like structures, macro-dataflow scheduling becomes increasingly attractive, if not necessary. For parallel implementations, using the macro-dataflow method allows the scheduling to be insulated from the application designer and enables the maximum utilization of available resources. Further, by allowing multitasking, processor utilizations can approach 100 percent while they maintain maximum speedup. Extensive simulation studies are performed on 4-, 8-, and 16-processor architectures that reflect the effects of communication delays, scheduling delays, algorithm class, and multitasking on performance and speedup gains.
Matrix preconditioning: a robust operation for optical linear algebra processors.
Ghosh, A; Paparao, P
1987-07-15
Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
A Real-Time System for Lane Detection Based on FPGA and DSP
NASA Astrophysics Data System (ADS)
Xiao, Jing; Li, Shutao; Sun, Bin
2016-12-01
This paper presents a real-time lane detection system including edge detection and improved Hough Transform based lane detection algorithm and its hardware implementation with field programmable gate array (FPGA) and digital signal processor (DSP). Firstly, gradient amplitude and direction information are combined to extract lane edge information. Then, the information is used to determine the region of interest. Finally, the lanes are extracted by using improved Hough Transform. The image processing module of the system consists of FPGA and DSP. Particularly, the algorithms implemented in FPGA are working in pipeline and processing in parallel so that the system can run in real-time. In addition, DSP realizes lane line extraction and display function with an improved Hough Transform. The experimental results show that the proposed system is able to detect lanes under different road situations efficiently and effectively.
Pre-Hardware Optimization of Spacecraft Image Processing Algorithms and Hardware Implementation
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Petrick, David J.; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Day, John H. (Technical Monitor)
2002-01-01
Spacecraft telemetry rates and telemetry product complexity have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image data processing and color picture generation application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The proposed solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms, and reconfigurable computing hardware (RC) technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processors (DSP). It has been shown that this approach can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft.
Moving Object Detection Using Scanning Camera on a High-Precision Intelligent Holder.
Chen, Shuoyang; Xu, Tingfa; Li, Daqun; Zhang, Jizhou; Jiang, Shenwang
2016-10-21
During the process of moving object detection in an intelligent visual surveillance system, a scenario with complex background is sure to appear. The traditional methods, such as "frame difference" and "optical flow", may not able to deal with the problem very well. In such scenarios, we use a modified algorithm to do the background modeling work. In this paper, we use edge detection to get an edge difference image just to enhance the ability of resistance illumination variation. Then we use a "multi-block temporal-analyzing LBP (Local Binary Pattern)" algorithm to do the segmentation. In the end, a connected component is used to locate the object. We also produce a hardware platform, the core of which consists of the DSP (Digital Signal Processor) and FPGA (Field Programmable Gate Array) platforms and the high-precision intelligent holder.
Method and apparatus for high speed data acquisition and processing
Ferron, J.R.
1997-02-11
A method and apparatus are disclosed for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register. 15 figs.
Method and apparatus for high speed data acquisition and processing
Ferron, John R.
1997-01-01
A method and apparatus for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register.
Database for LDV Signal Processor Performance Analysis
NASA Technical Reports Server (NTRS)
Baker, Glenn D.; Murphy, R. Jay; Meyers, James F.
1989-01-01
A comparative and quantitative analysis of various laser velocimeter signal processors is difficult because standards for characterizing signal bursts have not been established. This leaves the researcher to select a signal processor based only on manufacturers' claims without the benefit of direct comparison. The present paper proposes the use of a database of digitized signal bursts obtained from a laser velocimeter under various configurations as a method for directly comparing signal processors.
Study of a hybrid multispectral processor
NASA Technical Reports Server (NTRS)
Marshall, R. E.; Kriegler, F. J.
1973-01-01
A hybrid processor is described offering enough handling capacity and speed to process efficiently the large quantities of multispectral data that can be gathered by scanner systems such as MSDS, SKYLAB, ERTS, and ERIM M-7. Combinations of general-purpose and special-purpose hybrid computers were examined to include both analog and digital types as well as all-digital configurations. The current trend toward lower costs for medium-scale digital circuitry suggests that the all-digital approach may offer the better solution within the time frame of the next few years. The study recommends and defines such a hybrid digital computing system in which both special-purpose and general-purpose digital computers would be employed. The tasks of recognizing surface objects would be performed in a parallel, pipeline digital system while the tasks of control and monitoring would be handled by a medium-scale minicomputer system. A program to design and construct a small, prototype, all-digital system has been started.
Onboard Data Processors for Planetary Ice-Penetrating Sounding Radars
NASA Astrophysics Data System (ADS)
Tan, I. L.; Friesenhahn, R.; Gim, Y.; Wu, X.; Jordan, R.; Wang, C.; Clark, D.; Le, M.; Hand, K. P.; Plaut, J. J.
2011-12-01
Among the many concerns faced by outer planetary missions, science data storage and transmission hold special significance. Such missions must contend with limited onboard storage, brief data downlink windows, and low downlink bandwidths. A potential solution to these issues lies in employing onboard data processors (OBPs) to convert raw data into products that are smaller and closely capture relevant scientific phenomena. In this paper, we present the implementation of two OBP architectures for ice-penetrating sounding radars tasked with exploring Europa and Ganymede. Our first architecture utilizes an unfocused processing algorithm extended from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS, Jordan et. al. 2009). Compared to downlinking raw data, we are able to reduce data volume by approximately 100 times through OBP usage. To ensure the viability of our approach, we have implemented, simulated, and synthesized this architecture using both VHDL and Matlab models (with fixed-point and floating-point arithmetic) in conjunction with Modelsim. Creation of a VHDL model of our processor is the principle step in transitioning to actual digital hardware, whether in a FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and successful simulation and synthesis strongly indicate feasibility. In addition, we examined the tradeoffs faced in the OBP between fixed-point accuracy, resource consumption, and data product fidelity. Our second architecture is based upon a focused fast back projection (FBP) algorithm that requires a modest amount of computing power and on-board memory while yielding high along-track resolution and improved slope detection capability. We present an overview of the algorithm and details of our implementation, also in VHDL. With the appropriate tradeoffs, the use of OBPs can significantly reduce data downlink requirements without sacrificing data product fidelity. Through the development, simulation, and synthesis of two different OBP architectures, we have proven the feasibility and efficacy of an OBP for planetary ice-penetrating radars.
Digital Hardware Architecture Implementation
1993-02-15
of micro - MOTOROLA 63.7 50MHZ 64 BIT 2092 N/A processors during quarterly re- INTEL 42 50MHz 64 BIT 1092 N/A views and monthly reports. The 186o XP...27 3.2.1 Signal Processor (SP) Analysis...31 3.2.1.11 MasPar Software Statements ........................................................ 32 3.2.2 Data Processor
A class of parallel algorithms for computation of the manipulator inertia matrix
NASA Technical Reports Server (NTRS)
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Optimal processor assignment for pipeline computations
NASA Technical Reports Server (NTRS)
Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath
1991-01-01
The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.
NASA Technical Reports Server (NTRS)
Kriegler, F. J.; Gordon, M. F.; Mclaughlin, R. H.; Marshall, R. E.
1975-01-01
The MIDAS (Multivariate Interactive Digital Analysis System) processor is a high-speed processor designed to process multispectral scanner data (from Landsat, EOS, aircraft, etc.) quickly and cost-effectively to meet the requirements of users of remote sensor data, especially from very large areas. MIDAS consists of a fast multipipeline preprocessor and classifier, an interactive color display and color printer, and a medium scale computer system for analysis and control. The system is designed to process data having as many as 16 spectral bands per picture element at rates of 200,000 picture elements per second into as many as 17 classes using a maximum likelihood decision rule.
Bringing Algorithms to Life: Cooperative Computing Activities Using Students as Processors.
ERIC Educational Resources Information Center
Bachelis, Gregory F.; And Others
1994-01-01
Presents cooperative computing activities in which each student plays the role of a switch or processor and acts out algorithms. Includes binary counting, finding the smallest card in a deck, sorting by selection and merging, adding and multiplying large numbers, and sieving for primes. (16 references) (Author/MKR)
Adaptive Load-Balancing Algorithms Using Symmetric Broadcast Networks
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
In a distributed-computing environment, it is important to ensure that the processor workloads are adequately balanced. Among numerous load-balancing algorithms, a unique approach due to Dam and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three novel SBN-based load-balancing algorithms, and implement them on an SP2. A thorough experimental study with Poisson-distributed synthetic loads demonstrates that these algorithms are very effective in balancing system load while minimizing processor idle time. They also compare favorably with several other existing load-balancing techniques. Additional experiments performed with real data demonstrate that the SBN approach is effective in adaptive computational science and engineering applications where dynamic load balancing is extremely crucial.
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Optimally stopped variational quantum algorithms
NASA Astrophysics Data System (ADS)
Vinci, Walter; Shabani, Alireza
2018-04-01
Quantum processors promise a paradigm shift in high-performance computing which needs to be assessed by accurate benchmarking measures. In this article, we introduce a benchmark for the variational quantum algorithm (VQA), recently proposed as a heuristic algorithm for small-scale quantum processors. In VQA, a classical optimization algorithm guides the processor's quantum dynamics to yield the best solution for a given problem. A complete assessment of the scalability and competitiveness of VQA should take into account both the quality and the time of dynamics optimization. The method of optimal stopping, employed here, provides such an assessment by explicitly including time as a cost factor. Here, we showcase this measure for benchmarking VQA as a solver for some quadratic unconstrained binary optimization. Moreover, we show that a better choice for the cost function of the classical routine can significantly improve the performance of the VQA algorithm and even improve its scaling properties.
Comparing an FPGA to a Cell for an Image Processing Application
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ngo, Hau; Broussard, Randy P.; Ives, Robert W.
2010-12-01
Modern advancements in configurable hardware, most notably Field-Programmable Gate Arrays (FPGAs), have provided an exciting opportunity to discover the parallel nature of modern image processing algorithms. On the other hand, PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high performance. In this research project, our aim is to study the differences in performance of a modern image processing algorithm on these two hardware platforms. In particular, Iris Recognition Systems have recently become an attractive identification method because of their extremely high accuracy. Iris matching, a repeatedly executed portion of a modern iris recognition algorithm, is parallelized on an FPGA system and a Cell processor. We demonstrate a 2.5 times speedup of the parallelized algorithm on the FPGA system when compared to a Cell processor-based version.
An acceleration framework for synthetic aperture radar algorithms
NASA Astrophysics Data System (ADS)
Kim, Youngsoo; Gloster, Clay S.; Alexander, Winser E.
2017-04-01
Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR) are computationally intensive and require considerable execution time on a general purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine in order to reduce execution time by an order of magnitude as compared to kernel execution on a general purpose processor. Specifically, Field Programmable Gate Arrays (FPGAs) can be used to accelerate these kernels using hardware-based custom logic implementations. In this paper, we demonstrate a framework for algorithm acceleration. We used SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. Initially, we profiled the SAR algorithm and implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show a linear speedup by adding reasonably small processing elements in Field Programmable Gate Array (FPGA) as opposed to using a software implementation running on a typical general purpose processor.
Multichannel Baseband Processor for Wideband CDMA
NASA Astrophysics Data System (ADS)
Jalloul, Louay M. A.; Lin, Jim
2005-12-01
The system architecture of the cellular base station modem engine (CBME) is described. The CBME is a single-chip multichannel transceiver capable of processing and demodulating signals from multiple users simultaneously. It is optimized to process different classes of code-division multiple-access (CDMA) signals. The paper will show that through key functional system partitioning, tightly coupled small digital signal processing cores, and time-sliced reuse architecture, CBME is able to achieve a high degree of algorithmic flexibility while maintaining efficiency. The paper will also highlight the implementation and verification aspects of the CBME chip design. In this paper, wideband CDMA is used as an example to demonstrate the architecture concept.
An intelligent allocation algorithm for parallel processing
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.
1988-01-01
The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of interprocessor communication network, influence the allocation. To achieve realistic estimations of the executive durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration, varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
Design and realization of the baseband processor in satellite navigation and positioning receiver
NASA Astrophysics Data System (ADS)
Zhang, Dawei; Hu, Xiulin; Li, Chen
2007-11-01
The content of this paper is focused on the Design and realization of the baseband processor in satellite navigation and positioning receiver. Baseband processor is the most important part of the satellite positioning receiver. The design covers baseband processor's main functions include multi-channel digital signal DDC, acquisition, code tracking, carrier tracking, demodulation, etc. The realization is based on an Altera's FPGA device, that makes the system can be improved and upgraded without modifying the hardware. It embodies the theory of software defined radio (SDR), and puts the theory of the spread spectrum into practice. This paper puts emphasis on the realization of baseband processor in FPGA. In the order of choosing chips, design entry, debugging and synthesis, the flow is presented detailedly. Additionally the paper detailed realization of Digital PLL in order to explain a method of reducing the consumption of FPGA. Finally, the paper presents the result of Synthesis. This design has been used in BD-1, BD-2 and GPS.
NASA Astrophysics Data System (ADS)
Xie, Yiwei; Geng, Zihan; Zhuang, Leimeng; Burla, Maurizio; Taddei, Caterina; Hoekman, Marcel; Leinse, Arne; Roeloffzen, Chris G. H.; Boller, Klaus-J.; Lowery, Arthur J.
2017-12-01
Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enables photonic implementations of reconfigurable radio frequency (RF) filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP)-level flexibility, hundreds-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Parallel eigenanalysis of finite element models in a completely connected architecture
NASA Technical Reports Server (NTRS)
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
High-Speed Current dq PI Controller for Vector Controlled PMSM Drive
Reaz, Mamun Bin Ibne; Rahman, Labonnah Farzana; Chang, Tae Gyu
2014-01-01
High-speed current controller for vector controlled permanent magnet synchronous motor (PMSM) is presented. The controller is developed based on modular design for faster calculation and uses fixed-point proportional-integral (PI) method for improved accuracy. Current dq controller is usually implemented in digital signal processor (DSP) based computer. However, DSP based solutions are reaching their physical limits, which are few microseconds. Besides, digital solutions suffer from high implementation cost. In this research, the overall controller is realizing in field programmable gate array (FPGA). FPGA implementation of the overall controlling algorithm will certainly trim down the execution time significantly to guarantee the steadiness of the motor. Agilent 16821A Logic Analyzer is employed to validate the result of the implemented design in FPGA. Experimental results indicate that the proposed current dq PI controller needs only 50 ns of execution time in 40 MHz clock, which is the lowest computational cycle for the era. PMID:24574913
Parallel algorithms for boundary value problems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
Parallel matrix multiplication on the Connection Machine
NASA Technical Reports Server (NTRS)
Tichy, Walter F.
1988-01-01
Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n to the 2nd power log n), O(n log n), O(n), and O(log n), and require n, n to the 2nd power, n to the 2nd power, and n to the 3rd power processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
BPSK Demodulation Using Digital Signal Processing
NASA Technical Reports Server (NTRS)
Garcia, Thomas R.
1996-01-01
A digital communications signal is a sinusoidal waveform that is modified by a binary (digital) information signal. The sinusoidal waveform is called the carrier. The carrier may be modified in amplitude, frequency, phase, or a combination of these. In this project a binary phase shift keyed (BPSK) signal is the communication signal. In a BPSK signal the phase of the carrier is set to one of two states, 180 degrees apart, by a binary (i.e., 1 or 0) information signal. A digital signal is a sampled version of a "real world" time continuous signal. The digital signal is generated by sampling the continuous signal at discrete points in time. The rate at which the signal is sampled is called the sampling rate (f(s)). The device that performs this operation is called an analog-to-digital (A/D) converter or a digitizer. The digital signal is composed of the sequence of individual values of the sampled BPSK signal. Digital signal processing (DSP) is the modification of the digital signal by mathematical operations. A device that performs this processing is called a digital signal processor. After processing, the digital signal may then be converted back to an analog signal using a digital-to-analog (D/A) converter. The goal of this project is to develop a system that will recover the digital information from a BPSK signal using DSP techniques. The project is broken down into the following steps: (1) Development of the algorithms required to demodulate the BPSK signal; (2) Simulation of the system; and (3) Implementation a BPSK receiver using digital signal processing hardware.
Assignment Of Finite Elements To Parallel Processors
NASA Technical Reports Server (NTRS)
Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.
1990-01-01
Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
Research on NC motion controller based on SOPC technology
NASA Astrophysics Data System (ADS)
Jiang, Tingbiao; Meng, Biao
2006-11-01
With the rapid development of the digitization and informationization, the application of numerical control technology in the manufacturing industry becomes more and more important. However, the conventional numerical control system usually has some shortcomings such as the poor in system openness, character of real-time, cutability and reconfiguration. In order to solve these problems, this paper investigates the development prospect and advantage of the application in numerical control area with system-on-a-Programmable-Chip (SOPC) technology, and puts forward to a research program approach to the NC controller based on SOPC technology. Utilizing the characteristic of SOPC technology, we integrate high density logic device FPGA, memory SRAM, and embedded processor ARM into a single programmable logic device. We also combine the 32-bit RISC processor with high computing capability of the complicated algorithm with the FPGA device with strong motivable reconfiguration logic control ability. With these steps, we can greatly resolve the defect described in above existing numerical control systems. For the concrete implementation method, we use FPGA chip embedded with ARM hard nuclear processor to construct the control core of the motion controller. We also design the peripheral circuit of the controller according to the requirements of actual control functions, transplant real-time operating system into ARM, design the driver of the peripheral assisted chip, develop the application program to control and configuration of FPGA, design IP core of logic algorithm for various NC motion control to configured it into FPGA. The whole control system uses the concept of modular and structured design to develop hardware and software system. Thus the NC motion controller with the advantage of easily tailoring, highly opening, reconfigurable, and expandable can be implemented.
Multi-processing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
The MIMD concept is applied, through multitasking, with relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. An existing single processor algorithm is mapped without the need for developing a new algorithm. The procedure of designing a code utilizing this approach is automated with the Unix stream editor. A Multiple Processor Multiple Grid (MPMG) code is developed as a demonstration of this approach. This code solves the three-dimensional, Reynolds-averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. This solver is applied to a generic, oblique-wing aircraft problem on a four-processor computer using one process for data management and nonparallel computations and three processes for pseudotime advance on three different grid systems.
Low-cost space-varying FIR filter architecture for computational imaging systems
NASA Astrophysics Data System (ADS)
Feng, Guotong; Shoaib, Mohammed; Schwartz, Edward L.; Dirk Robinson, M.
2010-01-01
Recent research demonstrates the advantage of designing electro-optical imaging systems by jointly optimizing the optical and digital subsystems. The optical systems designed using this joint approach intentionally introduce large and often space-varying optical aberrations that produce blurry optical images. Digital sharpening restores reduced contrast due to these intentional optical aberrations. Computational imaging systems designed in this fashion have several advantages including extended depth-of-field, lower system costs, and improved low-light performance. Currently, most consumer imaging systems lack the necessary computational resources to compensate for these optical systems with large aberrations in the digital processor. Hence, the exploitation of the advantages of the jointly designed computational imaging system requires low-complexity algorithms enabling space-varying sharpening. In this paper, we describe a low-cost algorithmic framework and associated hardware enabling the space-varying finite impulse response (FIR) sharpening required to restore largely aberrated optical images. Our framework leverages the space-varying properties of optical images formed using rotationally-symmetric optical lens elements. First, we describe an approach to leverage the rotational symmetry of the point spread function (PSF) about the optical axis allowing computational savings. Second, we employ a specially designed bank of sharpening filters tuned to the specific radial variation common to optical aberrations. We evaluate the computational efficiency and image quality achieved by using this low-cost space-varying FIR filter architecture.
Instrumentation System Diagnoses a Thermocouple
NASA Technical Reports Server (NTRS)
Perotti, Jose; Santiago, Josephine; Mata, Carlos; Vokrot, Peter; Zavala, Carlos; Burns, Bradley
2008-01-01
An improved self-validating thermocouple (SVT) instrumentation system not only acquires readings from a thermocouple but is also capable of detecting deterioration and a variety of discrete faults in the thermocouple and its lead wires. Prime examples of detectable discrete faults and deterioration include open- and short-circuit conditions and debonding of the thermocouple junction from the object, the temperature of which one seeks to measure. Debonding is the most common cause of errors in thermocouple measurements, but most prior SVT instrumentation systems have not been capable of detecting debonding. The improved SVT instrumentation system includes power circuitry, a cold-junction compensator, signal-conditioning circuitry, pulse-width-modulation (PWM) thermocouple-excitation circuitry, an analog-to-digital converter (ADC), a digital data processor, and a universal serial bus (USB) interface. The system can operate in any of the following three modes: temperature measurement, thermocouple validation, and bonding/debonding detection. The software running in the processor includes components that implement statistical algorithms to evaluate the state of the thermocouple and the instrumentation system. When the power is first turned on, the user can elect to start a diagnosis/ monitoring sequence, in which the PWM is used to estimate the characteristic times corresponding to the correct configuration. The user also has the option of using previous diagnostic values, which are stored in an electrically erasable, programmable read-only memory so that they are available every time the power is turned on.
The GPS Burst Detector W-Sensor
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCrady, D.D.; Phipps, P.
1994-08-01
The NAVSTAR satellites have two missions: navigation and nuclear detonation detection. The main objective of this paper is to describe one of the key elements of the Nuclear Detonation Detection System (NDS), the Burst Detector W-Sensor (BDW) that was developed for the Air Force Space and Missle Systems Center, its mission on GPS Block IIR, and how it utilizes GPS timing signals to precisely locate nuclear detonations (NUDET). The paper will also cover the interface to the Burst Detector Processor (BDP) which links the BDW to the ground station where the BDW is controlled and where data from multiple satellitesmore » are processed to determine the location of the NUDET. The Block IIR BDW is the culmination of a development program that has produced a state-of-the-art, space qualified digital receiver/processor that dissipates only 30 Watts, weighs 57 pounds, and has a 12in. {times} l4.2in. {times} 7.16in. footprint. The paper will highlight several of the key multilayer printed circuit cards without which the required power, weight, size, and radiation requirements could not have been met. In addition, key functions of the system software will be covered. The paper will be concluded with a discussion of the high speed digital signal processing and algorithm used to determine the time-of-arrival (TOA) of the electromagnetic pulse (EMP) from the NUDET.« less
Demonstration of two-qubit algorithms with a superconducting quantum processor.
DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J
2009-07-09
Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact-such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
Scheduling time-critical graphics on multiple processors
NASA Technical Reports Server (NTRS)
Meyer, Tom W.; Hughes, John F.
1995-01-01
This paper describes an algorithm for the scheduling of time-critical rendering and computation tasks on single- and multiple-processor architectures, with minimal pipelining. It was developed to manage scientific visualization scenes consisting of hundreds of objects, each of which can be computed and displayed at thousands of possible resolution levels. The algorithm generates the time-critical schedule using progressive-refinement techniques; it always returns a feasible schedule and, when allowed to run to completion, produces a near-optimal schedule which takes advantage of almost the entire multiple-processor system.
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal
NASA Astrophysics Data System (ADS)
Satheeskumaran, S.; Sabrigiriraj, M.
2016-06-01
Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.
Note: Ultrasonic gas flowmeter based on optimized time-of-flight algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, X. F.; Tang, Z. A.
2011-04-15
A new digital signal processor based single path ultrasonic gas flowmeter is designed, constructed, and experimentally tested. To achieve high accuracy measurements, an optimized ultrasound driven method of incorporation of the amplitude modulation and the phase modulation of the transmit-receive technique is used to stimulate the transmitter. Based on the regularities among the received envelope zero-crossings, different received signal's signal-to-noise ratio situations are discriminated and optional time-of-flight algorithms are applied to take flow rate calculations. Experimental results from the dry calibration indicate that the designed flowmeter prototype can meet the zero-flow verification test requirements of the American Gas Association Reportmore » No. 9. Furthermore, the results derived from the flow calibration prove that the proposed flowmeter prototype can measure flow rate accurately in the practical experiments, and the nominal accuracies after FWME adjustment are lower than 0.8% throughout the calibration range.« less
Moving Object Detection Using Scanning Camera on a High-Precision Intelligent Holder
Chen, Shuoyang; Xu, Tingfa; Li, Daqun; Zhang, Jizhou; Jiang, Shenwang
2016-01-01
During the process of moving object detection in an intelligent visual surveillance system, a scenario with complex background is sure to appear. The traditional methods, such as “frame difference” and “optical flow”, may not able to deal with the problem very well. In such scenarios, we use a modified algorithm to do the background modeling work. In this paper, we use edge detection to get an edge difference image just to enhance the ability of resistance illumination variation. Then we use a “multi-block temporal-analyzing LBP (Local Binary Pattern)” algorithm to do the segmentation. In the end, a connected component is used to locate the object. We also produce a hardware platform, the core of which consists of the DSP (Digital Signal Processor) and FPGA (Field Programmable Gate Array) platforms and the high-precision intelligent holder. PMID:27775671
A hierarchical, automated target recognition algorithm for a parallel analog processor
NASA Technical Reports Server (NTRS)
Woodward, Gail; Padgett, Curtis
1997-01-01
A hierarchical approach is described for an automated target recognition (ATR) system, VIGILANTE, that uses a massively parallel, analog processor (3DANN). The 3DANN processor is capable of performing 64 concurrent inner products of size 1x4096 every 250 nanoseconds.
Self-Calibrating and Remote Programmable Signal Conditioning Amplifier System and Method
NASA Technical Reports Server (NTRS)
Medelius, Pedro J. (Inventor); Hallberg, Carl G. (Inventor); Simpson, Howard J., III (Inventor); Thayer, Stephen W. (Inventor)
1998-01-01
A self-calibrating, remote programmable signal conditioning amplifier system employs information read from a memory attached to a measurement transducer for automatic calibration. The signal conditioning amplifier is self-calibrated on a continuous basis through use of a dual input path arrangement, with each path containing a multiplexer and a programmable amplifier. A digital signal processor controls operation of the system such that a transducer signal is applied to one of the input paths, while one or more calibration signals are applied to the second input path. Once the second path is calibrated, the digital signal processor switches the transducer signal to the second path. and then calibrates the first path. This process is continually repeated so that each path is calibrated on an essentially continuous basis. Dual output paths are also employed which are calibrated in the same manner. The digital signal processor also allows the implementation of a variety of digital filters which are either programmed into the system or downloaded by an operator, and performs up to eighth order linearization.
Potential of minicomputer/array-processor system for nonlinear finite-element analysis
NASA Technical Reports Server (NTRS)
Strohkorb, G. A.; Noor, A. K.
1983-01-01
The potential of using a minicomputer/array-processor system for the efficient solution of large-scale, nonlinear, finite-element problems is studied. A Prime 750 is used as the host computer, and a software simulator residing on the Prime is employed to assess the performance of the Floating Point Systems AP-120B array processor. Major hardware characteristics of the system such as virtual memory and parallel and pipeline processing are reviewed, and the interplay between various hardware components is examined. Effective use of the minicomputer/array-processor system for nonlinear analysis requires the following: (1) proper selection of the computational procedure and the capability to vectorize the numerical algorithms; (2) reduction of input-output operations; and (3) overlapping host and array-processor operations. A detailed discussion is given of techniques to accomplish each of these tasks. Two benchmark problems with 1715 and 3230 degrees of freedom, respectively, are selected to measure the anticipated gain in speed obtained by using the proposed algorithms on the array processor.
Real-time Enhancement, Registration, and Fusion for a Multi-Sensor Enhanced Vision System
NASA Technical Reports Server (NTRS)
Hines, Glenn D.; Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.
2006-01-01
Over the last few years NASA Langley Research Center (LaRC) has been developing an Enhanced Vision System (EVS) to aid pilots while flying in poor visibility conditions. The EVS captures imagery using two infrared video cameras. The cameras are placed in an enclosure that is mounted and flown forward-looking underneath the NASA LaRC ARIES 757 aircraft. The data streams from the cameras are processed in real-time and displayed on monitors on-board the aircraft. With proper processing the camera system can provide better-than- human-observed imagery particularly during poor visibility conditions. However, to obtain this goal requires several different stages of processing including enhancement, registration, and fusion, and specialized processing hardware for real-time performance. We are using a real-time implementation of the Retinex algorithm for image enhancement, affine transformations for registration, and weighted sums to perform fusion. All of the algorithms are executed on a single TI DM642 digital signal processor (DSP) clocked at 720 MHz. The image processing components were added to the EVS system, tested, and demonstrated during flight tests in August and September of 2005. In this paper we briefly discuss the EVS image processing hardware and algorithms. We then discuss implementation issues and show examples of the results obtained during flight tests. Keywords: enhanced vision system, image enhancement, retinex, digital signal processing, sensor fusion
Competitive Parallel Processing For Compression Of Data
NASA Technical Reports Server (NTRS)
Diner, Daniel B.; Fender, Antony R. H.
1990-01-01
Momentarily-best compression algorithm selected. Proposed competitive-parallel-processing system compresses data for transmission in channel of limited band-width. Likely application for compression lies in high-resolution, stereoscopic color-television broadcasting. Data from information-rich source like color-television camera compressed by several processors, each operating with different algorithm. Referee processor selects momentarily-best compressed output.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2011-01-11
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-02-21
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Treecode with a Special-Purpose Processor
NASA Astrophysics Data System (ADS)
Makino, Junichiro
1991-08-01
We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Methods and systems for providing reconfigurable and recoverable computing resources
NASA Technical Reports Server (NTRS)
Stange, Kent (Inventor); Hess, Richard (Inventor); Kelley, Gerald B (Inventor); Rogers, Randy (Inventor)
2010-01-01
A method for optimizing the use of digital computing resources to achieve reliability and availability of the computing resources is disclosed. The method comprises providing one or more processors with a recovery mechanism, the one or more processors executing one or more applications. A determination is made whether the one or more processors needs to be reconfigured. A rapid recovery is employed to reconfigure the one or more processors when needed. A computing system that provides reconfigurable and recoverable computing resources is also disclosed. The system comprises one or more processors with a recovery mechanism, with the one or more processors configured to execute a first application, and an additional processor configured to execute a second application different than the first application. The additional processor is reconfigurable with rapid recovery such that the additional processor can execute the first application when one of the one more processors fails.
An efficient parallel-processing method for transposing large matrices in place.
Portnoff, M R
1999-01-01
We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Gate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.
NASA Astrophysics Data System (ADS)
The present conference on global telecommunications discusses topics in the fields of Integrated Services Digital Network (ISDN) technology field trial planning and results to date, motion video coding, ISDN networking, future network communications security, flexible and intelligent voice/data networks, Asian and Pacific lightwave and radio systems, subscriber radio systems, the performance of distributed systems, signal processing theory, satellite communications modulation and coding, and terminals for the handicapped. Also discussed are knowledge-based technologies for communications systems, future satellite transmissions, high quality image services, novel digital signal processors, broadband network access interface, traffic engineering for ISDN design and planning, telecommunications software, coherent optical communications, multimedia terminal systems, advanced speed coding, portable and mobile radio communications, multi-Gbit/second lightwave transmission systems, enhanced capability digital terminals, communications network reliability, advanced antimultipath fading techniques, undersea lightwave transmission, image coding, modulation and synchronization, adaptive signal processing, integrated optical devices, VLSI technologies for ISDN, field performance of packet switching, CSMA protocols, optical transport system architectures for broadband ISDN, mobile satellite communications, indoor wireless communication, echo cancellation in communications, and distributed network algorithms.
Embedded Palmprint Recognition System Using OMAP 3530
Shen, Linlin; Wu, Shipei; Zheng, Songhao; Ji, Zhen
2012-01-01
We have proposed in this paper an embedded palmprint recognition system using the dual-core OMAP 3530 platform. An improved algorithm based on palm code was proposed first. In this method, a Gabor wavelet is first convolved with the palmprint image to produce a response image, where local binary patterns are then applied to code the relation among the magnitude of wavelet response at the ccentral pixel with that of its neighbors. The method is fully tested using the public PolyU palmprint database. While palm code achieves only about 89% accuracy, over 96% accuracy is achieved by the proposed G-LBP approach. The proposed algorithm was then deployed to the DSP processor of OMAP 3530 and work together with the ARM processor for feature extraction. When complicated algorithms run on the DSP processor, the ARM processor can focus on image capture, user interface and peripheral control. Integrated with an image sensing module and central processing board, the designed device can achieve accurate and real time performance. PMID:22438721
Embedded palmprint recognition system using OMAP 3530.
Shen, Linlin; Wu, Shipei; Zheng, Songhao; Ji, Zhen
2012-01-01
We have proposed in this paper an embedded palmprint recognition system using the dual-core OMAP 3530 platform. An improved algorithm based on palm code was proposed first. In this method, a Gabor wavelet is first convolved with the palmprint image to produce a response image, where local binary patterns are then applied to code the relation among the magnitude of wavelet response at the central pixel with that of its neighbors. The method is fully tested using the public PolyU palmprint database. While palm code achieves only about 89% accuracy, over 96% accuracy is achieved by the proposed G-LBP approach. The proposed algorithm was then deployed to the DSP processor of OMAP 3530 and work together with the ARM processor for feature extraction. When complicated algorithms run on the DSP processor, the ARM processor can focus on image capture, user interface and peripheral control. Integrated with an image sensing module and central processing board, the designed device can achieve accurate and real time performance.
System and method for resolving gamma-ray spectra
Gentile, Charles A.; Perry, Jason; Langish, Stephen W.; Silber, Kenneth; Davis, William M.; Mastrovito, Dana
2010-05-04
A system for identifying radionuclide emissions is described. The system includes at least one processor for processing output signals from a radionuclide detecting device, at least one training algorithm run by the at least one processor for analyzing data derived from at least one set of known sample data from the output signals, at least one classification algorithm derived from the training algorithm for classifying unknown sample data, wherein the at least one training algorithm analyzes the at least one sample data set to derive at least one rule used by said classification algorithm for identifying at least one radionuclide emission detected by the detecting device.
Adaptive Load-Balancing Algorithms using Symmetric Broadcast Networks
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In a distributed computing environment, it is important to ensure that the processor workloads are adequately balanced, Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three efficient SBN-based dynamic load-balancing algorithms, and implement them on an SGI Origin2000. A thorough experimental study with Poisson distributed synthetic loads demonstrates that our algorithms are effective in balancing system load. By optimizing completion time and idle time, the proposed algorithms are shown to compare favorably with several existing approaches.
Performance Modeling of the ADA Rendezvous
1991-10-01
queueing network of figure 2, SERVERTASK can complete only one rendezvous at a time. Thus, the rate that the rendezvous requests are processed at the... Network 1, SERVERTASK competes with the traffic tasks of Server Processor. Each time SERVERTASK gains access to the processor, SERVERTASK completes...Client Processor Server Processor Software Server Nek Netork2 Figure 10. A conceptualization of the algorithm. The SERVERTASK software server of Network 2
Jiang, Chao; Zhang, Hongyan; Wang, Jia; Wang, Yaru; He, Heng; Liu, Rui; Zhou, Fangyuan; Deng, Jialiang; Li, Pengcheng; Luo, Qingming
2011-11-01
Laser speckle imaging (LSI) is a noninvasive and full-field optical imaging technique which produces two-dimensional blood flow maps of tissues from the raw laser speckle images captured by a CCD camera without scanning. We present a hardware-friendly algorithm for the real-time processing of laser speckle imaging. The algorithm is developed and optimized specifically for LSI processing in the field programmable gate array (FPGA). Based on this algorithm, we designed a dedicated hardware processor for real-time LSI in FPGA. The pipeline processing scheme and parallel computing architecture are introduced into the design of this LSI hardware processor. When the LSI hardware processor is implemented in the FPGA running at the maximum frequency of 130 MHz, up to 85 raw images with the resolution of 640×480 pixels can be processed per second. Meanwhile, we also present a system on chip (SOC) solution for LSI processing by integrating the CCD controller, memory controller, LSI hardware processor, and LCD display controller into a single FPGA chip. This SOC solution also can be used to produce an application specific integrated circuit for LSI processing.
1993-01-01
Deoxyribose nucleicacid DPP: Digital Post-Processor DREO Detence Research Establishment Ottawa RF: Radio Frequency TeO2 : tellurium dioxide TIC: Time... TeO2 is 620 m/s, a device with a 100-As aperture device is 62-mm long. To take advantage of the full interaction time of these Bragg cells, the whole...INCLUDED IN THE DIGITAL POST-PROCESSOR HARDWARE Characteristics of Bandwidth Center Frequency Bragg Cell glass (bulk 100 MHz 150 MHz interaction) iNbO3
Multimedia systems in ultrasound image boundary detection and measurements
NASA Astrophysics Data System (ADS)
Pathak, Sayan D.; Chalana, Vikram; Kim, Yongmin
1997-05-01
Ultrasound as a medical imaging modality offers the clinician a real-time of the anatomy of the internal organs/tissues, their movement, and flow noninvasively. One of the applications of ultrasound is to monitor fetal growth by measuring biparietal diameter (BPD) and head circumference (HC). We have been working on automatic detection of fetal head boundaries in ultrasound images. These detected boundaries are used to measure BPD and HC. The boundary detection algorithm is based on active contour models and takes 32 seconds on an external high-end workstation, SUN SparcStation 20/71. Our goal has been to make this tool available within an ultrasound machine and at the same time significantly improve its performance utilizing multimedia technology. With the advent of high- performance programmable digital signal processors (DSP), the software solution within an ultrasound machine instead of the traditional hardwired approach or requiring an external computer is now possible. We have integrated our boundary detection algorithm into a programmable ultrasound image processor (PUIP) that fits into a commercial ultrasound machine. The PUIP provides both the high computing power and flexibility needed to support computationally-intensive image processing algorithms within an ultrasound machine. According to our data analysis, BPD/HC measurements made on PUIP lie within the interobserver variability. Hence, the errors in the automated BPD/HC measurements using the algorithm are on the same order as the average interobserver differences. On PUIP, it takes 360 ms to measure the values of BPD/HC on one head image. When processing multiple head images in sequence, it takes 185 ms per image, thus enabling 5.4 BPD/HC measurements per second. Reduction in the overall execution time from 32 seconds to a fraction of a second and making this multimedia system available within an ultrasound machine will help this image processing algorithm and other computer-intensive imaging applications become a practical tool for the sonographers in the feature.
The precision-processing subsystem for the Earth Resources Technology Satellite.
NASA Technical Reports Server (NTRS)
Chapelle, W. E.; Bybee, J. E.; Bedross, G. M.
1972-01-01
Description of the precision processor, a subsystem in the image-processing system for the Earth Resources Technology Satellite (ERTS). This processor is a special-purpose image-measurement and printing system, designed to process user-selected bulk images to produce 1:1,000,000-scale film outputs and digital image data, presented in a Universal-Transverse-Mercator (UTM) projection. The system will remove geometric and radiometric errors introduced by the ERTS multispectral sensors and by the bulk-processor electron-beam recorder. The geometric transformations required for each input scene are determined by resection computations based on reseau measurements and image comparisons with a special ground-control base contained within the system; the images are then printed and digitized by electronic image-transfer techniques.
NASA Astrophysics Data System (ADS)
Chen, Ming-Chih; Hsiao, Shen-Fu
In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.
Real-time separation of multineuron recordings with a DSP32C signal processor.
Gädicke, R; Albus, K
1995-04-01
We have developed a hardware and software package for real-time discrimination of multiple-unit activities recorded simultaneously from multiple microelectrodes using a VME-Bus system. Compared with other systems cited in literature or commercially available, our system has the following advantages. (1) Each electrode is served by its own preprocessor (DSP32C); (2) On-line spike discrimination is performed independently for each electrode. (3) The VME-bus allows processing of data received from 16 electrodes. The digitized (62.5 kHz) spike form is itself used as the model spike; the algorithm allows for comparing and sorting complete wave forms in real time into 8 different models per electrode.
Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank
NASA Astrophysics Data System (ADS)
Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.
2014-05-01
Wideband receive-mode beamforming applications in wireless location, electronically-scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes including the well-known true time-delay and the phased array beamformers have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum based techniques. These beamforming algorithms offer the desired selectivity at the cost of a high computational complexity and frequency-dependant far-field array patterns. A novel approach to receiver beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks for the synthesis of relatively frequency independent RF beams at an order of magnitude lower multiplier complexity compared to FFT or FIR filter based conventional algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, fast analog-to-digital converters (ADCs) for RF-to-bits type direct conversion of wideband antenna element signals. Fast digital implementation platforms that can realize high-precision recursive filter structures necessary for real-time beamforming, at RF radio bandwidths, are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. There exists native support for a larger bandwidth than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal having content with N-fold (B = N Fclk/2) bandwidth compared to the maximum clock frequency Fclk Hz of the digital VLSI platform under consideration. Such increase in bandwidth is achieved without use of polyphase signal processing or time-interleaved ADC methods. That is, all digital processors operate at the same Fclk clock frequency without phasing, while wideband operation is achieved by sub-sampling of narrower sub-bands at the the RF channelizer outputs.
NASA Astrophysics Data System (ADS)
Takeda, Sawako; Tashiro, Makoto S.; Ishisaki, Yoshitaka; Tsujimoto, Masahiro; Seta, Hiromi; Shimoda, Yuya; Yamaguchi, Sunao; Uehara, Sho; Terada, Yukikatsu; Fujimoto, Ryuichi; Mitsuda, Kazuhisa
2014-07-01
The soft X-ray spectrometer (SXS) aboard ASTRO-H is equipped with dedicated digital signal processing units called pulse shape processors (PSPs). The X-ray microcalorimeter system SXS has 36 sensor pixels, which are operated at 50 mK to measure heat input of X-ray photons and realize an energy resolution of 7 eV FWHM in the range 0.3-12.0 keV. Front-end signal processing electronics are used to filter and amplify the electrical pulse output from the sensor and for analog-to-digital conversion. The digitized pulses from the 36 pixels are multiplexed and are sent to the PSP over low-voltage differential signaling lines. Each of two identical PSP units consists of an FPGA board, which assists the hardware logic, and two CPU boards, which assist the onboard software. The FPGA board triggers at every pixel event and stores the triggering information as a pulse waveform in the installed memory. The CPU boards read the event data to evaluate pulse heights by an optimal filtering algorithm. The evaluated X-ray photon data (including the pixel ID, energy, and arrival time information) are transferred to the satellite data recorder along with event quality information. The PSP units have been developed and tested with the engineering model (EM) and the flight model. Utilizing the EM PSP, we successfully verified the entire hardware system and the basic software design of the PSPs, including their communication capability and signal processing performance. In this paper, we show the key metrics of the EM test, such as accuracy and synchronicity of sampling clocks, event grading capability, and resultant energy resolution.
The automatic control system and stand-by facilities of the TDMA-40 equipment
NASA Astrophysics Data System (ADS)
Gudenko, D. V.; Pankov, G. Kh.; Pauk, A. G.; Tsirlin, V. M.
1980-10-01
When a controlling station in a satellite communications system is out of order, a complex algorithm must be carried out for automatic operation of the stand-by equipment. A processor has been developed to perform this algorithm, as well as operations involving the stand-by facilities of the receiving-transmitting equipment of the station. The design principles and solutions to problems in developing the equipment for the monitoring and controlling systems are described. These systems are based on multistation access using time division multiplexing. Algorithms are presented for the operation of the synchronizing processor and the control processor of the equipment. The automatic control system and stand-by facilities make it possible to reduce the service personnel and to design an unattended station.
Optical systolic solutions of linear algebraic equations
NASA Technical Reports Server (NTRS)
Neuman, C. P.; Casasent, D.
1984-01-01
The philosophy and data encoding possible in systolic array optical processor (SAOP) were reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as: matrix-decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
Simplifying and speeding the management of intra-node cache coherence
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY
2012-04-17
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Blumrich, Matthias A; Chen, Dong; Coteus, Paul W
A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less
Parallel processing approach to transform-based image coding
NASA Astrophysics Data System (ADS)
Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.
1991-06-01
This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links, each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, results ar presented with image statistics and data rates. Finally techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.
The CSM testbed matrix processors internal logic and dataflow descriptions
NASA Technical Reports Server (NTRS)
Regelbrugge, Marc E.; Wright, Mary A.
1988-01-01
This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix
NASA Technical Reports Server (NTRS)
Swarztrauber, Paul N.
1993-01-01
A parallel algorithm, called polysection, is presented for computing the eigenvalues of a symmetric tridiagonal matrix. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The signs of the polynomials at the interval endpoints are determined a priori and used to guarantee that all zeros are found. The use of finite-precision arithmetic may result in multiple zeros; however, in this case, the intervals coalesce and their number determines exactly the multiplicity of the zero. For an N x N matrix the eigenvalues can be determined in O(log-squared N) time with N-squared processors and O(N) time with N processors. The method is compared with a parallel variant of bisection that requires O(N-squared) time on a single processor, O(N) time with N processors, and O(log N) time with N-squared processors.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
A unified approach to VLSI layout automation and algorithm mapping on processor arrays
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.
1993-01-01
Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate
NASA Astrophysics Data System (ADS)
Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.
2004-10-01
Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
Low-power wearable respiratory sound sensing.
Oletic, Dinko; Arsenali, Bruno; Bilas, Vedran
2014-04-09
Building upon the findings from the field of automated recognition of respiratory sound patterns, we propose a wearable wireless sensor implementing on-board respiratory sound acquisition and classification, to enable continuous monitoring of symptoms, such as asthmatic wheezing. Low-power consumption of such a sensor is required in order to achieve long autonomy. Considering that the power consumption of its radio is kept minimal if transmitting only upon (rare) occurrences of wheezing, we focus on optimizing the power consumption of the digital signal processor (DSP). Based on a comprehensive review of asthmatic wheeze detection algorithms, we analyze the computational complexity of common features drawn from short-time Fourier transform (STFT) and decision tree classification. Four algorithms were implemented on a low-power TMS320C5505 DSP. Their classification accuracies were evaluated on a dataset of prerecorded respiratory sounds in two operating scenarios of different detection fidelities. The execution times of all algorithms were measured. The best classification accuracy of over 92%, while occupying only 2.6% of the DSP's processing time, is obtained for the algorithm featuring the time-frequency tracking of shapes of crests originating from wheezing, with spectral features modeled using energy.
Improving energy efficiency in handheld biometric applications
NASA Astrophysics Data System (ADS)
Hoyle, David C.; Gale, John W.; Schultz, Robert C.; Rakvic, Ryan N.; Ives, Robert W.
2012-06-01
With improved smartphone and tablet technology, it is becoming increasingly feasible to implement powerful biometric recognition algorithms on portable devices. Typical iris recognition algorithms, such as Ridge Energy Direction (RED), utilize two-dimensional convolution in their implementation. This paper explores the energy consumption implications of 12 different methods of implementing two-dimensional convolution on a portable device. Typically, convolution is implemented using floating point operations. If a given algorithm implemented integer convolution vice floating point convolution, it could drastically reduce the energy consumed by the processor. The 12 methods compared include 4 major categories: Integer C, Integer Java, Floating Point C, and Floating Point Java. Each major category is further divided into 3 implementations: variable size looped convolution, static size looped convolution, and unrolled looped convolution. All testing was performed using the HTC Thunderbolt with energy measured directly using a Tektronix TDS5104B Digital Phosphor oscilloscope. Results indicate that energy savings as high as 75% are possible by using Integer C versus Floating Point C. Considering the relative proportion of processing time that convolution is responsible for in a typical algorithm, the savings in energy would likely result in significantly greater time between battery charges.
Tolbert, Jeremy R; Kabali, Pratik; Brar, Simeranjit; Mukhopadhyay, Saibal
2009-01-01
We present a digital system for adaptive data compression for low power wireless transmission of Electroencephalography (EEG) data. The proposed system acts as a base-band processor between the EEG analog-to-digital front-end and RF transceiver. It performs a real-time accuracy energy trade-off for multi-channel EEG signal transmission by controlling the volume of transmitted data. We propose a multi-core digital signal processor for on-chip processing of EEG signals, to detect signal information of each channel and perform real-time adaptive compression. Our analysis shows that the proposed approach can provide significant savings in transmitter power with minimal impact on the overall signal accuracy.
Loran-C digital word generator for use with a KIM-1 microprocessor system
NASA Technical Reports Server (NTRS)
Nickum, J. D.
1977-01-01
The problem of translating the time of occurrence of received Loran-C pulses into a time, referenced to a particular period of occurrence is addressed and applied to the design of a digital word generator for a Loran-C sensor processor package. The digital information from this word generator is processed in a KIM-1 microprocessor system which is based on the MOS 6502 CPU. This final system will consist of a complete time difference sensor processor for determining position information using Loran-C charts. The system consists of the KIM-1 microprocessor module, a 4K RAM memory board, a user interface, and the Loran-C word generator.
Compression of CCD raw images for digital still cameras
NASA Astrophysics Data System (ADS)
Sriram, Parthasarathy; Sudharsanan, Subramania
2005-03-01
Lossless compression of raw CCD images captured using color filter arrays has several benefits. The benefits include improved storage capacity, reduced memory bandwidth, and lower power consumption for digital still camera processors. The paper discusses the benefits in detail and proposes the use of a computationally efficient block adaptive scheme for lossless compression. Experimental results are provided that indicate that the scheme performs well for CCD raw images attaining compression factors of more than two. The block adaptive method also compares favorably with JPEG-LS. A discussion is provided indicating how the proposed lossless coding scheme can be incorporated into digital still camera processors enabling lower memory bandwidth and storage requirements.
Theory of Remote Image Formation
NASA Astrophysics Data System (ADS)
Blahut, Richard E.
2004-11-01
In many applications, images, such as ultrasonic or X-ray signals, are recorded and then analyzed with digital or optical processors in order to extract information. Such processing requires the development of algorithms of great precision and sophistication. This book presents a unified treatment of the mathematical methods that underpin the various algorithms used in remote image formation. The author begins with a review of transform and filter theory. He then discusses two- and three-dimensional Fourier transform theory, the ambiguity function, image construction and reconstruction, tomography, baseband surveillance systems, and passive systems (where the signal source might be an earthquake or a galaxy). Information-theoretic methods in image formation are also covered, as are phase errors and phase noise. Throughout the book, practical applications illustrate theoretical concepts, and there are many homework problems. The book is aimed at graduate students of electrical engineering and computer science, and practitioners in industry. Presents a unified treatment of the mathematical methods that underpin the algorithms used in remote image formation Illustrates theoretical concepts with reference to practical applications Provides insights into the design parameters of real systems
A Course on Reconfigurable Processors
ERIC Educational Resources Information Center
Shoufan, Abdulhadi; Huss, Sorin A.
2010-01-01
Reconfigurable computing is an established field in computer science. Teaching this field to computer science students demands special attention due to limited student experience in electronics and digital system design. This article presents a compact course on reconfigurable processors, which was offered at the Technische Universitat Darmstadt,…
Data processing with microcode designed with source coding
McCoy, James A; Morrison, Steven E
2013-05-07
Programming for a data processor to execute a data processing application is provided using microcode source code. The microcode source code is assembled to produce microcode that includes digital microcode instructions with which to signal the data processor to execute the data processing application.
Video image processor on the Spacelab 2 Solar Optical Universal Polarimeter /SL2 SOUP/
NASA Technical Reports Server (NTRS)
Lindgren, R. W.; Tarbell, T. D.
1981-01-01
The SOUP instrument is designed to obtain diffraction-limited digital images of the sun with high photometric accuracy. The Video Processor originated from the requirement to provide onboard real-time image processing, both to reduce the telemetry rate and to provide meaningful video displays of scientific data to the payload crew. This original concept has evolved into a versatile digital processing system with a multitude of other uses in the SOUP program. The central element in the Video Processor design is a 16-bit central processing unit based on 2900 family bipolar bit-slice devices. All arithmetic, logical and I/O operations are under control of microprograms, stored in programmable read-only memory and initiated by commands from the LSI-11. Several functions of the Video Processor are described, including interface to the High Rate Multiplexer downlink, cosmetic and scientific data processing, scan conversion for crew displays, focus and exposure testing, and use as ground support equipment.
Software-Reconfigurable Processors for Spacecraft
NASA Technical Reports Server (NTRS)
Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey
2005-01-01
A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).
Using Modern Design Tools for Digital Avionics Development
NASA Technical Reports Server (NTRS)
Hyde, David W.; Lakin, David R., II; Asquith, Thomas E.
2000-01-01
Using Modem Design Tools for Digital Avionics Development Shrinking development time and increased complexity of new avionics forces the designer to use modem tools and methods during hardware development. Engineers at the Marshall Space Flight Center have successfully upgraded their design flow and used it to develop a Mongoose V based radiation tolerant processor board for the International Space Station's Water Recovery System. The design flow, based on hardware description languages, simulation, synthesis, hardware models, and full functional software model libraries, allowed designers to fully simulate the processor board from reset, through initialization before any boards were built. The fidelity of a digital simulation is limited to the accuracy of the models used and how realistically the designer drives the circuit's inputs during simulation. By using the actual silicon during simulation, device modeling errors are reduced. Numerous design flaws were discovered early in the design phase when they could be easily fixed. The use of hardware models and actual MIPS software loaded into full functional memory models also provided checkout of the software development environment. This paper will describe the design flow used to develop the processor board and give examples of errors that were found using the tools. An overview of the processor board firmware will also be covered.
NASA Astrophysics Data System (ADS)
Pape, Dennis R.
1990-09-01
The present conference discusses topics in optical image processing, optical signal processing, acoustooptic spectrum analyzer systems and components, and optical computing. Attention is given to tradeoffs in nonlinearly recorded matched filters, miniature spatial light modulators, detection and classification using higher-order statistics of optical matched filters, rapid traversal of an image data base using binary synthetic discriminant filters, wideband signal processing for emitter location, an acoustooptic processor for autonomous SAR guidance, and sampling of Fresnel transforms. Also discussed are an acoustooptic RF signal-acquisition system, scanning acoustooptic spectrum analyzers, the effects of aberrations on acoustooptic systems, fast optical digital arithmetic processors, information utilization in analog and digital processing, optical processors for smart structures, and a self-organizing neural network for unsupervised learning.
Acousto-optic time- and space-integrating spotlight-mode SAR processor
NASA Astrophysics Data System (ADS)
Haney, Michael W.; Levy, James J.; Michael, Robert R., Jr.
1993-09-01
The technical approach and recent experimental results for the acousto-optic time- and space- integrating real-time SAR image formation processor program are reported. The concept overcomes the size and power consumption limitations of electronic approaches by using compact, rugged, and low-power analog optical signal processing techniques for the most computationally taxing portions of the SAR imaging problem. Flexibility and performance are maintained by the use of digital electronics for the critical low-complexity filter generation and output image processing functions. The results include a demonstration of the processor's ability to perform high-resolution spotlight-mode SAR imaging by simultaneously compensating for range migration and range/azimuth coupling in the analog optical domain, thereby avoiding a highly power-consuming digital interpolation or reformatting operation usually required in all-electronic approaches.
Warburton, William K.; Zhou, Zhiquing
1999-01-01
A high speed, digitally based, signal processing system which accepts a digitized input signal and detects the presence of step-like pulses in the this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system livetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired.
Design and development progress of a LLRF control system for a 500 MHz superconducting cavity
NASA Astrophysics Data System (ADS)
Lee, Y. S.; Kim, H. W.; Song, H. S.; Lee, J. H.; Park, K. H.; Yu, I. H.; Chai, J. S.
2012-07-01
The LLRF (low-level radio-frequency) control system which regulates the amplitude and the phase of the accelerating voltage inside a RF cavity is essential to ensure the stable operation of charged particle accelerators. Recent advances in digital signal processors and data acquisition systems have allowed the LLRF control system to be implemented in digitally and have made it possible to meet the higher demands associated with the performance of LLRF control systems, such as stability, accuracy, etc. For this reason, many accelerator laboratories have completed or are completing the developments of digital LLRF control systems. The digital LLRF control system has advantages related with flexibility and fast reconfiguration. This paper describes the design of the FPGA (field programmable gate array) based LLRF control system and the status of development for this system. The proposed LLRF control system includes an analog front-end, a digital board (ADC (analog to digital converter), DAC (digital to analog converter), FPGA, etc.) and a RF & clock generation system. The control algorithms will be implemented by using the VHDL (VHSIC (very high speed integrated circuits) hardware description language), and the EPICS (experiment physics and industrial control system) will be ported to the host computer for the communication. In addition, the purpose of this system is to control a 500 MHz RF cavity, so the system will be applied to the superconducting cavity to be installed in the PLS storage ring, and its performance will be tested.
Performance of a plasma fluid code on the Intel parallel computers
NASA Technical Reports Server (NTRS)
Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.
Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar
2017-03-01
This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using maximum likelihood approach. The D-Phylo while misusing the seeking capacity of k -means keeps away from its real constraint of getting stuck at privately conserved motifs. The authors have tested the behaviour of D-Phylo on Amazon Linux Amazon Machine Image(Hardware Virtual Machine)i2.4xlarge, six central processing unit, 122 GiB memory, 8 × 800 Solid-state drive Elastic Block Store volume, high network performance up to 15 processors for several real-life datasets. Distributing the clusters evenly on all the processors provides us the capacity to accomplish a near direct speed if there should arise an occurrence of huge number of processors.
Olson, Eric J.
2013-06-11
An apparatus, program product, and method that run an algorithm on a hardware based processor, generate a hardware error as a result of running the algorithm, generate an algorithm output for the algorithm, compare the algorithm output to another output for the algorithm, and detect the hardware error from the comparison. The algorithm is designed to cause the hardware based processor to heat to a degree that increases the likelihood of hardware errors to manifest, and the hardware error is observable in the algorithm output. As such, electronic components may be sufficiently heated and/or sufficiently stressed to create better conditions for generating hardware errors, and the output of the algorithm may be compared at the end of the run to detect a hardware error that occurred anywhere during the run that may otherwise not be detected by traditional methodologies (e.g., due to cooling, insufficient heat and/or stress, etc.).
DRACULA: Dynamic range control for broadcasting and other applications
NASA Astrophysics Data System (ADS)
Gilchrist, N. H. C.
The BBC has developed a digital processor which is capable of reducing the dynamic range of audio in an unobtrusive manner. It is ideally suited to the task of controlling the level of musical programs. Operating as a self-contained dynamic range controller, the processor is suitable for controlling levels in conventional AM or FM broadcasting, or for applications such as the compression of program material for in-flight entertainment. It can, alternatively, be used to provide a supplementary signal in DAB (digital audio broadcasting) for optional dynamic compression in the receiver.
Measurement of fault latency in a digital avionic mini processor, part 2
NASA Technical Reports Server (NTRS)
Mcgough, J.; Swern, F.
1983-01-01
The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are described. Several earlier programs were reprogrammed, expanding the instruction set to capitalize on the full power of the BDX-930 computer. As a final demonstration of fault coverage an extensive, 3-axis, high performance flght control computation was added. The stages in the development of a CPU self-test program emphasizing the relationship between fault coverage, speed, and quantity of instructions were demonstrated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.
2015-01-20
In order to run Monte Carlo particle transport calculations on new supercomputers with hundreds of thousands or millions of processors, care must be taken to implement scalable algorithms. This means that the algorithms must continue to perform well as the processor count increases. In this paper, we examine the scalability of:(1) globally resolving the particle locations on the correct processor, (2) deciding that particle streaming communication has finished, and (3) efficiently coupling neighbor domains together with different replication levels. We have run domain decomposed Monte Carlo particle transport on up to 2 21 = 2,097,152 MPI processes on the IBMmore » BG/Q Sequoia supercomputer and observed scalable results that agree with our theoretical predictions. These calculations were carefully constructed to have the same amount of work on every processor, i.e. the calculation is already load balanced. We also examine load imbalanced calculations where each domain’s replication level is proportional to its particle workload. In this case we show how to efficiently couple together adjacent domains to maintain within workgroup load balance and minimize memory usage.« less
Parallel text rendering by a PostScript interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kritskii, S.P.; Zastavnoi, B.A.
1994-11-01
The most radical method of increasing the performance of devices controlled by PostScript interpreters may be the use of multiprocessor controllers. This paper presents a method for parallelizing the operation of a PostScript interpreter for rendering text. The proposed method is based on decomposition of the outlines of letters into horizontal strips covering equal areas. The subroutines thus obtained are distributed to the processors in a network and then filled in by conventional sequential algorithms. A special algorithm has been developed for dividing the outlines of characters into subroutines so that each may be colored independently of the others. Themore » algorithm uses special estimates for estimating the correct partition so that the corresponding outlines are divided into horizontal strips. A method is presented for finding such estimates. Two different processing approaches are presented. In the first, one of the processors performs the decomposition of the outlines and distributes the strips to the remaining processors, which are responsible for the rendering. In the second approach, the decomposition process is itself distributed among the processors in the network.« less
NASA Astrophysics Data System (ADS)
Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos
2011-01-01
General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.
Detailed description of the HP-9825A HFRMP trajectory processor (TRAJ)
NASA Technical Reports Server (NTRS)
Kindall, S. M.; Wilson, S. W.
1979-01-01
The computer code for the trajectory processor of the HP-9825A High Fidelity Relative Motion Program is described in detail. The processor is a 12-degrees-of-freedom trajectory integrator which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. Coding standards and flow charts are given and the computational logic is discussed.
Application of Prognostic Health Management in Digital Electronic Systems
2007-01-01
variable external supply applied the necessary core power to the processor while the motherboard continued to source power from the ATX supply. By...isolating the processor power from the motherboard power , control over the aging profile of the processor was achieved. Once nominal operating...Physics-of-failure RISC – Reduced Instruction Set Computer RUL – Remaining Useful Life 1 1-4244-0525-4/07/$20.00 ©2007 IEEE. Paper 1326
Reagor, David; Vasquez-Dominguez, Jose
2006-12-12
A through-the-earth communication system that includes a digital signal input device; a transmitter operating at a predetermined frequency sufficiently low to effectively penetrate useful distances through-the earth; a data compression circuit that is connected to an encoding processor; an amplifier that receives encoded output from the encoding processor for amplifying the output and transmitting the data to an antenna; and a receiver with an antenna, a band pass filter, a decoding processor, and a data decompressor.
An architecture for real-time vision processing
NASA Technical Reports Server (NTRS)
Chien, Chiun-Hong
1994-01-01
To study the feasibility of developing an architecture for real time vision processing, a task queue server and parallel algorithms for two vision operations were designed and implemented on an i860-based Mercury Computing System 860VS array processor. The proposed architecture treats each vision function as a task or set of tasks which may be recursively divided into subtasks and processed by multiple processors coordinated by a task queue server accessible by all processors. Each idle processor subsequently fetches a task and associated data from the task queue server for processing and posts the result to shared memory for later use. Load balancing can be carried out within the processing system without the requirement for a centralized controller. The author concludes that real time vision processing cannot be achieved without both sequential and parallel vision algorithms and a good parallel vision architecture.
Parallelising a molecular dynamics algorithm on a multi-processor workstation
NASA Astrophysics Data System (ADS)
Müller-Plathe, Florian
1990-12-01
The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1991-01-01
Little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) to improve turnaround time in computational aerodynamics. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, such improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) is applied through multitasking via a strategy that requires relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-Fortran-Unix interface. The existing code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor.
Parallel Processing Systems for Passive Ranging During Helicopter Flight
NASA Technical Reports Server (NTRS)
Sridhar, Bavavar; Suorsa, Raymond E.; Showman, Robert D. (Technical Monitor)
1994-01-01
The complexity of rotorcraft missions involving operations close to the ground result in high pilot workload. In order to allow a pilot time to perform mission-oriented tasks, sensor-aiding and automation of some of the guidance and control functions are highly desirable. Images from an electro-optical sensor provide a covert way of detecting objects in the flight path of a low-flying helicopter. Passive ranging consists of processing a sequence of images using techniques based on optical low computation and recursive estimation. The passive ranging algorithm has to extract obstacle information from imagery at rates varying from five to thirty or more frames per second depending on the helicopter speed. We have implemented and tested the passive ranging algorithm off-line using helicopter-collected images. However, the real-time data and computation requirements of the algorithm are beyond the capability of any off-the-shelf microprocessor or digital signal processor. This paper describes the computational requirements of the algorithm and uses parallel processing technology to meet these requirements. Various issues in the selection of a parallel processing architecture are discussed and four different computer architectures are evaluated regarding their suitability to process the algorithm in real-time. Based on this evaluation, we conclude that real-time passive ranging is a realistic goal and can be achieved with a short time.
1992-12-01
Dynamics and Free Energy Perturbation Methods." Reviews in Computational Chem- istry edited by Kenny B. Lipkowitz and Donald B. Boyd, chapter 8, 295-320...atomic motions during annealing, allows the search to probabilistically move in a locally non-optimal direction. The probability of doing so is...Network processors communicate via communication links. This type of communication is generally very slow relative to other processor activities
Integrated digital flight-control system for the space shuttle orbiter
NASA Technical Reports Server (NTRS)
1973-01-01
The integrated digital flight control system is presented which provides rotational and translational control of the space shuttle orbiter in all phases of flight: from launch ascent through orbit to entry and touchdown, and during powered horizontal flights. The program provides a versatile control system structure while maintaining uniform communications with other programs, sensors, and control effectors by using an executive routine/functional subroutine format. The program reads all external variables at a single point, copies them into its dedicated storage, and then calls the required subroutines in the proper sequence. As a result, the flight control program is largely independent of other programs in the GN&C computer complex and is equally insensitive to the characteristics of the processor configuration. The integrated structure of the control system and the DFCS executive routine which embodies that structure are described along with the input and output. The specific estimation and control algorithms used in the various mission phases are given.
On the impact of approximate computation in an analog DeSTIN architecture.
Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar
2014-05-01
Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN--a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.
Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S
2009-01-30
Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL
2009-07-21
In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.
Ghosh, A
1988-08-01
Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
WATERLOPP V2/64: A highly parallel machine for numerical computation
NASA Astrophysics Data System (ADS)
Ostlund, Neil S.
1985-07-01
Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPs or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to generic oblique-wing aircraft problem on a four processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
Digital signal processing in microwave radiometers
NASA Technical Reports Server (NTRS)
Lawrence, R. W.; Stanley, W. D.; Harrington, R. F.
1980-01-01
A microprocessor based digital signal processing unit has been proposed to replace analog sections of a microwave radiometer. A brief introduction to the radiometer system involved and a description of problems encountered in the use of digital techniques in radiometer design are discussed. An analysis of the digital signal processor as part of the radiometer is then presented.
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
Line-drawing algorithms for parallel machines
NASA Technical Reports Server (NTRS)
Pang, Alex T.
1990-01-01
The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
Optical linear algebra processors: noise and error-source modeling.
Casasent, D; Ghosh, A
1985-06-01
The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Optical linear algebra processors - Noise and error-source modeling
NASA Technical Reports Server (NTRS)
Casasent, D.; Ghosh, A.
1985-01-01
The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) are considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy
NASA Astrophysics Data System (ADS)
Veiga, Alejandro; Grunfeld, Christian
2016-02-01
The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.
On the relationship between parallel computation and graph embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, A.K.
1989-01-01
The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sizedmore » binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called ({alpha},{beta})-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the ({alpha},{beta})-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.« less
Windprofiler optimization using digital deconvolution procedures
NASA Astrophysics Data System (ADS)
Hocking, W. K.; Hocking, A.; Hocking, D. G.; Garbanzo-Salas, M.
2014-10-01
Digital improvements to data acquisition procedures used for windprofiler radars have the potential for improving the height coverage at optimum resolution, and permit improved height resolution. A few newer systems already use this capability. Real-time deconvolution procedures offer even further optimization, and this has not been effectively employed in recent years. In this paper we demonstrate the advantages of combining these features, with particular emphasis on the advantages of real-time deconvolution. Using several multi-core CPUs, we have been able to achieve speeds of up to 40 GHz from a standard commercial motherboard, allowing data to be digitized and processed without the need for any type of hardware except for a transmitter (and associated drivers), a receiver and a digitizer. No Digital Signal Processor chips are needed, allowing great flexibility with analysis algorithms. By using deconvolution procedures, we have then been able to not only optimize height resolution, but also have been able to make advances in dealing with spectral contaminants like ground echoes and other near-zero-Hz spectral contamination. Our results also demonstrate the ability to produce fine-resolution measurements, revealing small-scale structures within the backscattered echoes that were previously not possible to see. Resolutions of 30 m are possible for VHF radars. Furthermore, our deconvolution technique allows the removal of range-aliasing effects in real time, a major bonus in many instances. Results are shown using new radars in Canada and Costa Rica.
Analog Delta-Back-Propagation Neural-Network Circuitry
NASA Technical Reports Server (NTRS)
Eberhart, Silvio
1990-01-01
Changes in synapse weights due to circuit drifts suppressed. Proposed fully parallel analog version of electronic neural-network processor based on delta-back-propagation algorithm. Processor able to "learn" when provided with suitable combinations of inputs and enforced outputs. Includes programmable resistive memory elements (corresponding to synapses), conductances (synapse weights) adjusted during learning. Buffer amplifiers, summing circuits, and sample-and-hold circuits arranged in layers of electronic neurons in accordance with delta-back-propagation algorithm.
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16-Kbit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally conclusions are presented.
NASA Astrophysics Data System (ADS)
Wright, Adam A.; Momin, Orko; Shin, Young Ho; Shakya, Rahul; Nepal, Kumud; Ahlgren, David J.
2010-01-01
This paper presents the application of a distributed systems architecture to an autonomous ground vehicle, Q, that participates in both the autonomous and navigation challenges of the Intelligent Ground Vehicle Competition. In the autonomous challenge the vehicle is required to follow a course, while avoiding obstacles and staying within the course boundaries, which are marked by white lines. For the navigation challenge, the vehicle is required to reach a set of target destinations, known as way points, with given GPS coordinates and avoid obstacles that it encounters in the process. Previously the vehicle utilized a single laptop to execute all processing activities including image processing, sensor interfacing and data processing, path planning and navigation algorithms and motor control. National Instruments' (NI) LabVIEW served as the programming language for software implementation. As an upgrade to last year's design, a NI compact Reconfigurable Input/Output system (cRIO) was incorporated to the system architecture. The cRIO is NI's solution for rapid prototyping that is equipped with a real time processor, an FPGA and modular input/output. Under the current system, the real time processor handles the path planning and navigation algorithms, the FPGA gathers and processes sensor data. This setup leaves the laptop to focus on running the image processing algorithm. Image processing as previously presented by Nepal et. al. is a multi-step line extraction algorithm and constitutes the largest processor load. This distributed approach results in a faster image processing algorithm which was previously Q's bottleneck. Additionally, the path planning and navigation algorithms are executed more reliably on the real time processor due to the deterministic nature of operation. The implementation of this architecture required exploration of various inter-system communication techniques. Data transfer between the laptop and the real time processor using UDP packets was established as the most reliable protocol after testing various options. Improvement can be made to the system by migrating more algorithms to the hardware based FPGA to further speed up the operations of the vehicle.
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Parallel solution of high-order numerical schemes for solving incompressible flows
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Lin, Avi; Liou, May-Fun; Blech, Richard A.
1993-01-01
A new parallel numerical scheme for solving incompressible steady-state flows is presented. The algorithm uses a finite-difference approach to solving the Navier-Stokes equations. The algorithms are scalable and expandable. They may be used with only two processors or with as many processors as are available. The code is general and expandable. Any size grid may be used. Four processors of the NASA LeRC Hypercluster were used to solve for steady-state flow in a driven square cavity. The Hypercluster was configured in a distributed-memory, hypercube-like architecture. By using a 50-by-50 finite-difference solution grid, an efficiency of 74 percent (a speedup of 2.96) was obtained.
A sparse matrix algorithm on the Boolean vector machine
NASA Technical Reports Server (NTRS)
Wagner, Robert A.; Patrick, Merrell L.
1988-01-01
VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors
NASA Technical Reports Server (NTRS)
David, R. E.
1984-01-01
Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
An incentive-based distributed mechanism for scheduling divisible loads in tree networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, T. E.; Grosu, D.
The underlying assumption of Divisible Load Scheduling (DLS) theory is that the pro-cessors composing the network are obedient, i.e., they do not “cheat” the scheduling algorithm. This assumption is unrealistic if the processors are owned by autonomous, self-interested organizations that have no a priori motivation for cooperation and they will manipulate the algorithm if it is beneficial to do so. In this paper, we address this issue by designing a distributed mechanism for scheduling divisible loads in tree net-works, called DLS-T, which provides incentives to processors for reporting their true processing capacity and executing their assigned load at full processingmore » capacity. We prove that the DLS-T mechanism computes the optimal allocation in an ex post Nash equilibrium. Finally, we simulate and study the mechanism under various network structures and processor parameters.« less
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Real-time implementation of logo detection on open source BeagleBoard
NASA Astrophysics Data System (ADS)
George, M.; Kehtarnavaz, N.; Estevez, L.
2011-03-01
This paper presents the real-time implementation of our previously developed logo detection and tracking algorithm on the open source BeagleBoard mobile platform. This platform has an OMAP processor that incorporates an ARM Cortex processor. The algorithm combines Scale Invariant Feature Transform (SIFT) with k-means clustering, online color calibration and moment invariants to robustly detect and track logos in video. Various optimization steps that are carried out to allow the real-time execution of the algorithm on BeagleBoard are discussed. The results obtained are compared to the PC real-time implementation results.
Automatic calibration system for analog instruments based on DSP and CCD sensor
NASA Astrophysics Data System (ADS)
Lan, Jinhui; Wei, Xiangqin; Bai, Zhenlong
2008-12-01
Currently, the calibration work of analog measurement instruments is mainly completed by manual and there are many problems waiting for being solved. In this paper, an automatic calibration system (ACS) based on Digital Signal Processor (DSP) and Charge Coupled Device (CCD) sensor is developed and a real-time calibration algorithm is presented. In the ACS, TI DM643 DSP processes the data received by CCD sensor and the outcome is displayed on Liquid Crystal Display (LCD) screen. For the algorithm, pointer region is firstly extracted for improving calibration speed. And then a math model of the pointer is built to thin the pointer and determine the instrument's reading. Through numbers of experiments, the time of once reading is no more than 20 milliseconds while it needs several seconds if it is done manually. At the same time, the error of the instrument's reading satisfies the request of the instruments. It is proven that the automatic calibration system can effectively accomplish the calibration work of the analog measurement instruments.
Hardware design and implementation of fast DOA estimation method based on multicore DSP
NASA Astrophysics Data System (ADS)
Guo, Rui; Zhao, Yingxiao; Zhang, Yue; Lin, Qianqiang; Chen, Zengping
2016-10-01
In this paper, we present a high-speed real-time signal processing hardware platform based on multicore digital signal processor (DSP). The real-time signal processing platform shows several excellent characteristics including high performance computing, low power consumption, large-capacity data storage and high speed data transmission, which make it able to meet the constraint of real-time direction of arrival (DOA) estimation. To reduce the high computational complexity of DOA estimation algorithm, a novel real-valued MUSIC estimator is used. The algorithm is decomposed into several independent steps and the time consumption of each step is counted. Based on the statistics of the time consumption, we present a new parallel processing strategy to distribute the task of DOA estimation to different cores of the real-time signal processing hardware platform. Experimental results demonstrate that the high processing capability of the signal processing platform meets the constraint of real-time direction of arrival (DOA) estimation.
Implementation of Adaptive Digital Controllers on Programmable Logic Devices
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; King, Kenneth D.; Smith, Keary J.; Ormsby, John (Technical Monitor)
2002-01-01
Much has been made of the capabilities of FPGA's (Field Programmable Gate Arrays) in the hardware implementation of fast digital signal processing (DSP) functions. Such capability also makes and FPGA a suitable platform for the digital implementation of closed loop controllers. There are myriad advantages to utilizing an FPGA for discrete-time control functions which include the capability for reconfiguration when SRAM- based FPGA's are employed, fast parallel implementation of multiple control loops and implementations that can meet space level radiation tolerance in a compact form-factor. Other researchers have presented the notion that a second order digital filter with proportional-integral-derivative (PID) control functionality can be implemented in an FPGA. At Marshall Space Flight Center, the Control Electronics Group has been studying adaptive discrete-time control of motor driven actuator systems using digital signal processor (DSF) devices. Our goal is to create a fully digital, flight ready controller design that utilizes an FPGA for implementation of signal conditioning for control feedback signals, generation of commands to the controlled system, and hardware insertion of adaptive control algorithm approaches. While small form factor, commercial DSP devices are now available with event capture, data conversion, pulse width modulated outputs and communication peripherals, these devices are not currently available in designs and packages which meet space level radiation requirements. Meeting our goals requires alternative compact implementation of such functionality to withstand the harsh environment encountered on spacecraft. Radiation tolerant FPGA's are a feasible option for reaching these goals.
Blaettler, M; Bruegger, A; Forster, I C; Lehareinger, Y
1988-03-01
The design of an analog interface to a digital audio signal processor (DASP)-video cassette recorder (VCR) system is described. The complete system represents a low-cost alternative to both FM instrumentation tape recorders and multi-channel chart recorders. The interface or DASP input-output unit described in this paper enables the recording and playback of up to 12 analog channels with a maximum of 12 bit resolution and a bandwidth of 2 kHz per channel. Internal control and timing in the recording component of the interface is performed using ROMs which can be reprogrammed to suit different analog-to-digital converter hardware. Improvement in the bandwidth specifications is possible by connecting channels in parallel. A parallel 16 bit data output port is provided for direct transfer of the digitized data to a computer.
NASA Astrophysics Data System (ADS)
Stampoulidis, L.; Kehayas, E.; Karppinen, M.; Tanskanen, A.; Heikkinen, V.; Westbergh, P.; Gustavsson, J.; Larsson, A.; Grüner-Nielsen, L.; Sotom, M.; Venet, N.; Ko, M.; Micusik, D.; Kissinger, D.; Ulusoy, A. C.; King, R.; Safaisini, R.
2017-11-01
Modern broadband communication networks rely on satellites to complement the terrestrial telecommunication infrastructure. Satellites accommodate global reach and enable world-wide direct broadcasting by facilitating wide access to the backbone network from remote sites or areas where the installation of ground segment infrastructure is not economically viable. At the same time the new broadband applications increase the bandwidth demands in every part of the network - and satellites are no exception. Modern telecom satellites incorporate On-Board Processors (OBP) having analogue-to-digital (ADC) and digital-to-analogue converters (DAC) at their inputs/outputs and making use of digital processing to handle hundreds of signals; as the amount of information exchanged increases, so do the physical size, mass and power consumption of the interconnects required to transfer massive amounts of data through bulk electric wires.
A Parallel Rendering Algorithm for MIMD Architectures
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.; Orloff, Tobias
1991-01-01
Applications such as animation and scientific visualization demand high performance rendering of complex three dimensional scenes. To deliver the necessary rendering rates, highly parallel hardware architectures are required. The challenge is then to design algorithms and software which effectively use the hardware parallelism. A rendering algorithm targeted to distributed memory MIMD architectures is described. For maximum performance, the algorithm exploits both object-level and pixel-level parallelism. The behavior of the algorithm is examined both analytically and experimentally. Its performance for large numbers of processors is found to be limited primarily by communication overheads. An experimental implementation for the Intel iPSC/860 shows increasing performance from 1 to 128 processors across a wide range of scene complexities. It is shown that minimal modifications to the algorithm will adapt it for use on shared memory architectures as well.
Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S
2016-09-01
After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023h, 3493 seizures) recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and good detection rate of 88% on the entire test databases. The post-processor, effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
Design and Performance of the Astro-E/XRS Signal Processing System
NASA Technical Reports Server (NTRS)
Boyce, Kevin R.; Audley, M. D.; Baker, R. G.; Dumonthier, J. J.; Fujimoto, R.; Gendreau, K. C.; Ishisaki, Y.; Kelley, R. L.; Stahle, C. K.; Szymkowiak, A. E.
1999-01-01
We describe the signal processing system of the Astro-E XRS instrument. The Calorimeter Analog Processor (CAP) provides bias and power for the detectors and amplifies the detector signals by a factor of 20,000. The Calorimeter Digital Processor (CDP) performs the digital processing of the calorimeter signals, detecting X-ray pulses and analyzing them by optimal filtering. We describe the operation of pulse detection, Pulse height analysis. and risetime determination. We also discuss performance, including the three event grades (hi-res mid-res, and low-res). anticoincidence detection, counting rate dependence, and noise rejection.
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our inquiry into algorithms and applications that would benefit by latency tolerant approach to algorithm building, including the construction of new algorithms where appropriate. In a multithreaded execution, when a processor reaches a point where remote memory access is necessary, the request is sent out on the network and a context--switch occurs to a new thread of computation. This effectively masks a long and unpredictable latency due to remote loads, thereby providing tolerance to remote access latency. We began to develop standards to profile various algorithm and application parameters, such as the degree of parallelism, granularity, precision, instruction set mix, interprocessor communication, latency etc. These tools will continue to develop and evolve as the Information Power Grid environment matures. To provide a richer context for this research, the project also focused on issues of fault-tolerance and computation migration of numerical algorithms and software. During the initial phase we tried to increase our understanding of the bottlenecks in single processor performance. Our work began by developing an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Based on the results we achieved in this study we are planning to study other architectures of interest, including development of cost models, and developing code generators appropriate to these architectures.
SPROC: A multiple-processor DSP IC
NASA Technical Reports Server (NTRS)
Davis, R.
1991-01-01
A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.
DSP-Based dual-polarity mass spectrum pattern recognition for bio-detection
DOE Office of Scientific and Technical Information (OSTI.GOV)
Riot, V; Coffee, K; Gard, E
2006-04-21
The Bio-Aerosol Mass Spectrometry (BAMS) instrument analyzes single aerosol particles using a dual-polarity time-of-flight mass spectrometer recording simultaneously spectra of thirty to a hundred thousand points on each polarity. We describe here a real-time pattern recognition algorithm developed at Lawrence Livermore National Laboratory that has been implemented on a nine Digital Signal Processor (DSP) system from Signatec Incorporated. The algorithm first preprocesses independently the raw time-of-flight data through an adaptive baseline removal routine. The next step consists of a polarity dependent calibration to a mass-to-charge representation, reducing the data to about five hundred to a thousand channels per polarity. Themore » last step is the identification step using a pattern recognition algorithm based on a library of known particle signatures including threat agents and background particles. The identification step includes integrating the two polarities for a final identification determination using a score-based rule tree. This algorithm, operating on multiple channels per-polarity and multiple polarities, is well suited for parallel real-time processing. It has been implemented on the PMP8A from Signatec Incorporated, which is a computer based board that can interface directly to the two one-Giga-Sample digitizers (PDA1000 from Signatec Incorporated) used to record the two polarities of time-of-flight data. By using optimized data separation, pipelining, and parallel processing across the nine DSPs it is possible to achieve a processing speed of up to a thousand particles per seconds, while maintaining the recognition rate observed on a non-real time implementation. This embedded system has allowed the BAMS technology to improve its throughput and therefore its sensitivity while maintaining a large dynamic range (number of channels and two polarities) thus maintaining the systems specificity for bio-detection.« less
Noise Analysis of Spatial Phase coding in analog Acoustooptic Processors
NASA Technical Reports Server (NTRS)
Gary, Charles K.; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
Optical beams can carry information in their amplitude and phase; however, optical analog numerical calculators such as an optical matrix processor use incoherent light to achieve linear operation. Thus, the phase information is lost and only the magnitude can be used. This limits such processors to the representation of positive real numbers. Many systems have been devised to overcome this deficit through the use of digital number representations, but they all operate at a greatly reduced efficiency in contrast to analog systems. The most widely accepted method to achieve sign coding in analog optical systems has been the use of an offset for the zero level. Unfortunately, this results in increased noise sensitivity for small numbers. In this paper, we examine the use of spatially coherent sign coding in acoustooptical processors, a method first developed for digital calculations by D. V. Tigin. This coding technique uses spatial coherence for the representation of signed numbers, while temporal incoherence allows for linear analog processing of the optical information. We show how spatial phase coding reduces noise sensitivity for signed analog calculations.
A low power biomedical signal processor ASIC based on hardware software codesign.
Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T
2009-01-01
A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.
Control Software for Advanced Video Guidance Sensor
NASA Technical Reports Server (NTRS)
Howard, Richard T.; Book, Michael L.; Bryan, Thomas C.
2006-01-01
Embedded software has been developed specifically for controlling an Advanced Video Guidance Sensor (AVGS). A Video Guidance Sensor is an optoelectronic system that provides guidance for automated docking of two vehicles. Such a system includes pulsed laser diodes and a video camera, the output of which is digitized. From the positions of digitized target images and known geometric relationships, the relative position and orientation of the vehicles are computed. The present software consists of two subprograms running in two processors that are parts of the AVGS. The subprogram in the first processor receives commands from an external source, checks the commands for correctness, performs commanded non-image-data-processing control functions, and sends image data processing parts of commands to the second processor. The subprogram in the second processor processes image data as commanded. Upon power-up, the software performs basic tests of functionality, then effects a transition to a standby mode. When a command is received, the software goes into one of several operational modes (e.g. acquisition or tracking). The software then returns, to the external source, the data appropriate to the command.
NASA Technical Reports Server (NTRS)
Samba, A. S.
1985-01-01
The problem of solving banded linear systems by direct (non-iterative) techniques on the Vector Processor System (VPS) 32 supercomputer is considered. Two efficient direct methods for solving banded linear systems on the VPS 32 are described. The vector cyclic reduction (VCR) algorithm is discussed in detail. The performance of the VCR on a three parameter model problem is also illustrated. The VCR is an adaptation of the conventional point cyclic reduction algorithm. The second direct method is the Customized Reduction of Augmented Triangles' (CRAT). CRAT has the dominant characteristics of an efficient VPS 32 algorithm. CRAT is tailored to the pipeline architecture of the VPS 32 and as a consequence the algorithm is implicitly vectorizable.
Northeast Parallel Architectures Center (NPAC)
1992-07-01
Computational Techniques: Mapping receptor units to processors , using NEWS communication to model interaction in the inhibitory field Goal of the Research...algorithms for classical problems to take advantage of multiple processors . Experiments in probability that have been too time consuming on serial...machine and achieved speedups of 4 to 5 times with 11 processors . It is believed that a slightly better speedup is achievable. In the case of stuck
Warburton, W.K.
1999-02-16
A high speed, digitally based, signal processing system is disclosed which accepts a digitized input signal and detects the presence of step-like pulses in the this data stream, extracts filtered estimates of their amplitudes, inspects for pulse pileup, and records input pulse rates and system lifetime. The system has two parallel processing channels: a slow channel, which filters the data stream with a long time constant trapezoidal filter for good energy resolution; and a fast channel which filters the data stream with a short time constant trapezoidal filter, detects pulses, inspects for pileups, and captures peak values from the slow channel for good events. The presence of a simple digital interface allows the system to be easily integrated with a digital processor to produce accurate spectra at high count rates and allow all spectrometer functions to be fully automated. Because the method is digitally based, it allows pulses to be binned based on time related values, as well as on their amplitudes, if desired. 31 figs.
IDSP- INTERACTIVE DIGITAL SIGNAL PROCESSOR
NASA Technical Reports Server (NTRS)
Mish, W. H.
1994-01-01
The Interactive Digital Signal Processor, IDSP, consists of a set of time series analysis "operators" based on the various algorithms commonly used for digital signal analysis work. The processing of a digital time series to extract information is usually achieved by the application of a number of fairly standard operations. However, it is often desirable to "experiment" with various operations and combinations of operations to explore their effect on the results. IDSP is designed to provide an interactive and easy-to-use system for this type of digital time series analysis. The IDSP operators can be applied in any sensible order (even recursively), and can be applied to single time series or to simultaneous time series. IDSP is being used extensively to process data obtained from scientific instruments onboard spacecraft. It is also an excellent teaching tool for demonstrating the application of time series operators to artificially-generated signals. IDSP currently includes over 43 standard operators. Processing operators provide for Fourier transformation operations, design and application of digital filters, and Eigenvalue analysis. Additional support operators provide for data editing, display of information, graphical output, and batch operation. User-developed operators can be easily interfaced with the system to provide for expansion and experimentation. Each operator application generates one or more output files from an input file. The processing of a file can involve many operators in a complex application. IDSP maintains historical information as an integral part of each file so that the user can display the operator history of the file at any time during an interactive analysis. IDSP is written in VAX FORTRAN 77 for interactive or batch execution and has been implemented on a DEC VAX-11/780 operating under VMS. The IDSP system generates graphics output for a variety of graphics systems. The program requires the use of Versaplot and Template plotting routines and IMSL Math/Library routines. These software packages are not included in IDSP. The virtual memory requirement for the program is approximately 2.36 MB. The IDSP system was developed in 1982 and was last updated in 1986. Versaplot is a registered trademark of Versatec Inc. Template is a registered trademark of Template Graphics Software Inc. IMSL Math/Library is a registered trademark of IMSL Inc.
Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.
1987-02-15
For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. Themore » usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation.« less
NASA Astrophysics Data System (ADS)
Cary, John R.; Abell, D.; Amundson, J.; Bruhwiler, D. L.; Busby, R.; Carlsson, J. A.; Dimitrov, D. A.; Kashdan, E.; Messmer, P.; Nieter, C.; Smithe, D. N.; Spentzouris, P.; Stoltz, P.; Trines, R. M.; Wang, H.; Werner, G. R.
2006-09-01
As the size and cost of particle accelerators escalate, high-performance computing plays an increasingly important role; optimization through accurate, detailed computermodeling increases performance and reduces costs. But consequently, computer simulations face enormous challenges. Early approximation methods, such as expansions in distance from the design orbit, were unable to supply detailed accurate results, such as in the computation of wake fields in complex cavities. Since the advent of message-passing supercomputers with thousands of processors, earlier approximations are no longer necessary, and it is now possible to compute wake fields, the effects of dampers, and self-consistent dynamics in cavities accurately. In this environment, the focus has shifted towards the development and implementation of algorithms that scale to large numbers of processors. So-called charge-conserving algorithms evolve the electromagnetic fields without the need for any global solves (which are difficult to scale up to many processors). Using cut-cell (or embedded) boundaries, these algorithms can simulate the fields in complex accelerator cavities with curved walls. New implicit algorithms, which are stable for any time-step, conserve charge as well, allowing faster simulation of structures with details small compared to the characteristic wavelength. These algorithmic and computational advances have been implemented in the VORPAL7 Framework, a flexible, object-oriented, massively parallel computational application that allows run-time assembly of algorithms and objects, thus composing an application on the fly.
Multiphase complete exchange: A theoretical analysis
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.
1993-01-01
Complete Exchange requires each of N processors to send a unique message to each of the remaining N-1 processors. For a circuit switched hypercube with N = 2(sub d) processors, the Direct and Standard algorithms for Complete Exchange are optimal for very large and very small message sizes, respectively. For intermediate sizes, a hybrid Multiphase algorithm is better. This carries out Direct exchanges on a set of subcubes whose dimensions are a partition of the integer d. The best such algorithm for a given message size m could hitherto only be found by enumerating all partitions of d. The Multiphase algorithm is analyzed assuming a high performance communication network. It is proved that only algorithms corresponding to equipartitions of d (partitions in which the maximum and minimum elements differ by at most 1) can possibly be optimal. The run times of these algorithms plotted against m form a hull of optimality. It is proved that, although there is an exponential number of partitions, (1) the number of faces on this hull is Theta(square root of d), (2) the hull can be found in theta(square root of d) time, and (3) once it has been found, the optimal algorithm for any given m can be found in Theta(log d) time. These results provide a very fast technique for minimizing communication overhead in many important applications, such as matrix transpose, Fast Fourier transform, and ADI.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decompositionmore » corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.« less
Spacewire on Earth orbiting scatterometers
NASA Technical Reports Server (NTRS)
Bachmann, Alex; Lang, Minh; Lux, James; Steffke, Richard
2002-01-01
The need for a high speed, reliable and easy to implement communication link has led to the development of a space flight oriented version of IEEE 1355 called SpaceWire. SpaceWire is based on high-speed (200 Mbps) serial point-to-point links using Low Voltage Differential Signaling (LVDS). SpaceWIre has provisions for routing messages between a large network of processors, using wormhole routing for low overhead and latency. {additionally, there are available space qualified hybrids, which provide the Link layer to the user's bus}. A test bed of multiple digital signal processor breadboards, demonstrating the ability to meet signal processing requirements for an orbiting scatterometer has been implemented using three Astrium MCM-DSPs, each breadboard consists of a Multi Chip Module (MCM) that combines a space qualified Digital Signal Processor and peripherals, including IEEE-1355 links. With the addition of appropriate physical layer interfaces and software on the DSP, the SpaceWire link is used to communicate between processors on the test bed, e.g. sending timing references, commands, status, and science data among the processors. Results are presented on development issues surrounding the use of SpaceWire in this environment, from physical layer implementation (cables, connectors, LVDS drivers) to diagnostic tools, driver firmware, and development methodology. The tools, methods, and hardware, software challenges and preliminary performance are investigated and discussed.
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver
NASA Astrophysics Data System (ADS)
Chacon, Luis
2008-11-01
Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
NASA Astrophysics Data System (ADS)
Santer, Richard P.; Fell, Frank
2003-05-01
The first "ocean colour" sensor, Coastal Zone Color Scanner (CZCS), was launched in 1978. Oceanographers learnt a lot from CZCS but it remained a purely scientific sensor. In recent years, a new generation of satellite-borne earth observation (EO) instruments has been brought into space. These instruments combine high spectral and spatial resolution with revisiting rates of the order of one per day. More instruments with further increased spatial, spectral and temporal resolution will be available within the next years. In the meantime, evaluation procedures taking advantage of the capabilities of the new instruments were derived, allowing the retrieval of ecologically important parameters with higher accuracy than before. Space agencies are now able to collect and to process satellite data in real time and to disseminate them via the Internet. It is therefore meanwhile possible to envisage using EO operationally. In principle, a significant demand for EO data products on terrestrial or marine ecosystems exists both with public authorities (environmental protection, emergency management, natural resources management, national parks, regional planning, etc) and private companies (tourist industry, insurance companies, water suppliers, etc). However, for a number of reasons, many data products that can be derived from the new instruments and methods have not yet left the scientific community towards public or private end users. It is the intention of the proposed SISCAL (Satellite-based Information System on Coastal Areas and Lakes) project to contribute to the closure of the existing gap between space agencies and research institutions on one side and end users on the other side. To do so, we intend to create a data processor that automatically derives and subsequently delivers over the Internet, in Near-Real-Time (NRT), a number of data products tailored to individual end user needs. The data products will be generated using a Geographical Information System (GIS), combining satellite data, evaluation algorithms and value-adding ancillary digital information. This prevents the end user from investing funds into expensive equipment or to hire specialized personnel. The data processor shall be a generic tool, which may be applied to a large variety of operationally gathered satellite data. In the frame of SISCAL, the processor shall be applied to remotely sensed data of selected coastal areas and lakes in Central Europe and the Eastern Mediterranean, according to the needs of the end users within the SISCAL consortium. A number of measures are required to achieve the objective of the proposed project: (1) Identification and specification of the SISCAL end user needs for NRT water related data products accessible to EO techniques. (2) Selection of the most appropriate instruments, evaluation algorithms and ancillary data bases required to provide the identified data products. (3) Development of the actual Near-Real-Time data processor for the specified EO data products. (4) Development of the GIS processor adding ancillary digital information to the satellite images and providing the required geographical projections. (5) Development of a product retrieval and management system to handle ordering and distribution of data products between the SISCAL server and the end users, including payment and invoicing. (6) Evaluation of the derived data products in terms of accuracy and usefulness by comparison with available in-situ measurements and by making use of the local expertise of the end users. (7) Establishing an Internet server dedicated to internal communication between the consortium members as well as presenting the SISCAL project to a larger public. (8) Marketing activities, presentation of data processor to potential external customers, identification of their exact needs. The innovative aspect of the SISCAL project consists in the generation of NRT data products on water quality parameters from EO data. This article mainly deals with the identification of the end user requirements within the SISCAL consortium and the methods employed to realize them. Details on the technical implementation of the SISCAL processor are provided by Fell et al. (this issue).
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
On-Board, Real-Time Preprocessing System for Optical Remote-Sensing Imagery
Qi, Baogui; Zhuang, Yin; Chen, He; Chen, Liang
2018-01-01
With the development of remote-sensing technology, optical remote-sensing imagery processing has played an important role in many application fields, such as geological exploration and natural disaster prevention. However, relative radiation correction and geometric correction are key steps in preprocessing because raw image data without preprocessing will cause poor performance during application. Traditionally, remote-sensing data are downlinked to the ground station, preprocessed, and distributed to users. This process generates long delays, which is a major bottleneck in real-time applications for remote-sensing data. Therefore, on-board, real-time image preprocessing is greatly desired. In this paper, a real-time processing architecture for on-board imagery preprocessing is proposed. First, a hierarchical optimization and mapping method is proposed to realize the preprocessing algorithm in a hardware structure, which can effectively reduce the computation burden of on-board processing. Second, a co-processing system using a field-programmable gate array (FPGA) and a digital signal processor (DSP; altogether, FPGA-DSP) based on optimization is designed to realize real-time preprocessing. The experimental results demonstrate the potential application of our system to an on-board processor, for which resources and power consumption are limited. PMID:29693585
On-Board, Real-Time Preprocessing System for Optical Remote-Sensing Imagery.
Qi, Baogui; Shi, Hao; Zhuang, Yin; Chen, He; Chen, Liang
2018-04-25
With the development of remote-sensing technology, optical remote-sensing imagery processing has played an important role in many application fields, such as geological exploration and natural disaster prevention. However, relative radiation correction and geometric correction are key steps in preprocessing because raw image data without preprocessing will cause poor performance during application. Traditionally, remote-sensing data are downlinked to the ground station, preprocessed, and distributed to users. This process generates long delays, which is a major bottleneck in real-time applications for remote-sensing data. Therefore, on-board, real-time image preprocessing is greatly desired. In this paper, a real-time processing architecture for on-board imagery preprocessing is proposed. First, a hierarchical optimization and mapping method is proposed to realize the preprocessing algorithm in a hardware structure, which can effectively reduce the computation burden of on-board processing. Second, a co-processing system using a field-programmable gate array (FPGA) and a digital signal processor (DSP; altogether, FPGA-DSP) based on optimization is designed to realize real-time preprocessing. The experimental results demonstrate the potential application of our system to an on-board processor, for which resources and power consumption are limited.
Real Time Phase Noise Meter Based on a Digital Signal Processor
NASA Technical Reports Server (NTRS)
Angrisani, Leopoldo; D'Arco, Mauro; Greenhall, Charles A.; Schiano Lo Morille, Rosario
2006-01-01
A digital signal-processing meter for phase noise measurement on sinusoidal signals is dealt with. It enlists a special hardware architecture, made up of a core digital signal processor connected to a data acquisition board, and takes advantage of a quadrature demodulation-based measurement scheme, already proposed by the authors. Thanks to an efficient measurement process and an optimized implementation of its fundamental stages, the proposed meter succeeds in exploiting all hardware resources in such an effective way as to gain high performance and real-time operation. For input frequencies up to some hundreds of kilohertz, the meter is capable both of updating phase noise power spectrum while seamlessly capturing the analyzed signal into its memory, and granting as good frequency resolution as few units of hertz.
Development of the SEASIS instrument for SEDSAT
NASA Technical Reports Server (NTRS)
Maier, Mark W.
1996-01-01
Two SEASIS experiment objectives are key: take images that allow three axis attitude determination and take multi-spectral images of the earth. During the tether mission it is also desirable to capture images for the recoiling tether from the endmass perspective (which has never been observed). SEASIS must store all its imagery taken during the tether mission until the earth downlink can be established. SEASIS determines attitude with a panoramic camera and performs earth observation with a telephoto lens camera. Camera video is digitized, compressed, and stored in solid state memory. These objectives are addressed through the following architectural choices: (1) A camera system using a Panoramic Annular Lens (PAL). This lens has a 360 deg. azimuthal field of view by a +45 degree vertical field measured from a plan normal to the lens boresight axis. It has been shown in Mr. Mark Steadham's UAH M.S. thesis that his camera can determine three axis attitude anytime the earth and one other recognizable celestial object (for example, the sun) is in the field of view. This will be essentially all the time during tether deployment. (2) A second camera system using telephoto lens and filter wheel. The camera is a black and white standard video camera. The filters are chosen to cover the visible spectral bands of remote sensing interest. (3) A processor and mass memory arrangement linked to the cameras. Video signals from the cameras are digitized, compressed in the processor, and stored in a large static RAM bank. The processor is a multi-chip module consisting of a T800 Transputer and three Zoran floating point Digital Signal Processors. This processor module was supplied under ARPA contract by the Space Computer Corporation to demonstrate its use in space.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
Exact diagonalization of quantum lattice models on coprocessors
NASA Astrophysics Data System (ADS)
Siro, T.; Harju, A.
2016-10-01
We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
Towards developing robust algorithms for solving partial differential equations on MIMD machines
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Naik, Vijay K.
1988-01-01
Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
Towards developing robust algorithms for solving partial differential equations on MIMD machines
NASA Technical Reports Server (NTRS)
Saltz, J. H.; Naik, V. K.
1985-01-01
Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran
We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
A synergistic method for vibration suppression of an elevator mechatronic system
NASA Astrophysics Data System (ADS)
Knezevic, Bojan Z.; Blanusa, Branko; Marcetic, Darko P.
2017-10-01
Modern elevators are complex mechatronic systems which have to satisfy high performance in precision, safety and ride comfort. Each elevator mechatronic system (EMS) contains a mechanical subsystem which is characterized by its resonant frequency. In order to achieve high performance of the whole system, the control part of the EMS inevitably excites resonant circuits causing the occurrence of vibration. This paper proposes a synergistic solution based on the jerk control and the upgrade of the speed controller with a band-stop filter to restore lost ride comfort and speed control caused by vibration. The band-stop filter eliminates the resonant component from the speed controller spectra and jerk control provides operating of the speed controller in a linear mode as well as increased ride comfort. The original method for band-stop filter tuning based on Goertzel algorithm and Kiefer search algorithm is proposed in this paper. In order to generate the speed reference trajectory which can be defined by different shapes and amplitudes of jerk, a unique generalized model is proposed. The proposed algorithm is integrated in the power drive control algorithm and implemented on the digital signal processor. Through experimental verifications on a scale down prototype of the EMS it has been verified that only synergistic effect of controlling jerk and filtrating the reference torque can completely eliminate vibrations.
Sentinel-2 Level 2A Prototype Processor: Architecture, Algorithms And First Results
NASA Astrophysics Data System (ADS)
Muller-Wilm, Uwe; Louis, Jerome; Richter, Rudolf; Gascon, Ferran; Niezette, Marc
2013-12-01
Sen2Core is a prototype processor for Sentinel-2 Level 2A product processing and formatting. The processor is developed for and with ESA and performs the tasks of Atmospheric Correction and Scene Classification of Level 1C input data. Level 2A outputs are: Bottom-Of- Atmosphere (BOA) corrected reflectance images, Aerosol Optical Thickness-, Water Vapour-, Scene Classification maps and Quality indicators, including cloud and snow probabilities. The Level 2A Product Formatting performed by the processor follows the specification of the Level 1C User Product.
Yang, L. H.; Brooks III, E. D.; Belak, J.
1992-01-01
A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evoles. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
Options for Parallelizing a Planning and Scheduling Algorithm
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Optical linear algebra processors - Architectures and algorithms
NASA Technical Reports Server (NTRS)
Casasent, David
1986-01-01
Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
A Discussion of Using a Reconfigurable Processor to Implement the Discrete Fourier Transform
NASA Technical Reports Server (NTRS)
White, Michael J.
2004-01-01
This paper presents the design and implementation of the Discrete Fourier Transform (DFT) algorithm on a reconfigurable processor system. While highly applicable to many engineering problems, the DFT is an extremely computationally intensive algorithm. Consequently, the eventual goal of this work is to enhance the execution of a floating-point precision DFT algorithm by off loading the algorithm from the computing system. This computing system, within the context of this research, is a typical high performance desktop computer with an may of field programmable gate arrays (FPGAs). FPGAs are hardware devices that are configured by software to execute an algorithm. If it is desired to change the algorithm, the software is changed to reflect the modification, then download to the FPGA, which is then itself modified. This paper will discuss methodology for developing the DFT algorithm to be implemented on the FPGA. We will discuss the algorithm, the FPGA code effort, and the results to date.
Transient Finite Element Computations on a Variable Transputer System
NASA Technical Reports Server (NTRS)
Smolinski, Patrick J.; Lapczyk, Ireneusz
1993-01-01
A parallel program to analyze transient finite element problems was written and implemented on a system of transputer processors. The program uses the explicit time integration algorithm which eliminates the need for equation solving, making it more suitable for parallel computations. An interprocessor communication scheme was developed for arbitrary two dimensional grid processor configurations. Several 3-D problems were analyzed on a system with a small number of processors.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor`s status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
The ATLAS Level-1 Calorimeter Trigger: PreProcessor implementation and performance
NASA Astrophysics Data System (ADS)
Åsman, B.; Achenbach, R.; Allbrooke, B. M. M.; Anders, G.; Andrei, V.; Büscher, V.; Bansil, H. S.; Barnett, B. M.; Bauss, B.; Bendtz, K.; Bohm, C.; Bracinik, J.; Brawn, I. P.; Brock, R.; Buttinger, W.; Caputo, R.; Caughron, S.; Cerrito, L.; Charlton, D. G.; Childers, J. T.; Curtis, C. J.; Daniells, A. C.; Davis, A. O.; Davygora, Y.; Dorn, M.; Eckweiler, S.; Edmunds, D.; Edwards, J. P.; Eisenhandler, E.; Ellis, K.; Ermoline, Y.; Föhlisch, F.; Faulkner, P. J. W.; Fedorko, W.; Fleckner, J.; French, S. T.; Gee, C. N. P.; Gillman, A. R.; Goeringer, C.; Hülsing, T.; Hadley, D. R.; Hanke, P.; Hauser, R.; Heim, S.; Hellman, S.; Hickling, R. S.; Hidvégi, A.; Hillier, S. J.; Hofmann, J. I.; Hristova, I.; Ji, W.; Johansen, M.; Keller, M.; Khomich, A.; Kluge, E.-E.; Koll, J.; Laier, H.; Landon, M. P. J.; Lang, V. S.; Laurens, P.; Lepold, F.; Lilley, J. N.; Linnemann, J. T.; Müller, F.; Müller, T.; Mahboubi, K.; Martin, T. A.; Mass, A.; Meier, K.; Meyer, C.; Middleton, R. P.; Moa, T.; Moritz, S.; Morris, J. D.; Mudd, R. D.; Narayan, R.; zur Nedden, M.; Neusiedl, A.; Newman, P. R.; Nikiforov, A.; Ohm, C. C.; Perera, V. J. O.; Pfeiffer, U.; Plucinski, P.; Poddar, S.; Prieur, D. P. F.; Qian, W.; Rieck, P.; Rizvi, E.; Sankey, D. P. C.; Schäfer, U.; Scharf, V.; Schmitt, K.; Schröder, C.; Schultz-Coulon, H.-C.; Schumacher, C.; Schwienhorst, R.; Silverstein, S. B.; Simioni, E.; Snidero, G.; Staley, R. J.; Stamen, R.; Stock, P.; Stockton, M. C.; Tan, C. L. A.; Tapprogge, S.; Thomas, J. P.; Thompson, P. D.; Thomson, M.; True, P.; Watkins, P. M.; Watson, A. T.; Watson, M. F.; Weber, P.; Wessels, M.; Wiglesworth, C.; Williams, S. L.
2012-12-01
The PreProcessor system of the ATLAS Level-1 Calorimeter Trigger (L1Calo) receives about 7200 analogue signals from the electromagnetic and hadronic components of the calorimetric detector system. Lateral division results in cells which are pre-summed to so-called Trigger Towers of size 0.1 × 0.1 along azimuth (phi) and pseudorapidity (η). The received calorimeter signals represent deposits of transverse energy. The system consists of 124 individual PreProcessor modules that digitise the input signals for each LHC collision, and provide energy and timing information to the digital processors of the L1Calo system, which identify physics objects forming much of the basis for the full ATLAS first level trigger decision. This paper describes the architecture of the PreProcessor, its hardware realisation, functionality, and performance.
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
Evaluation of MERIS products from Baltic Sea coastal waters rich in CDOM
NASA Astrophysics Data System (ADS)
Beltrán-Abaunza, J. M.; Kratzer, S.; Brockmann, C.
2013-11-01
In this study, retrievals of the medium resolution imaging spectrometer (MERIS) reflectances and water quality products using 4 different coastal processing algorithms freely available are assessed by comparison against sea-truthing data. The study is based on a pair-wise comparison using processor-dependent quality flags for the retrieval of valid common macro-pixels. This assessment is required in order to ensure the reliability of monitoring systems based on MERIS data, such as the Swedish coastal and lake monitoring system (http.vattenkvalitet.se). The results show that the pre-processing with the Improved Contrast between Ocean and Land (ICOL) processor, correcting for adjacency effects, improve the retrieval of spectral reflectance for all processors, Therefore, it is recommended that the ICOL processor should be applied when Baltic coastal waters are investigated. Chlorophyll was retrieved best using the FUB (Free University of Berlin) processing algorithm, although overestimations in the range 18-26.5%, dependent on the compared pairs, were obtained. At low chlorophyll concentrations (< 2.5 mg m-3), random errors dominated in the retrievals with the MEGS (MERIS ground segment processor) processor. The lowest bias and random errors were obtained with MEGS for suspended particulate matter, for which overestimations in te range of 8-16% were found. Only the FUB retrieved CDOM (Coloured Dissolved Organic Matter) correlate with in situ values. However, a large systematic underestimation appears in the estimates that nevertheless may be corrected for by using a~local correction factor. The MEGS has the potential to be used as an operational processing algorithm for the Himmerfjärden bay and adjacent areas, but it requires further improvement of the atmospheric correction for the blue bands and better definition at relatively low chlorophyll concentrations in presence of high CDOM attenuation.
Evaluation of MERIS products from Baltic Sea coastal waters rich in CDOM
NASA Astrophysics Data System (ADS)
Beltrán-Abaunza, J. M.; Kratzer, S.; Brockmann, C.
2014-05-01
In this study, retrievals of the medium resolution imaging spectrometer (MERIS) reflectances and water quality products using four different coastal processing algorithms freely available are assessed by comparison against sea-truthing data. The study is based on a pair-wise comparison using processor-dependent quality flags for the retrieval of valid common macro-pixels. This assessment is required in order to ensure the reliability of monitoring systems based on MERIS data, such as the Swedish coastal and lake monitoring system (http://vattenkvalitet.se). The results show that the pre-processing with the Improved Contrast between Ocean and Land (ICOL) processor, correcting for adjacency effects, improves the retrieval of spectral reflectance for all processors. Therefore, it is recommended that the ICOL processor should be applied when Baltic coastal waters are investigated. Chlorophyll was retrieved best using the FUB (Free University of Berlin) processing algorithm, although overestimations in the range 18-26.5%, dependent on the compared pairs, were obtained. At low chlorophyll concentrations (< 2.5 mg m-3), data dispersion dominated in the retrievals with the MEGS (MERIS ground segment processor) processor. The lowest bias and data dispersion were obtained with MEGS for suspended particulate matter, for which overestimations in the range of 8-16% were found. Only the FUB retrieved CDOM (coloured dissolved organic matter) correlate with in situ values. However, a large systematic underestimation appears in the estimates that nevertheless may be corrected for by using a local correction factor. The MEGS has the potential to be used as an operational processing algorithm for the Himmerfjärden bay and adjacent areas, but it requires further improvement of the atmospheric correction for the blue bands and better definition at relatively low chlorophyll concentrations in the presence of high CDOM attenuation.
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response is facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling combined up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing opera-tions are presented.
High-resolution streaming video integrated with UGS systems
NASA Astrophysics Data System (ADS)
Rohrer, Matthew
2010-04-01
Imagery has proven to be a valuable complement to Unattended Ground Sensor (UGS) systems. It provides ultimate verification of the nature of detected targets. However, due to the power, bandwidth, and technological limitations inherent to UGS, sacrifices have been made to the imagery portion of such systems. The result is that these systems produce lower resolution images in small quantities. Currently, a high resolution, wireless imaging system is being developed to bring megapixel, streaming video to remote locations to operate in concert with UGS. This paper will provide an overview of how using Wifi radios, new image based Digital Signal Processors (DSP) running advanced target detection algorithms, and high resolution cameras gives the user an opportunity to take high-powered video imagers to areas where power conservation is a necessity.
NASA Astrophysics Data System (ADS)
Gaydecki, P.
2009-07-01
A system is described for the design, downloading and execution of arbitrary functions, intended for use with acoustic and low-frequency ultrasonic transducers in condition monitoring and materials testing applications. The instrumentation comprises a software design tool and a powerful real-time digital signal processor unit, operating at 580 million multiplication-accumulations per second (MMACs). The embedded firmware employs both an established look-up table approach and a new function interpolation technique to generate the real-time signals with very high precision and flexibility. Using total harmonic distortion (THD) analysis, the purity of the waveforms have been compared with those generated using traditional analogue function generators; this analysis has confirmed that the new instrument has a consistently superior signal-to-noise ratio.
Warren, Steven F; Gilkerson, Jill; Richards, Jeffrey A; Oller, D Kimbrough; Xu, Dongxin; Yapanel, Umit; Gray, Sharmistha
2010-05-01
The study compared the vocal production and language learning environments of 26 young children with autism spectrum disorder (ASD) to 78 typically developing children using measures derived from automated vocal analysis. A digital language processor and audio-processing algorithms measured the amount of adult words to children and the amount of vocalizations they produced during 12-h recording periods in their natural environments. The results indicated significant differences between typically developing children and children with ASD in the characteristics of conversations, the number of conversational turns, and in child vocalizations that correlated with parent measures of various child characteristics. Automated measurement of the language learning environment of young children with ASD reveals important differences from the environments experienced by typically developing children.
NASA Technical Reports Server (NTRS)
Kindall, S. M.
1980-01-01
The computer code for the trajectory processor (#TRAJ) of the high fidelity relative motion program is described. The #TRAJ processor is a 12-degrees-of-freedom trajectory integrator (6 degrees of freedom for each of two vehicles) which can be used to generate digital and graphical data describing the relative motion of the Space Shuttle Orbiter and a free-flying cylindrical payload. A listing of the code, coding standards and conventions, detailed flow charts, and discussions of the computational logic are included.
Ring-array processor distribution topology for optical interconnects
NASA Technical Reports Server (NTRS)
Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.
1992-01-01
The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Software design and implementation of ship heave motion monitoring system based on MBD method
NASA Astrophysics Data System (ADS)
Yu, Yan; Li, Yuhan; Zhang, Chunwei; Kang, Won-Hee; Ou, Jinping
2015-03-01
Marine transportation plays a significant role in the modern transport sector due to its advantage of low cost, large capacity. It is being attached enormous importance to all over the world. Nowadays the related areas of product development have become an existing hot spot. DSP signal processors feature micro volume, low cost, high precision, fast processing speed, which has been widely used in all kinds of monitoring systems. But traditional DSP code development process is time-consuming, inefficiency, costly and difficult. MathWorks company proposed Model-based Design (MBD) to overcome these defects. By calling the target board modules in simulink library to compile and generate the corresponding code for the target processor. And then automatically call DSP integrated development environment CCS for algorithm validation on the target processor. This paper uses the MDB to design the algorithm for the ship heave motion monitoring system. It proves the effectiveness of the MBD run successfully on the processor.
A wideband software reconfigurable modem
NASA Astrophysics Data System (ADS)
Turner, J. H., Jr.; Vickers, H.
A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JITDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA
NASA Astrophysics Data System (ADS)
Sahib Omran, Safaa; Fouad Jumma, Laith
2018-05-01
Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.
Efficient Parallel Algorithms on Restartable Fail-Stop Processors
1991-01-01
resource (memory), and ( 3 ) that processors, memory and their interconnection must be The model of parallel computation known as the Par- perfectly...setting), arid ure an(I restart errors. We describe these arguments if] [AAtPS 871 (in a deterministic setting). Fault-tolerance Section 3 . of...grannmarity at the processor level --- for recent work on where Al is the nmber of failures during this step’s gate granilarities see [All 90, Pip 85
An Intrinsically Digital Amplification Scheme for Hearing Aids
NASA Astrophysics Data System (ADS)
Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.
2005-12-01
Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP processor also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
Geocoding of AIRSAR/TOPSAR SAR Data
NASA Technical Reports Server (NTRS)
Holecz, Francesco; Lou, Yun-Ling; vanZyl, Jakob
1996-01-01
It has been demonstrated and recognized that radar interferometry is a promising method for the determination of digital elevation information and terrain slope from Synthetic Aperture Radar (SAR) data. An important application of Interferometric SAR (InSAR) data in areas with topographic variations is that the derived elevation and slope can be directly used for the absolute radiometric calibration of the amplitude SAR data as well as for scattering mechanisms analysis. On the other hand polarimetric SAR data has long been recognized as permitting a more complete inference of natural surfaces than a single channel radar system. In fact, imaging polarimetry provides the measurement of the amplitude and relative phase of all transmit and receive polarizations. On board the NASA DC-8 aircraft, NASA/JPL operates the multifrequency (P, L and C bands) multipolarimetric radar AIRSAR. The TOPSAR, a special mode of the AIRSAR system, is able to collect single-pass interferometric C- and/or L-band VV polarized data. A possible configuration of the AIRSAR/TOPSAR system is to acquire single-pass interferometric data at C-band VV polarization and polarimetric radar data at the two other lower frequencies. The advantage of this system configuration is to get digital topography information at the same time the radar data is collected. The digital elevation information can therefore be used to correctly calibrate the SAR data. This step is directly included in the new AIRSAR Integrated Processor. This processor uses a modification of the full motion compensation algorithm described by Madsen et al. (1993). However, the Digital Elevation Model (DEM) with the additional products such as local incidence angle map, and the SAR data are in a geometry which is not convenient, since especially DEMs must be referred to a specific cartographic reference system. Furthermore, geocoding of SAR data is important for multisensor and/or multitemporal purposes. In this paper, a procedure to geocode the new AIRSAR/TOPSAR data is presented. As an example an AIRSAR/TOPSAR image acquired in 1994 is geocoded and evaluated in terms of geometric accuracy.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Deep learning for medical image segmentation - using the IBM TrueNorth neurosynaptic system
NASA Astrophysics Data System (ADS)
Moran, Steven; Gaonkar, Bilwaj; Whitehead, William; Wolk, Aidan; Macyszyn, Luke; Iyer, Subramanian S.
2018-03-01
Deep convolutional neural networks have found success in semantic image segmentation tasks in computer vision and medical imaging. These algorithms are executed on conventional von Neumann processor architectures or GPUs. This is suboptimal. Neuromorphic processors that replicate the structure of the brain are better-suited to train and execute deep learning models for image segmentation by relying on massively-parallel processing. However, given that they closely emulate the human brain, on-chip hardware and digital memory limitations also constrain them. Adapting deep learning models to execute image segmentation tasks on such chips, requires specialized training and validation. In this work, we demonstrate for the first-time, spinal image segmentation performed using a deep learning network implemented on neuromorphic hardware of the IBM TrueNorth Neurosynaptic System and validate the performance of our network by comparing it to human-generated segmentations of spinal vertebrae and disks. To achieve this on neuromorphic hardware, the training model constrains the coefficients of individual neurons to {-1,0,1} using the Energy Efficient Deep Neuromorphic (EEDN)1 networks training algorithm. Given the 1 million neurons and 256 million synapses, the scale and size of the neural network implemented by the IBM TrueNorth allows us to execute the requisite mapping between segmented images and non-uniform intensity MR images >20 times faster than on a GPU-accelerated network and using <0.1 W. This speed and efficiency implies that a trained neuromorphic chip can be deployed in intra-operative environments where real-time medical image segmentation is necessary.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.
2017-10-20
Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP tomore » exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5–2698v3 processors.« less
Assessment of directionality performances: comparison between Freedom and CP810 sound processors.
Razza, Sergio; Albanese, Greta; Ermoli, Lucilla; Zaccone, Monica; Cristofari, Eliana
2013-10-01
To compare speech recognition in noise for the Nucleus Freedom and CP810 sound processors using different directional settings among those available in the SmartSound portfolio. Single-subject, repeated measures study. Tertiary care referral center. Thirty-one monoaurally and binaurally implanted subjects (24 children and 7 adults) were enrolled. They were all experienced Nucleus Freedom sound processor users and achieved a 100% open set word recognition score in quiet listening conditions. Each patient was fitted with the Freedom and the CP810 processor. The program setting incorporated Adaptive Dynamic Range Optimization (ADRO) and adopted the directional algorithm BEAM (both devices) and ZOOM (only on CP810). Speech reception threshold (SRT) was assessed in a free-field layout, with disyllabic word list and interfering multilevel babble noise in the 3 different pre-processing configurations. On average, CP810 improved significantly patients' SRTs as compared to Freedom SP after 1 hour of use. Instead, no significant difference was observed in patients' SRT between the BEAM and the ZOOM algorithm fitted in the CP810 processor. The results suggest that hardware developments achieved in the design of CP810 allow an immediate and relevant directional advantage as compared to the previous-generation Freedom device.
Dynamically programmable cache
NASA Astrophysics Data System (ADS)
Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas
1998-10-01
Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
FFT Computation with Systolic Arrays, A New Architecture
NASA Technical Reports Server (NTRS)
Boriakoff, Valentin
1994-01-01
The use of the Cooley-Tukey algorithm for computing the l-d FFT lends itself to a particular matrix factorization which suggests direct implementation by linearly-connected systolic arrays. Here we present a new systolic architecture that embodies this algorithm. This implementation requires a smaller number of processors and a smaller number of memory cells than other recent implementations, as well as having all the advantages of systolic arrays. For the implementation of the decimation-in-frequency case, word-serial data input allows continuous real-time operation without the need of a serial-to-parallel conversion device. No control or data stream switching is necessary. Computer simulation of this architecture was done in the context of a 1024 point DFT with a fixed point processor, and CMOS processor implementation has started.
CALL in the Zone of Proximal Development: Novelty Effects and Teacher Guidance
ERIC Educational Resources Information Center
Karlström, Petter; Lundin, Eva
2013-01-01
Digital tools are not always used in the manner their designers had in mind. Therefore, it is not enough to assume that learning through CALL tools occurs in intended ways, if at all. We have studied the use of an enhanced word processor for writing essays in Swedish as a second language. The word processor contained natural language processing…
Multi-gigabit optical interconnects for next-generation on-board digital equipment
NASA Astrophysics Data System (ADS)
Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques
2017-11-01
Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multi-gigabit optical interconnects for next-generation on-board digital equipment
NASA Astrophysics Data System (ADS)
Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques
2004-06-01
Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required to the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged to the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10-14 and optical link budgets in excess of 12 dB were measured, which would enable to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Mitra, Avik; Ghosh, Arindam; Das, Ranabir; Patel, Apoorva; Kumar, Anil
2005-12-01
Quantum adiabatic algorithm is a method of solving computational problems by evolving the ground state of a slowly varying Hamiltonian. The technique uses evolution of the ground state of a slowly varying Hamiltonian to reach the required output state. In some cases, such as the adiabatic versions of Grover's search algorithm and Deutsch-Jozsa algorithm, applying the global adiabatic evolution yields a complexity similar to their classical algorithms. However, using the local adiabatic evolution, the algorithms given by J. Roland and N.J. Cerf for Grover's search [J. Roland, N.J. Cerf, Quantum search by local adiabatic evolution, Phys. Rev. A 65 (2002) 042308] and by Saurya Das, Randy Kobes, and Gabor Kunstatter for the Deutsch-Jozsa algorithm [S. Das, R. Kobes, G. Kunstatter, Adiabatic quantum computation and Deutsh's algorithm, Phys. Rev. A 65 (2002) 062301], yield a complexity of order N (where N=2(n) and n is the number of qubits). In this paper, we report the experimental implementation of these local adiabatic evolution algorithms on a 2-qubit quantum information processor, by Nuclear Magnetic Resonance.
Global synchronization algorithms for the Intel iPSC/860
NASA Technical Reports Server (NTRS)
Seidel, Steven R.; Davis, Mark A.
1992-01-01
In a distributed memory multicomputer that has no global clock, global processor synchronization can only be achieved through software. Global synchronization algorithms are used in tridiagonal systems solvers, CFD codes, sequence comparison algorithms, and sorting algorithms. They are also useful for event simulation, debugging, and for solving mutual exclusion problems. For the Intel iPSC/860 in particular, global synchronization can be used to ensure the most effective use of the communication network for operations such as the shift, where each processor in a one-dimensional array or ring concurrently sends a message to its right (or left) neighbor. Three global synchronization algorithms are considered for the iPSC/860: the gysnc() primitive provided by Intel, the PICL primitive sync0(), and a new recursive doubling synchronization (RDS) algorithm. The performance of these algorithms is compared to the performance predicted by communication models of both the long and forced message protocols. Measurements of the cost of shift operations preceded by global synchronization show that the RDS algorithm always synchronizes the nodes more precisely and costs only slightly more than the other two algorithms.
A parallel implementation of an off-lattice individual-based model of multicellular populations
NASA Astrophysics Data System (ADS)
Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe
2015-07-01
As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
Data acquisition using the 168/E. [CERN ISR
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, J.T.; Cittolin, S.; Demoulin, M.
1983-03-01
Event sizes and data rates at the CERN anti p p collider compose a formidable environment for a high level trigger. A system using three 168/E processors for experiment UA1 real-time event selection is described. With 168/E data memory expanded to 512K bytes, each processor holds a complete event allowing a FORTRAN trigger algorithm access to data from the entire detector. A smart CAMAC interface reads five Remus branches in parallel transferring one word to the target processor every 0.5 ..mu..s. The NORD host computer can simultaneously read an accepted event from another processor.
CMOS Image Sensor with a Built-in Lane Detector.
Hsiao, Pei-Yung; Cheng, Hsien-Chein; Huang, Shih-Shinh; Fu, Li-Chen
2009-01-01
This work develops a new current-mode mixed signal Complementary Metal-Oxide-Semiconductor (CMOS) imager, which can capture images and simultaneously produce vehicle lane maps. The adopted lane detection algorithm, which was modified to be compatible with hardware requirements, can achieve a high recognition rate of up to approximately 96% under various weather conditions. Instead of a Personal Computer (PC) based system or embedded platform system equipped with expensive high performance chip of Reduced Instruction Set Computer (RISC) or Digital Signal Processor (DSP), the proposed imager, without extra Analog to Digital Converter (ADC) circuits to transform signals, is a compact, lower cost key-component chip. It is also an innovative component device that can be integrated into intelligent automotive lane departure systems. The chip size is 2,191.4 × 2,389.8 μm, and the package uses 40 pin Dual-In-Package (DIP). The pixel cell size is 18.45 × 21.8 μm and the core size of photodiode is 12.45 × 9.6 μm; the resulting fill factor is 29.7%.
Integrated Digital Flight Control System for the Space Shuttle Orbiter
NASA Technical Reports Server (NTRS)
1973-01-01
The objectives of the integrated digital flight control system (DFCS) is to provide rotational and translational control of the space shuttle orbiter in all phases of flight: from launch ascent through orbit to entry and touchdown, and during powered horizontal flights. The program provides a versatile control system structure while maintaining uniform communications with other programs, sensors, and control effectors by using an executive routine/functional subroutine format. The program reads all external variables at a single point, copies them into its dedicated storage, and then calls the required subroutines in the proper sequence. As a result, the flight control program is largely independent of other programs in the computer complex and is equally insensitive to characteristics of the processor configuration. The integrated structure is described of the control system and the DFCS executive routine which embodies that structure. The input and output, including jet selection are included. Specific estimation and control algorithm are shown for the various mission phases: cruise (including horizontal powered flight), entry, on-orbit, and boost. Attitude maneuver routines that interface with the DFCS are included.
Parallel/distributed direct method for solving linear systems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.
On-board attitude determination for the Explorer Platform satellite
NASA Technical Reports Server (NTRS)
Jayaraman, C.; Class, B.
1992-01-01
This paper describes the attitude determination algorithm for the Explorer Platform satellite. The algorithm, which is baselined on the Landsat code, is a six-element linear quadratic state estimation processor, in the form of a Kalman filter augmented by an adaptive filter process. Improvements to the original Landsat algorithm were required to meet mission pointing requirements. These consisted of a more efficient sensor processing algorithm and the addition of an adaptive filter which acts as a check on the Kalman filter during satellite slew maneuvers. A 1750A processor will be flown on board the satellite for the first time as a coprocessor (COP) in addition to the NASA Standard Spacecraft Computer. The attitude determination algorithm, which will be resident in the COP's memory, will make full use of its improved processing capabilities to meet mission requirements. Additional benefits were gained by writing the attitude determination code in Ada.
A digital video tracking system
NASA Astrophysics Data System (ADS)
Giles, M. K.
1980-01-01
The Real-Time Videotheodolite (RTV) was developed in connection with the requirement to replace film as a recording medium to obtain the real-time location of an object in the field-of-view (FOV) of a long focal length theodolite. Design philosophy called for a system capable of discriminatory judgment in identifying the object to be tracked with 60 independent observations per second, capable of locating the center of mass of the object projection on the image plane within about 2% of the FOV in rapidly changing background/foreground situations, and able to generate a predicted observation angle for the next observation. A description is given of a number of subsystems of the RTV, taking into account the processor configuration, the video processor, the projection processor, the tracker processor, the control processor, and the optics interface and imaging subsystem.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Sewell, Christopher; Usher, William
Here, one of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Sewell, Christopher; Usher, William
Execution on massively threaded processors is one of the most critical challenges for high-performance computing (HPC) scientific visualization. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Moreover, our current production scientific visualization software is not designed for these new types of architectures. In order to address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
Optical systolic array processor using residue arithmetic
NASA Technical Reports Server (NTRS)
Jackson, J.; Casasent, D.
1983-01-01
The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
2017-02-15
Maunz2 Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone...information processors have been demonstrated experimentally using superconducting circuits1–3, electrons in semiconductors4–6, trapped atoms and...qubit quantum information processor has been realized14, and single- qubit gates have demonstrated randomized benchmarking (RB) infidelities as low as 10
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
Low-Latency Embedded Vision Processor (LLEVS)
2016-03-01
26 3.2.3 Task 3 Projected Performance Analysis of FPGA- based Vision Processor ........... 31 3.2.3.1 Algorithms Latency Analysis ...Programmable Gate Array Custom Hardware for Real- Time Multiresolution Analysis . ............................................... 35...conduct data analysis for performance projections. The data acquired through measurements , simulation and estimation provide the requisite platform for
Implementation of Adaptive Digital Controllers on Programmable Logic Devices
NASA Technical Reports Server (NTRS)
Gwaltney, David A.; King, Kenneth D.; Smith, Keary J.; Montenegro, Justino (Technical Monitor)
2002-01-01
Much has been made of the capabilities of Field Programmable Gate Arrays (FPGA's) in the hardware implementation of fast digital signal processing functions. Such capability also makes an FPGA a suitable platform for the digital implementation of closed loop controllers. Other researchers have implemented a variety of closed-loop digital controllers on FPGA's. Some of these controllers include the widely used Proportional-Integral-Derivative (PID) controller, state space controllers, neural network and fuzzy logic based controllers. There are myriad advantages to utilizing an FPGA for discrete-time control functions which include the capability for reconfiguration when SRAM- based FPGA's are employed, fast parallel implementation of multiple control loops and implementations that can meet space level radiation tolerance requirements in a compact form-factor. Generally, a software implementation on a Digital Signal Processor (DSP) device or microcontroller is used to implement digital controllers. At Marshall Space Flight Center, the Control Electronics Group has been studying adaptive discrete-time control of motor driven actuator systems using DSP devices. While small form factor, commercial DSP devices are now available with event capture, data conversion, Pulse Width Modulated (PWM) outputs and communication peripherals, these devices are not currently available in designs and packages which meet space level radiation requirements. In general, very few DSP devices are produced that are designed to meet any level of radiation tolerance or hardness. An alternative is required for compact implementation of such functionality to withstand the harsh environment encountered on spacemap. The goal of this effort is to create a fully digital, flight ready controller design that utilizes an FPGA for implementation of signal conditioning for control feedback signals, generation of commands to the controlled system, and hardware insertion of adaptive-control algorithm approaches. Radiation tolerant FPGA's are a feasible option for reaching this goal.
NASA Astrophysics Data System (ADS)
Odinokov, S. B.; Petrov, A. V.
1995-10-01
Mathematical models of components of a vector-matrix optoelectronic multiplier are considered. Perturbing factors influencing a real optoelectronic system — noise and errors of radiation sources and detectors, nonlinearity of an analogue—digital converter, nonideal optical systems — are taken into account. Analytic expressions are obtained for relating the precision of such a multiplier to the probability of an error amounting to one bit, to the parameters describing the quality of the multiplier components, and to the quality of the optical system of the processor. Various methods of increasing the dynamic range of a multiplier are considered at the technical systems level.
Interdisciplinary education in optics and photonics based on microcontrollers
NASA Astrophysics Data System (ADS)
Dreßler, Paul; Wielage, Heinz-Hermann; Haiss, Ulrich; Vauderwange, Oliver; Curticapean, Dan
2014-07-01
Not only is the number of new devices constantly increasing, but so is their application complexity and power. Most of their applications are in optics, photonics, acoustic and mobile devices. Working speed and functionality is achieved in most of media devices by strategic use of digital signal processors and microcontrollers of the new generation. Considering all these premises of media development dynamics, the authors present how to integrate microcontrollers and digital signal processors in the curricula of media technology lectures by using adequate content. This also includes interdisciplinary content that consists of using the acquired knowledge in media software. These entries offer a deeper understanding of photonics, acoustics and media engineering.
Hamby, David M [Corvallis, OR; Farsoni, Abdollah T [Corvallis, OR; Cazalas, Edward [Corvallis, OR
2011-06-21
A technique and device provides absolute skin dosimetry in real time at multiple tissue depths simultaneously. The device uses a phoswich detector which has multiple scintillators embedded at different depths within a non-scintillating material. A digital pulse processor connected to the phoswich detector measures a differential distribution (dN/dH) of count rate N as function of pulse height H for signals from each of the multiple scintillators. A digital processor computes in real time from the differential count-rate distribution for each of multiple scintillators an estimate of an ionizing radiation dose delivered to each of multiple depths of skin tissue corresponding to the multiple scintillators embedded at multiple corresponding depths within the non-scintillating material.
Signal processor for processing ultrasonic receiver signals
Fasching, George E.
1980-01-01
A signal processor is provided which uses an analog integrating circuit in conjunction with a set of digital counters controlled by a precision clock for sampling timing to provide an improved presentation of an ultrasonic transmitter/receiver signal. The signal is sampled relative to the transmitter trigger signal timing at precise times, the selected number of samples are integrated and the integrated samples are transferred and held for recording on a strip chart recorder or converted to digital form for storage. By integrating multiple samples taken at precisely the same time with respect to the trigger for the ultrasonic transmitter, random noise, which is contained in the ultrasonic receiver signal, is reduced relative to the desired useful signal.
Power processor for a 30cm ion thruster
NASA Technical Reports Server (NTRS)
Biess, J. J.; Inouye, L. Y.
1974-01-01
A thermal vacuum power processor for the NASA Lewis 30cm Mercury Ion Engine was designed, fabricated and tested to determine compliance with electrical specifications. The power processor breadboard used the silicon controlled rectifier (SCR) series resonant inverter as the basic power stage to process all the power to an ion engine. The power processor includes a digital interface unit to process all input commands and internal telemetry signals so that operation is compatible with a central computer system. The breadboard was tested in a thermal vacuum environment. Integration tests were performed with the ion engine and demonstrate operational compatibility and reliable operation without any component failures. Electromagnetic interference data were also recorded on the design to provide information on the interaction with total spacecraft.
Embedded processor extensions for image processing
NASA Astrophysics Data System (ADS)
Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy
2008-04-01
The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a gain of twice is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.
Independent backup mode transfer and mechanism for digital control computers
NASA Technical Reports Server (NTRS)
Tulpule, Bhalchandra R. (Inventor); Oscarson, Edward M. (Inventor)
1992-01-01
An interrupt is provided to a signal processor having a non-maskable interrupt input, in response to the detection of a request for transfer to backup software. The signal processor provides a transfer signal to a transfer mechanism only after completion of the present machine cycle. Transfer to the backup software is initiated by the transfer mechanism only upon reception of the transfer signal.
A distributed fault-tolerant signal processor /FTSP/
NASA Astrophysics Data System (ADS)
Bonneau, R. J.; Evett, R. C.; Young, M. J.
1980-01-01
A digital fault-tolerant signal processor (FTSP), an example of a self-repairing programmable system is analyzed. The design configuration is discussed in terms of fault tolerance, system-level fault detection, isolation and common memory. Special attention is given to the FDIR (fault detection isolation and reconfiguration) logic, noting that the reconfiguration decisions are based on configuration, summary status, end-around tests, and north marker/synchro data. Several mechanisms of fault detection are described which initiate reconfiguration at different levels. It is concluded that the reliability of a signal processor can be significantly enhanced by the use of fault-tolerant techniques.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems.
Downie, J D; Goodman, J W
1989-10-15
A ground-based adaptive optics imaging telescope system attempts to improve image quality by measuring and correcting for atmospherically induced wavefront aberrations. The necessary control computations during each cycle will take a finite amount of time, which adds to the residual error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper investigates this possibility by studying the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for adaptive optics use.
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.
2009-08-01
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixedparallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisionsmore » are made in an integrated manner and are based on several factors such as the structure of the taskgraph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure taskand data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.« less
NASA Technical Reports Server (NTRS)
Eidson, T. M.; Erlebacher, G.
1994-01-01
While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand-sides (RHS) of the system of equations. Each RHS solutions is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm allow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm will be shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
Optical triple-in digital logic using nonlinear optical four-wave mixing
NASA Astrophysics Data System (ADS)
Widjaja, Joewono; Tomita, Yasuo
1995-08-01
A new programmable optical processor is proposed for implementing triple-in combinatorial digital logic that uses four-wave mixing. Binary-coded decimal-to-octal decoding is experimentally demonstrated by use of a photorefractive BaTiO 3 crystal. The result confirms the feasibility of the proposed system.
A real-time tracking system of infrared dim and small target based on FPGA and DSP
NASA Astrophysics Data System (ADS)
Rong, Sheng-hui; Zhou, Hui-xin; Qin, Han-lin; Wang, Bing-jian; Qian, Kun
2014-11-01
A core technology in the infrared warning system is the detection tracking of dim and small targets with complicated background. Consequently, running the detection algorithm on the hardware platform has highly practical value in the military field. In this paper, a real-time detection tracking system of infrared dim and small target which is used FPGA (Field Programmable Gate Array) and DSP (Digital Signal Processor) as the core was designed and the corresponding detection tracking algorithm and the signal flow is elaborated. At the first stage, the FPGA obtain the infrared image sequence from the sensor, then it suppresses background clutter by mathematical morphology method and enhances the target intensity by Laplacian of Gaussian operator. At the second stage, the DSP obtain both the original image and the filtered image form the FPGA via the video port. Then it segments the target from the filtered image by an adaptive threshold segmentation method and gets rid of false target by pipeline filter. Experimental results show that our system can achieve higher detection rate and lower false alarm rate.
Dolphin Sounds-Inspired Covert Underwater Acoustic Communication and Micro-Modem
Qiao, Gang; Liu, Songzuo; Bilal, Muhammad
2017-01-01
A novel portable underwater acoustic modem is proposed in this paper for covert communication between divers or underwater unmanned vehicles (UUVs) and divers at a short distance. For the first time, real dolphin calls are used in the modem to realize biologically inspired Covert Underwater Acoustic Communication (CUAC). A variety of dolphin whistles and clicks stored in an SD card inside the modem helps to realize different biomimetic CUAC algorithms based on the specified covert scenario. In this paper, the information is conveyed during the time interval between dolphin clicks. TMS320C6748 and TLV320AIC3106 are the core processors used in our unique modem for fast digital processing and interconnection with other terminals or sensors. Simulation results show that the bit error rate (BER) of the CUAC algorithm is less than 10−5 when the signal to noise ratio is over ‒5 dB. The modem was tested in an underwater pool, and a data rate of 27.1 bits per second at a distance of 10 m was achieved. PMID:29068363
Model predictive controller design for boost DC-DC converter using T-S fuzzy cost function
NASA Astrophysics Data System (ADS)
Seo, Sang-Wha; Kim, Yong; Choi, Han Ho
2017-11-01
This paper proposes a Takagi-Sugeno (T-S) fuzzy method to select cost function weights of finite control set model predictive DC-DC converter control algorithms. The proposed method updates the cost function weights at every sample time by using T-S type fuzzy rules derived from the common optimal control engineering knowledge that a state or input variable with an excessively large magnitude can be penalised by increasing the weight corresponding to the variable. The best control input is determined via the online optimisation of the T-S fuzzy cost function for all the possible control input sequences. This paper implements the proposed model predictive control algorithm in real time on a Texas Instruments TMS320F28335 floating-point Digital Signal Processor (DSP). Some experimental results are given to illuminate the practicality and effectiveness of the proposed control system under several operating conditions. The results verify that our method can yield not only good transient and steady-state responses (fast recovery time, small overshoot, zero steady-state error, etc.) but also insensitiveness to abrupt load or input voltage parameter variations.
Systems-on-chip approach for real-time simulation of wheel-rail contact laws
NASA Astrophysics Data System (ADS)
Mei, T. X.; Zhou, Y. J.
2013-04-01
This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and enable simulation in real time for the use of hardware-in-loop for experimental studies of the latest vehicle dynamic and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact laws algorithms are arranged in a parallel manner and multi-contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput and low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.
Real-time unmanned aircraft systems surveillance video mosaicking using GPU
NASA Astrophysics Data System (ADS)
Camargo, Aldo; Anderson, Kyle; Wang, Yi; Schultz, Richard R.; Fevig, Ronald A.
2010-04-01
Digital video mosaicking from Unmanned Aircraft Systems (UAS) is being used for many military and civilian applications, including surveillance, target recognition, border protection, forest fire monitoring, traffic control on highways, monitoring of transmission lines, among others. Additionally, NASA is using digital video mosaicking to explore the moon and planets such as Mars. In order to compute a "good" mosaic from video captured by a UAS, the algorithm must deal with motion blur, frame-to-frame jitter associated with an imperfectly stabilized platform, perspective changes as the camera tilts in flight, as well as a number of other factors. The most suitable algorithms use SIFT (Scale-Invariant Feature Transform) to detect the features consistent between video frames. Utilizing these features, the next step is to estimate the homography between two consecutives video frames, perform warping to properly register the image data, and finally blend the video frames resulting in a seamless video mosaick. All this processing takes a great deal of resources of resources from the CPU, so it is almost impossible to compute a real time video mosaic on a single processor. Modern graphics processing units (GPUs) offer computational performance that far exceeds current CPU technology, allowing for real-time operation. This paper presents the development of a GPU-accelerated digital video mosaicking implementation and compares it with CPU performance. Our tests are based on two sets of real video captured by a small UAS aircraft; one video comes from Infrared (IR) and Electro-Optical (EO) cameras. Our results show that we can obtain a speed-up of more than 50 times using GPU technology, so real-time operation at a video capture of 30 frames per second is feasible.
Parallel processor for real-time structural control
NASA Astrophysics Data System (ADS)
Tise, Bert L.
1993-07-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hopwood, J.E.; Affeldt, B.
An IBM personal computer (PC), a Gerber coordinate digitizer, and a collection of other instruments make up a system known as the Coordinate Digitizer Interactive Processor (CDIP). The PC extracts coordinate data from the digitizer through a special interface, and then, after reformatting, transmits the data to a remote VAX computer, a floppy disk, and a display terminal. This system has improved the efficiency of producing printed circuit-board artwork and extended the useful life of the Gerber GCD-1 Digitizer. 1 ref., 12 figs.
A digital retina-like low-level vision processor.
Mertoguno, S; Bourbakis, N G
2003-01-01
This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
Ahn, J M; Lee, J H; Choi, S W; Kim, W E; Omn, K S; Park, S K; Kim, W G; Roh, J R; Min, B G
1998-03-01
The moving actuator type total artificial heart (TAH) developed in the Seoul National University has numerous design improvements based upon the digital signal processor (DSP). These improvements include the implantability of all electronics, an automatic control algorithm, and extension of the battery run-time in connection with an amorphous silicon solar system (SS). The implantable electronics consist of the motor drive, main processor, intelligent Li ion battery management (LIBM) based upon the DSP, telemetry system, and transcutaneous energy transmission (TET) system. Major changes in the implantable electronics include decreasing the temperature rise by over 21 degrees C on the motor drive, volume reduction (40 x 55 x 33 mm, 7 cell assembly) of the battery pack using a Li ion (3.6 V/cell, 900 mA.h), and improvement of the battery run-time (over 40 min) while providing the cardiac output (CO) of 5 L/min at 100 mm Hg afterload when the external battery for testing is connected with the SS (2.5 W, 192.192, 1 kg) for the external battery recharge or the partial TAH drive. The phase locked loop (PLL) based telemetry system was implemented to improve stability and the error correction DSP algorithm programmed to achieve high accuracy. A field focused light emitting diode (LED) was used to obtain low light scattering along the propagation path, similar to the optical property of the laser and miniature sized, mounted on the pancake type TET coils. The TET operating resonance frequency was self tuned in a range of 360 to 410 kHz to provide enough power even at high afterloads. An automatic cardiac output regulation algorithm was developed based on interventricular pressure analysis and carried out in several animal experiments successfully. All electronics have been evaluated in vitro and in vivo and prepared for implantation of the TAH. Substantial progress has been made in designing a completely implantable TAH at the preclinical stage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Load Balancing Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pearce, Olga Tkachyshyn
2014-12-01
The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one atmore » the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.« less
Parallel optimization algorithms and their implementation in VLSI design
NASA Technical Reports Server (NTRS)
Lee, G.; Feeley, J. J.
1991-01-01
Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Han, Bing; Ding, Chibiao; Zhong, Lihua; Liu, Jiayin; Qiu, Xiaolan; Hu, Yuxin; Lei, Bin
2018-01-01
The Gaofen-3 (GF-3) data processor was developed as a workstation-based GF-3 synthetic aperture radar (SAR) data processing system. The processor consists of two vital subsystems of the GF-3 ground segment, which are referred to as data ingesting subsystem (DIS) and product generation subsystem (PGS). The primary purpose of DIS is to record and catalogue GF-3 raw data with a transferring format, and PGS is to produce slant range or geocoded imagery from the signal data. This paper presents a brief introduction of the GF-3 data processor, including descriptions of the system architecture, the processing algorithms and its output format. PMID:29534464
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
The LOGO Processor; A Guide for System Programmers.
ERIC Educational Resources Information Center
Weiner, Walter B.; And Others
A detailed specification of the LOGO programing system is given. The level of description is intended to enable system programers to design LOGO processors of their own. The discussion of storage allocation and garbage collection algorithms is virtually complete. An annotated LOGO system listing for the PDP-10 computer system may be obtained on…
NASA Astrophysics Data System (ADS)
Coffey, Stephen; Connell, Joseph
2005-06-01
This paper presents a development platform for real-time image processing based on the ADSP-BF533 Blackfin processor and the MicroC/OS-II real-time operating system (RTOS). MicroC/OS-II is a completely portable, ROMable, pre-emptive, real-time kernel. The Blackfin Digital Signal Processors (DSPs), incorporating the Analog Devices/Intel Micro Signal Architecture (MSA), are a broad family of 16-bit fixed-point products with a dual Multiply Accumulate (MAC) core. In addition, they have a rich instruction set with variable instruction length and both DSP and MCU functionality thus making them ideal for media based applications. Using the MicroC/OS-II for task scheduling and management, the proposed system can capture and process raw RGB data from any standard 8-bit greyscale image sensor in soft real-time and then display the processed result using a simple PC graphical user interface (GUI). Additionally, the GUI allows configuration of the image capture rate and the system and core DSP clock rates thereby allowing connectivity to a selection of image sensors and memory devices. The GUI also allows selection from a set of image processing algorithms based in the embedded operating system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krityakierne, Tipaluck; Akhtar, Taimoor; Shoemaker, Christine A.
This paper presents a parallel surrogate-based global optimization method for computationally expensive objective functions that is more effective for larger numbers of processors. To reach this goal, we integrated concepts from multi-objective optimization and tabu search into, single objective, surrogate optimization. Our proposed derivative-free algorithm, called SOP, uses non-dominated sorting of points for which the expensive function has been previously evaluated. The two objectives are the expensive function value of the point and the minimum distance of the point to previously evaluated points. Based on the results of non-dominated sorting, P points from the sorted fronts are selected as centersmore » from which many candidate points are generated by random perturbations. Based on surrogate approximation, the best candidate point is subsequently selected for expensive evaluation for each of the P centers, with simultaneous computation on P processors. Centers that previously did not generate good solutions are tabu with a given tenure. We show almost sure convergence of this algorithm under some conditions. The performance of SOP is compared with two RBF based methods. The test results show that SOP is an efficient method that can reduce time required to find a good near optimal solution. In a number of cases the efficiency of SOP is so good that SOP with 8 processors found an accurate answer in less wall-clock time than the other algorithms did with 32 processors.« less
Chung, King; Nelson, Lance; Teske, Melissa
2012-09-01
The purpose of this study was to investigate whether a multichannel adaptive directional microphone and a modulation-based noise reduction algorithm could enhance cochlear implant performance in reverberant noise fields. A hearing aid was modified to output electrical signals (ePreprocessor) and a cochlear implant speech processor was modified to receive electrical signals (eProcessor). The ePreprocessor was programmed to flat frequency response and linear amplification. Cochlear implant listeners wore the ePreprocessor-eProcessor system in three reverberant noise fields: 1) one noise source with variable locations; 2) three noise sources with variable locations; and 3) eight evenly spaced noise sources from 0° to 360°. Listeners' speech recognition scores were tested when the ePreprocessor was programmed to omnidirectional microphone (OMNI), omnidirectional microphone plus noise reduction algorithm (OMNI + NR), and adaptive directional microphone plus noise reduction algorithm (ADM + NR). They were also tested with their own cochlear implant speech processor (CI_OMNI) in the three noise fields. Additionally, listeners rated overall sound quality preferences on recordings made in the noise fields. Results indicated that ADM+NR produced the highest speech recognition scores and the most preferable rating in all noise fields. Factors requiring attention in the hearing aid-cochlear implant integration process are discussed. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Hartley, Tom T. (Editor)
1987-01-01
Recent advances in control-system design and simulation are discussed in reviews and reports. Among the topics considered are fast algorithms for generating near-optimal binary decision programs, trajectory control of robot manipulators with compensation of load effects via a six-axis force sensor, matrix integrators for real-time simulation, a high-level control language for an autonomous land vehicle, and a practical engineering design method for stable model-reference adaptive systems. Also addressed are the identification and control of flexible-limb robots with unknown loads, adaptive control and robust adaptive control for manipulators with feedforward compensation, adaptive pole-placement controllers with predictive action, variable-structure strategies for motion control, and digital signal-processor-based variable-structure controls.
NASA Technical Reports Server (NTRS)
Marthaler, J. G.; Heighway, J. E.
1979-01-01
An iceberg detection and identification system consisting of a moderate resolution Side Looking Airborne Radar (SLAR) interfaced with a Radar Image Processor (RIP) based on a ROLM 1664 computer with a 32K core memory updatable to 64K is described. The system can be operated in high- or low-resolution sampling modes. Specifically designed algorithms are applied to digitized signal returns to provide automatic target detection and location, geometrically correct video image display and data recording. The real aperture Motorola AN/APS-94D SLAR operates in the X-band and is tunable between 9.10 and 9.40 GHz; its output power is 45 kW peak with a pulse repetition rate of 750 pulses per hour. Schematic diagrams of the system are provided, together with preliminary test data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dreyer, J
2007-09-18
During my internship at Lawrence Livermore National Laboratory I worked with microcalorimeter gamma-ray and fast-neutron detectors based on superconducting Transition Edge Sensors (TESs). These instruments are being developed for fundamental science and nuclear non-proliferation applications because of their extremely high energy resolution; however, this comes at the expense of a small pixel size and slow decay times. The small pixel sizes are being addressed by developing detector arrays while the low count rate is being addressed by developing Digital Signal Processors (DSPs) that allow higher throughput than traditional pulse processing algorithms. Traditionally, low-temperature microcalorimeter pulses have been processed off-line withmore » optimum filtering routines based on the measured spectral characteristics of the signal and the noise. These optimum filters rely on the spectral content of the signal being identical for all events, and therefore require capturing the entire pulse signal without pile-up. In contrast, the DSP algorithm being developed is based on differences in signal levels before and after a trigger event, and therefore does not require the waveform to fully decay, or even the signal level to be close to the base line. The readout system allows for real time data acquisition and analysis at count rates exceeding 100 Hz for pulses with several {approx}ms decay times with minimal loss of energy resolution. Originally developed for gamma-ray analysis with HPGe detectors we have modified the hardware and firmware of the system to accommodate the slower TES signals and optimized the parameters of the filtering algorithm to maximize either resolution or throughput. The following presents an overview of the digital signal processing hardware and discusses the results of characterization measurements made to determine the systems performance.« less
Segmentation of financial seals and its implementation on a DSP-based system
NASA Astrophysics Data System (ADS)
He, Jin; Liu, Tiegen; Guo, Jingjing; Zhang, Hao
2009-11-01
Automatic seal imprint identification is an important part of modern financial security. Accurate segmentation is the basis of correct identification. In this paper, a DSP (digital signal processor) based identification system was designed, and an adaptive algorithm was proposed to extract binary seal images from financial instruments. As the kernel of the identification system, a DSP chip of TMS320DM642 was used to implement image processing, controlling and coordinating works of each system module. The proposed algorithm consisted of three stages, including extraction of grayscale seal image, denoising and binarization. A grayscale seal image was extracted by color transform from a financial instrument image. Adaptive morphological operations were used to highlight details of the extracted grayscale seal image and smooth the background. After median filter for noise elimination, the filtered seal image was binarized by Otsu's method. The algorithm was developed based on the DSP development environment CCS and real-time operation system DSP/BIOS. To simplify the implementation of the proposed algorithm, the calibration of white balance and the coarse positioning of the seal imprint were implemented by TMS320DM642 controlling image acquisition. IMGLIB of TMS320DM642 was used for the efficiency improvement. The experiment result showed that financial seal imprints, even with intricate and dense strokes can be correctly segmented by the proposed algorithm. Adhesion and incompleteness distortions in the segmentation results were reduced, even when the original seal imprint had a poor quality.
Parallel Processing of Broad-Band PPM Signals
NASA Technical Reports Server (NTRS)
Gray, Andrew; Kang, Edward; Lay, Norman; Vilnrotter, Victor; Srinivasan, Meera; Lee, Clement
2010-01-01
A parallel-processing algorithm and a hardware architecture to implement the algorithm have been devised for timeslot synchronization in the reception of pulse-position-modulated (PPM) optical or radio signals. As in the cases of some prior algorithms and architectures for parallel, discrete-time, digital processing of signals other than PPM, an incoming broadband signal is divided into multiple parallel narrower-band signals by means of sub-sampling and filtering. The number of parallel streams is chosen so that the frequency content of the narrower-band signals is low enough to enable processing by relatively-low speed complementary metal oxide semiconductor (CMOS) electronic circuitry. The algorithm and architecture are intended to satisfy requirements for time-varying time-slot synchronization and post-detection filtering, with correction of timing errors independent of estimation of timing errors. They are also intended to afford flexibility for dynamic reconfiguration and upgrading. The architecture is implemented in a reconfigurable CMOS processor in the form of a field-programmable gate array. The algorithm and its hardware implementation incorporate three separate time-varying filter banks for three distinct functions: correction of sub-sample timing errors, post-detection filtering, and post-detection estimation of timing errors. The design of the filter bank for correction of timing errors, the method of estimating timing errors, and the design of a feedback-loop filter are governed by a host of parameters, the most critical one, with regard to processing very broadband signals with CMOS hardware, being the number of parallel streams (equivalently, the rate-reduction parameter).
Parallel Monte Carlo Search for Hough Transform
NASA Astrophysics Data System (ADS)
Lopes, Raul H. C.; Franqueira, Virginia N. L.; Reid, Ivan D.; Hobson, Peter R.
2017-10-01
We investigate the problem of line detection in digital image processing and in special how state of the art algorithms behave in the presence of noise and whether CPU efficiency can be improved by the combination of a Monte Carlo Tree Search, hierarchical space decomposition, and parallel computing. The starting point of the investigation is the method introduced in 1962 by Paul Hough for detecting lines in binary images. Extended in the 1970s to the detection of space forms, what came to be known as Hough Transform (HT) has been proposed, for example, in the context of track fitting in the LHC ATLAS and CMS projects. The Hough Transform transfers the problem of line detection, for example, into one of optimization of the peak in a vote counting process for cells which contain the possible points of candidate lines. The detection algorithm can be computationally expensive both in the demands made upon the processor and on memory. Additionally, it can have a reduced effectiveness in detection in the presence of noise. Our first contribution consists in an evaluation of the use of a variation of the Radon Transform as a form of improving theeffectiveness of line detection in the presence of noise. Then, parallel algorithms for variations of the Hough Transform and the Radon Transform for line detection are introduced. An algorithm for Parallel Monte Carlo Search applied to line detection is also introduced. Their algorithmic complexities are discussed. Finally, implementations on multi-GPU and multicore architectures are discussed.
Voltage scheduling for low power/energy
NASA Astrophysics Data System (ADS)
Manzak, Ali
2001-07-01
Power considerations have become an increasingly dominant factor in the design of both portable and desk-top systems. An effective way to reduce power consumption is to lower the supply voltage since voltage is quadratically related to power. This dissertation considers the problem of lowering the supply voltage at (i) the system level and at (ii) the behavioral level. At the system level, the voltage of the variable voltage processor is dynamically changed with the work load. Processors with limited sized buffers as well as those with very large buffers are considered. Given the task arrival times, deadline times, execution times, periods and switching activities, task scheduling algorithms that minimize energy or peak power are developed for the processors equipped with very large buffers. A relation between the operating voltages of the tasks for minimum energy/power is determined using the Lagrange multiplier method, and an iterative algorithm that utilizes this relation is developed. Experimental results show that the voltage assignment obtained by the proposed algorithm is very close (0.1% error) to that of the optimal energy assignment and the optimal peak power (1% error) assignment. Next, on-line and off-fine minimum energy task scheduling algorithms are developed for processors with limited sized buffers. These algorithms have polynomial time complexity and present optimal (off-line) and close-to-optimal (on-line) solutions. A procedure to calculate the minimum buffer size given information about the size of the task (maximum, minimum), execution time (best case, worst case) and deadlines is also presented. At the behavioral level, resources operating at multiple voltages are used to minimize power while maintaining the throughput. Such a scheme has the advantage of allowing modules on the critical paths to be assigned to the highest voltage levels (thus meeting the required timing constraints) while allowing modules on non-critical paths to be assigned to lower voltage levels (thus reducing the power consumption). A polynomial time resource and latency constrained scheduling algorithm is developed to distribute the available slack among the nodes such that power consumption is minimum. The algorithm is iterative and utilizes the slack based on the Lagrange multiplier method.
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures
Moreland, Kenneth; Sewell, Christopher; Usher, William; ...
2016-05-09
Here, one of the most critical challenges for high-performance computing (HPC) scientific visualization is execution on massively threaded processors. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Our current production scientific visualization software is not designed for these new types of architectures. To address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures
Moreland, Kenneth; Sewell, Christopher; Usher, William; ...
2016-05-09
Execution on massively threaded processors is one of the most critical challenges for high-performance computing (HPC) scientific visualization. Of the many fundamental changes we are seeing in HPC systems, one of the most profound is a reliance on new processor types optimized for execution bandwidth over latency hiding. Moreover, our current production scientific visualization software is not designed for these new types of architectures. In order to address this issue, the VTK-m framework serves as a container for algorithms, provides flexible data representation, and simplifies the design of visualization algorithms on new and future computer architecture.
Content addressable memory project
NASA Technical Reports Server (NTRS)
Hall, J. Storrs; Levy, Saul; Smith, Donald E.; Miyake, Keith M.
1992-01-01
A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodenbeck, Christopher T.; Young, Derek; Chou, Tina
A combined radar and telemetry system is described. The combined radar and telemetry system includes a processing unit that executes instructions, where the instructions define a radar waveform and a telemetry waveform. The processor outputs a digital baseband signal based upon the instructions, where the digital baseband signal is based upon the radar waveform and the telemetry waveform. A radar and telemetry circuit transmits, simultaneously, a radar signal and telemetry signal based upon the digital baseband signal.
Modem design for a MOBILESAT terminal
NASA Technical Reports Server (NTRS)
Rice, M.; Miller, M. J.; Cowley, W. G.; Rowe, D.
1990-01-01
The implementation is described of a programmable digital signal processor based system, designed for use as a test bed in the development of a digital modem, codec, and channel simulator. Code was written to configure the system as a 5600 bps or 6600 bps QPSK modem. The test bed is currently being used in an experiment to evaluate the performance of digital speech over shadowed channels in the Australian mobile satellite (MOBILESAT) project.
Lewis hybrid computing system, users manual
NASA Technical Reports Server (NTRS)
Bruton, W. M.; Cwynar, D. S.
1979-01-01
The Lewis Research Center's Hybrid Simulation Lab contains a collection of analog, digital, and hybrid (combined analog and digital) computing equipment suitable for the dynamic simulation and analysis of complex systems. This report is intended as a guide to users of these computing systems. The report describes the available equipment' and outlines procedures for its use. Particular is given to the operation of the PACER 100 digital processor. System software to accomplish the usual digital tasks such as compiling, editing, etc. and Lewis-developed special purpose software are described.
Digital pulse processing in Mössbauer spectroscopy
NASA Astrophysics Data System (ADS)
Veiga, A.; Grunfeld, C. M.
2014-04-01
In this work we present some advances towards full digitization of the detection subsystem of a Mössbauer transmission spectrometer. We show how, using adequate instrumentation, preamplifier output of a proportional counter can be digitized with no deterioration in spectrum quality, avoiding the need of a shaping amplifier. A pipelined architecture is proposed for a digital processor, which constitutes a versatile platform for the development of pulse processing techniques. Requirements for minimization of the analog processing are considered and experimental results are presented.
Real-time Enhancement, Registration, and Fusion for an Enhanced Vision System
NASA Technical Reports Server (NTRS)
Hines, Glenn D.; Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.
2006-01-01
Over the last few years NASA Langley Research Center (LaRC) has been developing an Enhanced Vision System (EVS) to aid pilots while flying in poor visibility conditions. The EVS captures imagery using two infrared video cameras. The cameras are placed in an enclosure that is mounted and flown forward-looking underneath the NASA LaRC ARIES 757 aircraft. The data streams from the cameras are processed in real-time and displayed on monitors on-board the aircraft. With proper processing the camera system can provide better-than-human-observed imagery particularly during poor visibility conditions. However, to obtain this goal requires several different stages of processing including enhancement, registration, and fusion, and specialized processing hardware for real-time performance. We are using a real-time implementation of the Retinex algorithm for image enhancement, affine transformations for registration, and weighted sums to perform fusion. All of the algorithms are executed on a single TI DM642 digital signal processor (DSP) clocked at 720 MHz. The image processing components were added to the EVS system, tested, and demonstrated during flight tests in August and September of 2005. In this paper we briefly discuss the EVS image processing hardware and algorithms. We then discuss implementation issues and show examples of the results obtained during flight tests.
NASA Astrophysics Data System (ADS)
Eilbert, Richard F.; Krug, Kristoph D.
1993-04-01
The Vivid Rapid Explosives Detection Systems is a true dual energy x-ray machine employing precision x-ray data acquisition in combination with unique algorithms and massive computation capability. Data from the system's 960 detectors is digitally stored and processed by powerful supermicro-computers organized as an expandable array of parallel processors. The algorithms operate on the dual energy attenuation image data to recognize and define objects in the milieu of the baggage contents. Each object is then systematically examined for a match to a specific effective atomic number, density, and mass threshold. Material properties are determined by comparing the relative attenuations of the 75 kVp and 150 kVp beams and electronically separating the object from its local background. Other heuristic algorithms search for specific configurations and provide additional information. The machine automatically detects explosive materials and identifies bomb components in luggage with high specificity and throughput, X-ray dose is comparable to that of current airport x-ray machines. The machine is also configured to find heroin, cocaine, and US currency by selecting appropriate settings on-site. Since January 1992, production units have been operationally deployed at U.S. and European airports for improved screening of checked baggage.
A Real-Time Optical 3D Tracker for Head-Mounted Display Systems
1990-03-01
paper. OPTOTRAK [Nor88] uses one camera with two dual-axis CCD infrared position sensors. Each position sen- sor has a dedicated processor board to...enhance the use- [Nor88] Northern Digital. Trade literature on Optotrak fulness of head-mounted display systems. - Northern Digital’s Three Dimensional
NASA Technical Reports Server (NTRS)
Gilliland, M. G.; Rougelot, R. S.; Schumaker, R. A.
1966-01-01
Video signal processor uses special-purpose integrated circuits with nonsaturating current mode switching to accept texture and color information from a digital computer in a visual spaceflight simulator and to combine these, for display on color CRT with analog information concerning fading.
NASA Technical Reports Server (NTRS)
Johnson, Dennis A. (Inventor)
1996-01-01
A laser doppler velocimeter uses frequency shifting of a laser beam to provide signal information for each velocity component. A composite electrical signal generated by a light detector is digitized and a processor produces a discrete Fourier transform based on the digitized electrical signal. The transform includes two peak frequencies corresponding to the two velocity components.
Fast words boundaries localization in text fields for low quality document images
NASA Astrophysics Data System (ADS)
Ilin, Dmitry; Novikov, Dmitriy; Polevoy, Dmitry; Nikolaev, Dmitry
2018-04-01
The paper examines the problem of word boundaries precise localization in document text zones. Document processing on a mobile device consists of document localization, perspective correction, localization of individual fields, finding words in separate zones, segmentation and recognition. While capturing an image with a mobile digital camera under uncontrolled capturing conditions, digital noise, perspective distortions or glares may occur. Further document processing gets complicated because of its specifics: layout elements, complex background, static text, document security elements, variety of text fonts. However, the problem of word boundaries localization has to be solved at runtime on mobile CPU with limited computing capabilities under specified restrictions. At the moment, there are several groups of methods optimized for different conditions. Methods for the scanned printed text are quick but limited only for images of high quality. Methods for text in the wild have an excessively high computational complexity, thus, are hardly suitable for running on mobile devices as part of the mobile document recognition system. The method presented in this paper solves a more specialized problem than the task of finding text on natural images. It uses local features, a sliding window and a lightweight neural network in order to achieve an optimal algorithm speed-precision ratio. The duration of the algorithm is 12 ms per field running on an ARM processor of a mobile device. The error rate for boundaries localization on a test sample of 8000 fields is 0.3
Machine-checked proofs of the design and implementation of a fault-tolerant circuit
NASA Technical Reports Server (NTRS)
Bevier, William R.; Young, William D.
1990-01-01
A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant results is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal' in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processor is also described.
NASA Astrophysics Data System (ADS)
González, Diego; Botella, Guillermo; García, Carlos; Prieto, Manuel; Tirado, Francisco
2013-12-01
This contribution focuses on the optimization of matching-based motion estimation algorithms widely used for video coding standards using an Altera custom instruction-based paradigm and a combination of synchronous dynamic random access memory (SDRAM) with on-chip memory in Nios II processors. A complete profile of the algorithms is achieved before the optimization, which locates code leaks, and afterward, creates a custom instruction set, which is then added to the specific design, enhancing the original system. As well, every possible memory combination between on-chip memory and SDRAM has been tested to achieve the best performance. The final throughput of the complete designs are shown. This manuscript outlines a low-cost system, mapped using very large scale integration technology, which accelerates software algorithms by converting them into custom hardware logic blocks and showing the best combination between on-chip memory and SDRAM for the Nios II processor.
Knowledge-based vision for space station object motion detection, recognition, and tracking
NASA Technical Reports Server (NTRS)
Symosek, P.; Panda, D.; Yalamanchili, S.; Wehner, W., III
1987-01-01
Computer vision, especially color image analysis and understanding, has much to offer in the area of the automation of Space Station tasks such as construction, satellite servicing, rendezvous and proximity operations, inspection, experiment monitoring, data management and training. Knowledge-based techniques improve the performance of vision algorithms for unstructured environments because of their ability to deal with imprecise a priori information or inaccurately estimated feature data and still produce useful results. Conventional techniques using statistical and purely model-based approaches lack flexibility in dealing with the variabilities anticipated in the unstructured viewing environment of space. Algorithms developed under NASA sponsorship for Space Station applications to demonstrate the value of a hypothesized architecture for a Video Image Processor (VIP) are presented. Approaches to the enhancement of the performance of these algorithms with knowledge-based techniques and the potential for deployment of highly-parallel multi-processor systems for these algorithms are discussed.
NASA Astrophysics Data System (ADS)
Meringer, Markus; Gretschany, Sergei; Lichtenberg, Gunter; Hilboll, Andreas; Richter, Andreas; Burrows, John P.
2015-11-01
SCIAMACHY (SCanning Imaging Absorption spectroMeter for Atmospheric ChartographY) aboard ESA's environmental satellite ENVISAT observed the Earth's atmosphere in limb, nadir, and solar/lunar occultation geometries covering the UV-Visible to NIR spectral range. Limb and nadir geometries were the main operation modes for the retrieval of scientific data. The new version 6 of ESA's level 2 processor now provides for the first time an operational algorithm to combine measurements of these two geometries in order to generate new products. As a first instance the retrieval of tropospheric NO2 has been implemented based on IUP-Bremen's reference algorithm. We will detail the single processing steps performed by the operational limb-nadir matching algorithm and report the results of comparisons with the scientific tropospheric NO2 products of IUP and the Tropospheric Emission Monitoring Internet Service (TEMIS).
Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm
Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; ...
2018-02-12
Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. Here, we use a superconducting-qubit-based processor to apply the QSE approach to the H 2 molecule, extracting both groundmore » and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.« less
Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm
NASA Astrophysics Data System (ADS)
Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; Blok, M. S.; Kimchi-Schwartz, M. E.; McClean, J. R.; Carter, J.; de Jong, W. A.; Siddiqi, I.
2018-02-01
Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. We use a superconducting-qubit-based processor to apply the QSE approach to the H2 molecule, extracting both ground and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.
The parallel algorithm for the 2D discrete wavelet transform
NASA Astrophysics Data System (ADS)
Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel
2018-04-01
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing using single-core CPUs. However, considering a parallel processing using multi-core processors, this scheme is inappropriate due to a large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently overcome the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.